Creating a Dataset
To begin working with your dataset, you first need to import your data. You can import either local data (in the form of a variable that contains your data) or remote data (in the form of a url that we'll fetch from).
Local Importing
Out of the box Dataset can take local data objects or remote urls and import data in almost any common format.
- JSON (including jsonp)
- CSV
- TSV (Any delimiter is acceptable, including tabs)
There is also a growing library of custom data importers such as:
- Google Spreadsheets
Importing from a local object array
If you have an array of json objects, you can easily convert them to a Dataset like so:
Importing from a local "strict" format object
If you happen to have your data preprocessed in what we call a "strict" format, you can speed up your import slightly by initializing your Dataset with the strict flag:
Importing from a local delimited string format
If for some reason you actually have all your data as a delimited string on the client side (which is unlikely but possible), you can import that into a dataset object too.
Note You can also import remote delimited data by simply providing a url parameter instead of the data one.
Note You can specify any delimiter string, not just the comma.
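As a rough illustration of what a delimited import has to do, the sketch below splits a delimited string into column-wise arrays. It is not Dataset's parser: `parseDelimited` is a hypothetical helper, it assumes the first line is a header row, and it ignores quoted fields and type detection.

```javascript
// Hypothetical helper (not part of Dataset): naive delimited-string parsing.
// Assumes a header row, no quoted fields, and no embedded delimiters.
function parseDelimited(text, delimiter) {
  const lines = text.split(/\r?\n/).filter(function (line) {
    return line.length > 0;
  });
  const names = lines[0].split(delimiter);
  const columns = {};
  names.forEach(function (name) { columns[name] = []; });
  lines.slice(1).forEach(function (line) {
    const cells = line.split(delimiter);
    names.forEach(function (name, i) { columns[name].push(cells[i]); });
  });
  return columns;
}

const parsed = parseDelimited("state|value\nMA|420\nAZ|4", "|");
// parsed.state is ["MA", "AZ"]; parsed.value is ["420", "4"] (still strings)
```

The values come back as strings, which is why type detection or explicit column types matter after a delimited import.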
Remote Importing
Most of the time, your data will live somewhere else and you'll want to fetch it via a url. All the above formats would work, except that you need to replace your data property with a url property like so:
var ds = new DS.Dataset({
url : "http://myserver.com/data/mydata.json"
});
Google Spreadsheet Importing
If you have a published Google Spreadsheet that you would like to import data from, you can do so with the following format:
The Google Spreadsheet importer uses the format specified here: http://code.google.com/apis/gdata/samples/spreadsheet_sample.html
Remote Polling
If you are handling a live data feed, you can initialize your dataset to perform ajax-based polling at regular intervals to fetch your data. There are three different ways in which this data can be merged into your existing dataset:
- Appended - All new rows will be appended to the end of the dataset. This is the default behavior.
- Reset - All the current rows in the dataset will be thrown out and the new rows will be put into the dataset. To enable this, set `resetOnFetch` to `true` when initializing your dataset. This will fire a `reset` event on a syncable dataset.
- Unique - By specifying a column on which the data is supposed to be unique, new incoming rows will only be added IF the value in that column is unique. To enable this, set `uniqueAgainst` to the column name you wish to check against. Note, this is an expensive operation!
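The three merge strategies can be sketched in plain JavaScript as follows. This is only an illustration of the behavior described above, not Dataset's polling code: `mergeRows` is a hypothetical function, and only the option names `resetOnFetch` and `uniqueAgainst` come from the text.

```javascript
// Hypothetical sketch of the three merge strategies; not Dataset's code.
// `options.resetOnFetch` and `options.uniqueAgainst` mirror the option names above.
function mergeRows(current, incoming, options) {
  options = options || {};
  if (options.resetOnFetch) {
    // Reset: discard the current rows entirely.
    return incoming.slice();
  }
  if (options.uniqueAgainst) {
    // Unique: keep an incoming row only if its key has not been seen yet.
    const key = options.uniqueAgainst;
    const seen = new Set(current.map(function (row) { return row[key]; }));
    const merged = current.slice();
    incoming.forEach(function (row) {
      if (!seen.has(row[key])) {
        seen.add(row[key]);
        merged.push(row);
      }
    });
    return merged;
  }
  // Append (default): add every incoming row to the end.
  return current.concat(incoming);
}

const current = [{ id: 1, v: "a" }, { id: 2, v: "b" }];
const incoming = [{ id: 2, v: "b2" }, { id: 3, v: "c" }];

const appended = mergeRows(current, incoming);                        // 4 rows
const reset = mergeRows(current, incoming, { resetOnFetch: true });   // 2 rows
const unique = mergeRows(current, incoming, { uniqueAgainst: "id" }); // 3 rows
```

In the unique case the incoming row with `id` 2 is dropped because that value already exists, so the existing row is kept unchanged.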
Custom Importers
You may have noticed how easy it is to set a custom importer and parser in the dataset constructor by specifying the importer and parser properties. The import system can also easily be extended for custom data formats and other APIs. See the "Write Your Own Importers & Parsers" section for more information.
Fetching Data
Regardless of how you initialized your dataset (locally or remotely), it needs to be fetched for the data to be available. To begin fetching your data, simply call .fetch() on it.
Note that if your data is remote, it is especially imperative that you don't attempt to access your dataset before that call is complete.
Data can be fetched in one of three ways:
Pass success/error callbacks:
Using Deferreds
If you have more than one dataset you need to wait on, or you might be a fan of using deferreds, you can use them as follows:
ready Callback
You can also pass the dataset a ready callback that will be executed once the data is ready to be manipulated. This still requires fetching but allows you to have dataset-specific callbacks vs a single success callback for multiple fetches, for example.
Note that the individual ready callbacks are executed first and then the fetch callback gets executed.
Data Types
Built in Types:
Dataset supports the following prebuilt data types:
- number
- string
- boolean
- time
Overriding Detected Types
Dataset will attempt to detect what the type of your data is. However, if any of your columns are of a time format, it's much more efficient for you to specify that directly as follows:
columns : [
{ name : 'columnName',
type : ''
… [any additional type required options here.]
}
]
Dataset will take care of the actual type coercion, making it trivial to convert strings to numbers or timestamps to `moment` objects. For example, coercing the timestamp column into a time column and the total column to a numeric type would look like so:
Custom Types
The type system itself can be extended to add new types for your data.
The current type set is defined in src/types.js.
To define a new type, the required signature is as follows:
For example, we might define a custom phone type like so:
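A minimal sketch of such a phone type is shown below. The shape used here (an object exposing `test`, `compare`, `numeric` and `coerce`, plus a `name` and a `regexp`) is an assumption, as are the `phoneType` name and the accepted phone formats; check src/types.js for the signature Dataset actually expects.

```javascript
// Sketch of a custom "phone" type. The method names (test, compare, numeric,
// coerce) are an assumed shape; verify against src/types.js before relying on it.
const phoneType = {
  name: "phone",
  // Matches strings such as "617-555-0100" or "(617) 555-0100".
  regexp: /^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$/,
  // Does this raw value look like a phone number?
  test: function (value) {
    return this.regexp.test(String(value));
  },
  // Numeric representation: the digits read as a single number.
  numeric: function (value) {
    return Number(String(value).replace(/\D/g, ""));
  },
  // Normalize to "NNN-NNN-NNNN".
  coerce: function (value) {
    const digits = String(value).replace(/\D/g, "");
    return digits.slice(0, 3) + "-" + digits.slice(3, 6) + "-" + digits.slice(6);
  },
  // Order two phone numbers by their numeric form (-1, 0 or 1).
  compare: function (a, b) {
    const x = this.numeric(a);
    const y = this.numeric(b);
    return x < y ? -1 : x > y ? 1 : 0;
  }
};
```

With this object, `phoneType.coerce("(617) 555-0100")` normalizes the value to `"617-555-0100"`, and `compare` gives a sort order based on the digits alone.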
Accessing Data
Columns
Each column in the dataset is of the Miso.Column type. We shall reference it as column for simplicity's sake.
A column has the following properties:
- name
- type
- data - the data array for this column.
- _id - a unique id assigned to this column at parse time.
While you can access the data inside dataset by directly accessing the data property on a column, it is NOT recommended, as this will not handle any event propagation. Use direct access sparingly. For more information on accessing rows, see the Rows section.
Getting all column names:
ds.columnNames();
Note this will never include the _id column, as it is internal to the dataset implementation and you shouldn't be messing with it.
Getting a column by name:
ds.column(columnName);
This returns the actual column object. Note that because the order of columns is not guaranteed (nor should it matter), the fetching of columns is always done by name.
Iterating over columns:
ds.eachColumn(function(columnName, column, index) {
// do what you need here.
});
Rows
Since dataset stores all the data column-wise, sometimes you may want to access a "row" object more easily than by iterating through columns. Note that the row object is not a direct reference to your actual data row (as in, if you modify it, it won't actually trigger a change in your dataset.) To change your dataset, you need to use the `update` method.
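To see why a row object is only a copy, consider this minimal sketch of column-wise storage. The `columns` array and the `rowAt` helper are hypothetical stand-ins, not Dataset's internals; they only show that a row has to be assembled from several columns on demand.

```javascript
// Hypothetical column-wise store; Dataset's internals may differ.
const columns = [
  { name: "_id", data: [101, 102] },
  { name: "state", data: ["MA", "AZ"] },
  { name: "value", data: [420, 4] }
];

// Build a fresh row object from position `i` across all columns.
function rowAt(columns, i) {
  const row = {};
  columns.forEach(function (column) {
    row[column.name] = column.data[i];
  });
  return row;
}

const row = rowAt(columns, 1); // { _id: 102, state: "AZ", value: 4 }
row.value = 999;               // changes only the copy
// columns[2].data[1] is still 4
```

Because the row is assembled fresh each time, writing to it never reaches the stored columns, which is why changes have to go through `update`.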
Iterating over rows:
Note that each row has a unique identifier assigned to it called `_id` in a separate column. Do not attempt to change that value unless you're feeling destructive. That identifier is used for caching purposes and changing it may make your data inaccessible through the API.
Row by Position:
If you're trying to get the Nth row, you can do so as follows:
ds.rowByPosition(5);
Note, this will return a row object that will not be a direct reference to your data. This will be a copy.
Row by id:
If you're trying to get a row with a specific id, you can do so as follows:
ds.rowById(423);
Note, this will return a row object that will not be a direct reference to your data. This will be a copy.
Events
Dataset has a very rich event system that allows you to bind to a variety of events on your dataset. By default, this functionality is NOT ENABLED. This is because event bindings are created automatically in certain cases (see more about selection and filtering) and unless that functionality is needed, there's no reason to create the bindings.
To enable evented behavior, set the sync property to true when initializing your dataset.
var ds = new Miso.Dataset({
data : [
{ one : 12, two : 40, three : 40 }
],
sync : true
});
Default Events
Presently, dataset fires the following events:
| Event | Fired When | Precedence |
|---|---|---|
| add | Fired when adding a row to the dataset by calling .add | Primary |
| remove | Fired when removing a row from the dataset by calling .remove | Primary |
| update | Fired when updating a row in the dataset by calling .update | Primary |
| change | Fired when calling .add, .remove or .update | Secondary |
| sort | Fired when a dataset has been sorted. | Primary |
| reset | Fired when a dataset has been reset | Primary |
Any of the default events can be prevented by passing the { silent : true } flag. See the appropriate methods for further instructions.
Binding
To bind to an event, call bind like so:
ds.bind("add", callback);
Event Object
When any of the default events trigger (except for sort) an event object gets created and passed down to the callbacks. The event object is structured as follows:
- An event is of type `Miso.Event`
- It has a `deltas` property that contains all the deltas that were generated for this specific event.
Deltas
An event consists of one or more deltas. Each delta can represent a different operation, allowing a single event to actually represent many modifications.
Each delta can look as follows:
{
  // the set of attributes that changed (an object, or a single value)
  changed : { },
  // the old values of those attributes (an object, or a single value)
  old : { }
}
- When a row is added, there will only be `changed` attributes.
- When a row is removed, there will only be `old` attributes.
- When a row is updated, there will be `changed` and `old` attributes.
- In certain cases the values of `changed` and `old` may not be an object but rather a numeric value. More on that in the Computed Values section.
Detecting Delta Types:
You can always check what type of delta you've received by calling any of the following helper methods:
Miso.Event.isRemove(delta);
Miso.Event.isAdd(delta);
Miso.Event.isUpdate(delta);
For example:
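The sketch below classifies deltas using the rules listed earlier (only `changed` for an add, only `old` for a remove, both for an update). The functions are illustrative stand-ins written for this example; they are not the `Miso.Event` source, and the sample deltas are made up.

```javascript
// Stand-in helpers that follow the rules above; they are illustrative
// re-implementations, not the Miso.Event source.
function isEmpty(part) {
  return part === undefined || part === null ||
    (typeof part === "object" && Object.keys(part).length === 0);
}
function isAdd(delta) { return !isEmpty(delta.changed) && isEmpty(delta.old); }
function isRemove(delta) { return isEmpty(delta.changed) && !isEmpty(delta.old); }
function isUpdate(delta) { return !isEmpty(delta.changed) && !isEmpty(delta.old); }

// Name the operation a delta represents.
function describe(delta) {
  if (isAdd(delta)) { return "add"; }
  if (isRemove(delta)) { return "remove"; }
  if (isUpdate(delta)) { return "update"; }
  return "unknown";
}

const kinds = [
  { changed: { one: 12 } },                   // row added
  { old: { one: 12 } },                       // row removed
  { changed: { one: 13 }, old: { one: 12 } }, // row updated
  { changed: 40, old: 12 }                    // plain values instead of objects
].map(describe);
// kinds is ["add", "remove", "update", "update"]
```

Inside a callback bound with `ds.bind`, the same kind of check would be applied to each entry of the event's `deltas` property, using the `Miso.Event` helpers rather than these stand-ins.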
Custom Events
You can trigger your own custom events on dataset by just calling trigger when needed like so:
ds.trigger("myCustomEvent", arguments...);
Sorting
All data modifications should happen through the following add, remove and update methods.
Adding a row:
To add a row to your dataset, use the add method.
ds.add(rowObject, options);
Adding a row will trigger the following events in this order:
- add
- change
Pass { silent : true } as the optional options parameter to suppress events.
Removing a Row:
To remove a row from your dataset, use the remove method. There are two ways to remove rows:
- By providing a row id:
  ds.remove(rowId, options);
  Note the row id is the unique `_id` identifier each row has.
- By providing a filter function:
  ds.remove(rowFilterFunction, options);
  For example:
  // remove all rows where the population is > 1000
  ds.remove(function(row) {
    return (row.population > 1000);
  });
  All rows that pass the filter function will be removed.
Either remove call will trigger the following events in this order:
- remove
- change
Pass { silent : true } as the optional options parameter to suppress events.
Updating Rows:
To update a row, pass the `_id` of the row along with the changed attributes like so:
ds.update(rowId, changedAttributes, options);
For example:
ds.update(rowId, {
col1 : newVal, col2 : newVal ...
});
A call to update will trigger the following events in this order:
- update
- change
Pass { silent : true } as the optional options parameter to suppress events.
Selection
Dataset makes it easy to select sections of your columns and rows based on either static or function-based criteria. Creating a selection will return a subset of your data that will function and be queryable in the same way your original dataset is. We call this subset a View. A view is almost identical to a dataset except that it is immutable (you can continue selecting subsets, but you can't modify the data).
Columns:
You can create a subset containing only some of the original columns from your dataset like so:
ds.columns(["one", "two"]);
Note selecting a single column this way will create a new dataset-like interface. If you're trying to just get a reference to a specific column in your dataset, just call
ds.column("one");
Rows:
A more likely selection is a subset of rows that match particular criteria. You can select a row subset like so:
ds.rows(filter);
Syncing:
If you enabled syncing on your dataset by setting { sync : true } during instantiation, your views will automatically update to reflect changes in your data. For example:
Computed Values
A pretty common requirement is to actually compute some basic statistics about your data. Most of the time those calculations happen on all the values in a specific column or a collection of columns, which is part of why we arrange our data in a column-wise manner. We call a computation that results in a single value a Miso.Product.
There are several default computations built in:
Max
Note that the max can be computed on numeric columns but also time columns!
Min
Sum
Note you can't add up dates, so don't try that one.
Mean
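As a rough picture of what these four computations do, the sketch below reduces column-wise data in plain JavaScript. It is not the code in src/products.js: the `columnData` object and the helper functions are hypothetical, and it assumes that a computation over several columns reduces all of their values together.

```javascript
// Illustrative reductions over column-wise data; not the code in src/products.js.
const columnData = {
  one: [12, 40, 7],
  two: [3, 55, 21]
};

// Gather the values of the named columns into one flat array.
function valuesOf(data, names) {
  return names.reduce(function (all, name) {
    return all.concat(data[name]);
  }, []);
}

function max(data, names) { return Math.max.apply(null, valuesOf(data, names)); }
function min(data, names) { return Math.min.apply(null, valuesOf(data, names)); }
function sum(data, names) {
  return valuesOf(data, names).reduce(function (a, b) { return a + b; }, 0);
}
function mean(data, names) {
  return sum(data, names) / valuesOf(data, names).length;
}

const stats = {
  max: max(columnData, ["one", "two"]),  // 55
  min: min(columnData, ["one", "two"]),  // 3
  sum: sum(columnData, ["one", "two"]),  // 138
  mean: mean(columnData, ["one", "two"]) // 23
};
```

Each helper only needs the flat list of values from the chosen columns, which is what makes column-wise storage convenient for this kind of calculation.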
Syncing
If you have enabled syncing on your dataset by setting { sync : true } during your dataset initialization, you can also subscribe to changes in your computed product.
A big change to note here is that if you plan to subscribe to a specific computation, it is no longer a simple result (like a number or a date). It now becomes a Miso.Product object that you can retrieve the value from like so:
ds.max(["one", "two"]).val();
For example:
Adding your own
If you want to add your own computations to your dataset, take a look at src/products.js for some examples (like max and min.)
For example, if we wanted to implement a product that returned a random value from the dataset, you could do it like so:
Note that this form will also support an actual subscribable product if your dataset is syncable.
Derived Datasets
Another pretty common operation is to transform the dataset into another dataset according to a method of some kind, such as grouping rows according to some criteria. When a dataset undergoes a transformation like that, we call it a derivative. There are currently only two basic derivatives available in Dataset, with the plan to add more as they are needed: groupBy and movingAverage.
GroupBy
A groupBy operation involves splitting the data into groups based on a specific column, applying a function to the rows in each group and combining the results into a single dataset.
For example, when grouping the following dataset by the "state" column:
| state | value |
|---|---|
| MA | 130000 |
| MA | 420 |
| AZ | 2900 |
| AZ | 4 |
The result of the call:
ds.groupBy("state", ["value"]);
| state | value |
|---|---|
| MA | 130420 |
| AZ | 2904 |
By default the groupBy will sum up the values in the rows, but you can pass any method as an options argument like so:
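The split, apply and combine steps can be sketched in plain JavaScript as follows. This is an illustration of the technique rather than Dataset's implementation: the standalone `groupBy` function and its `method` parameter are hypothetical, and the rows are the ones from the table above.

```javascript
// Illustrative groupBy over plain row objects; not Dataset's implementation.
// `method` reduces the array of values collected for each group.
function groupBy(rows, byColumn, columns, method) {
  method = method || function (values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  };
  const groups = new Map(); // preserves first-seen order of group keys
  rows.forEach(function (row) {
    const key = row[byColumn];
    if (!groups.has(key)) { groups.set(key, []); }
    groups.get(key).push(row);
  });
  return Array.from(groups, function (entry) {
    const result = {};
    result[byColumn] = entry[0];
    columns.forEach(function (column) {
      result[column] = method(entry[1].map(function (row) { return row[column]; }));
    });
    return result;
  });
}

const rows = [
  { state: "MA", value: 130000 },
  { state: "MA", value: 420 },
  { state: "AZ", value: 2900 },
  { state: "AZ", value: 4 }
];

const summed = groupBy(rows, "state", ["value"]);
// [{ state: "MA", value: 130420 }, { state: "AZ", value: 2904 }]

const largest = groupBy(rows, "state", ["value"], function (values) {
  return Math.max.apply(null, values);
});
// [{ state: "MA", value: 130000 }, { state: "AZ", value: 2900 }]
```

Swapping the reducing function changes what each group collapses to, while the grouping step itself stays the same.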
Moving Average
A moving average of size N is a new sequence that is computed by taking the mean (or any other method) of the subsequences of N terms. For example, taking the moving average with a window size of 3 of the following dataset:
| key | value |
|---|---|
| A | 130000 |
| B | 420 |
| C | 1000 |
| D | 200 |
| E | 2900 |
| F | 4 |
Like so:
ds.movingAverage("value");
Will result in the following table:
| key | value | (explanation - NOT IN TABLE) |
|---|---|---|
| C | 43806 | (130000 + 420 + 1000)/3 |
| D | 540 | (420 + 1000 + 200)/3 |
| E | 1366 | (1000 + 200 + 2900)/3 |
| F | 1034 | (200 + 2900 + 4)/3 |
Note that you can also specify multiple columns like so:
ds.movingAverage(["A", "B", "C"]);
And an alternate method like so:
ds.movingAverage(["A", "B", "C"], { method : _.sum });
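The windowing itself can be sketched in plain JavaScript. This standalone `movingAverage` function is a hypothetical illustration, not Dataset's implementation; it takes a plain array, an explicit window size and an optional reducing method, and uses the values from the table above.

```javascript
// Illustrative moving average; not Dataset's implementation.
// `method` defaults to the mean of each window of `size` values.
function movingAverage(values, size, method) {
  method = method || function (window) {
    return window.reduce(function (a, b) { return a + b; }, 0) / window.length;
  };
  const out = [];
  for (let end = size; end <= values.length; end++) {
    out.push(method(values.slice(end - size, end)));
  }
  return out;
}

const averaged = movingAverage([130000, 420, 1000, 200, 2900, 4], 3);
// four windows, ending at keys C, D, E and F; the second is (420 + 1000 + 200) / 3 = 540

const truncated = averaged.map(Math.floor);
// [43806, 540, 1366, 1034], the whole-number values shown in the table
```

Six input values with a window of 3 produce four results, which is why the resulting table starts at key C.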
Syncing Behavior
If you are creating a derived dataset from a dataset that is syncable, you can subscribe to the derived dataset's change event.
Because of the inherent nature of a derived dataset, even the smallest change in your original data can cause many changes in your derived dataset. At the moment, those changes are not encompassed in a set of deltas. Instead, the derived dataset gets recomputed. This is an expensive operation, but it reduces the code complexity substantially. We are open to discussing a better way of handling this situation, but for now, this works.