Creating a Dataset

To begin working with your dataset, you first need to import your data. You can import either local data (a variable that already contains your data) or remote data (a URL that Dataset will fetch from).

Local Importing

Out of the box, Dataset can take local data objects or remote URLs and import data in almost any common format.

There is also a growing library of custom data importers such as:

Importing from a local object array

If you have an array of JSON objects, you can convert it to a Dataset like so:

Importing from a local "strict" format object

If you happen to have your data preprocessed in what we call a "strict" format, you can speed up your import slightly by initializing your Dataset with the strict flag:

Importing from a local delimited string format

If for some reason you have all your data as a delimited string on the client side (which is unlikely but possible), you can import that into a dataset object too.

Note: you can also import remote delimited data by providing a url parameter instead of the data one.
Note: you can specify any delimiter string, not just the comma.
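
To make the delimiter idea concrete, here is a minimal plain-JavaScript sketch of splitting a delimited string into column-wise arrays. It is not Dataset's parser: the `parseDelimited` helper is hypothetical, it assumes the first line holds the column names, and it does not handle quoted cells.

```javascript
// Hypothetical helper: NOT part of Dataset. It only illustrates turning a
// delimited string into column-wise data using an arbitrary delimiter.
function parseDelimited(text, delimiter) {
  // Drop blank lines (for example a trailing newline).
  var lines = text.split("\n").filter(function (line) {
    return line.trim() !== "";
  });
  // Assumption: the first line holds the column names.
  var names = lines[0].split(delimiter);
  var columns = {};
  names.forEach(function (name) { columns[name] = []; });
  // Append each cell to the array of the column it belongs to.
  lines.slice(1).forEach(function (line) {
    var cells = line.split(delimiter);
    names.forEach(function (name, i) { columns[name].push(cells[i]); });
  });
  return { names: names, columns: columns };
}

var parsed = parseDelimited("state|value\nMA|420\nAZ|4\n", "|");
console.log(parsed.names);
console.log(parsed.columns.value);
```

Every cell comes out as a string here, which is why a type detection or coercion step normally follows parsing.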

Remote Importing

Most of the time, your data will live somewhere else and you'll want to fetch it via a URL. All of the above formats still work; you just need to replace your data property with a url property like so:

var ds = new Miso.Dataset({ url : "http://myserver.com/data/mydata.json" });

Google Spreadsheet Importing

If you have a published Google Spreadsheet that you would like to import data from, you can do so with the following format:

The Google Spreadsheet importer uses the format specified here: http://code.google.com/apis/gdata/samples/spreadsheet_sample.html

Remote Polling

If you are handling a live data feed, you can initialize your dataset to perform ajax-based polling at regular intervals to fetch your data. There are three different ways in which this data can be merged into your existing dataset:

Custom Importers

You may have noticed how easy it is to set a custom importer and parser in the dataset constructor by specifying the importer and parser properties. The import system can also easily be extended for custom data formats and other APIs. See the "Write Your Own Importers & Parsers" section for more information.

Fetching Data

Regardless of how you initialized your dataset (locally or remotely), it needs to be fetched for the data to be available. To begin fetching your data, simply call .fetch() on it.

Note that if your data is remote, it is especially important that you don't attempt to access your dataset before that call is complete.

Data can be fetched in one of three ways:

Pass success/error callbacks:

Using Deferreds

If you have more than one dataset you need to wait on, or you might be a fan of using deferreds, you can use them as follows:

ready Callback

You can also pass the dataset a ready callback that will be executed once the data is ready to be manipulated. This still requires fetching but allows you to have dataset-specific callbacks vs a single success callback for multiple fetches, for example.

Note that the individual ready callbacks are executed first and then the fetch callback gets executed.

Data Types

Built in Types:

Dataset supports the following prebuilt data types:

Dataset will attempt to detect the type of each column when the data is being fetched. This is done by looking at the first few rows of the data.
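
As a rough illustration of sample-based detection, the sketch below inspects only the first few values of a column. The `detectType` function, the type labels it returns, and the default sample size of 5 are all assumptions made for this example, not Dataset's actual rules.

```javascript
// Hypothetical sketch: the labels and sample size are assumptions,
// not Dataset's actual detection logic.
function detectType(values, sampleSize) {
  // Only the first few values are inspected.
  var sample = values.slice(0, sampleSize || 5);
  var isNumber = function (v) {
    return typeof v === "number" ||
      (typeof v === "string" && v.trim() !== "" && !isNaN(Number(v)));
  };
  var isBoolean = function (v) {
    return typeof v === "boolean" || v === "true" || v === "false";
  };
  if (sample.length > 0 && sample.every(isNumber)) { return "number"; }
  if (sample.length > 0 && sample.every(isBoolean)) { return "boolean"; }
  return "string";
}

var detected = {
  total: detectType(["12", "40.5", "7"]),
  active: detectType([true, "false", true]),
  name: detectType(["MA", "AZ", "12"])
};
console.log(detected);

// A non-numeric value beyond the sampled rows goes unnoticed.
var misdetected = detectType(["1", "2", "3", "4", "5", "n/a"]);
console.log(misdetected);
```

The last call shows the trade-off of sampling: a value that appears after the sampled rows cannot influence the result, which is one reason to specify column types explicitly when you know them.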

Overriding Detected Types

Dataset will attempt to detect what the type of your data is. However, if any of your columns are of a time format, it's much more efficient for you to specify that directly as follows:

columns : [
  { name : 'columnName', 
    type : '' 
    … [any additional type required options here.] 
  }
]

Dataset will take care of the actual type coercion, making it trivial to convert strings to numbers or timestamps to `moment` objects. For example, coercing the timestamp column into a time column and the total column to a numeric type would look like so:

Custom Types

The type system itself can be extended to add new types for your data. The current type set is defined in src/types.js.

To define a new type, the required signature is as follows:

For example, we might define a custom phone type like so:

Accessing Data

Columns

Each column in the dataset is of a Miso.Column type. We shall reference it as column for simplicity's sake.

A column has the following properties:

While you can access the data inside dataset by directly accessing the data property on a column, it is NOT recommended as this will not handle any event propagation. Use direct access sparingly. For more information on accessing rows, see the Rows section.

Getting all column names:

ds.columnNames();

Note: this will never include the _id column, as it is internal to the dataset implementation and you shouldn't be messing with it.

Getting a column by name:

ds.column(columnName);

This returns the actual column object. Because the order of columns is not guaranteed (nor should it matter), columns are always fetched by name.

Iterating over columns:

  ds.eachColumn(function(columnName, column, index) {
    // do what you need here.
  });

Rows

Since dataset stores all the data column-wise, sometimes you may want to access a "row" object more easily than by iterating through columns. Note that the row object is not a direct reference to your actual data row: if you modify it, it won't trigger a change in your dataset. To change your dataset, you need to use the `update` method.
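
The relationship between column-wise storage and row copies can be sketched in plain JavaScript. The `store` object and the `rowAt` helper below are hypothetical stand-ins; Dataset's internals may differ.

```javascript
// Hypothetical column-wise store; Dataset's internal layout may differ.
var store = {
  _id: [1, 2, 3],
  state: ["MA", "AZ", "NY"],
  value: [420, 4, 77]
};

// Build a fresh row object for a position by reading one cell per column.
function rowAt(columns, position) {
  var row = {};
  Object.keys(columns).forEach(function (name) {
    row[name] = columns[name][position];
  });
  return row;
}

var rowCopy = rowAt(store, 1);
rowCopy.value = 999;         // modifies only the copy
console.log(rowCopy.state);  // "AZ"
console.log(store.value[1]); // 4, the stored data is untouched
```

Because the row is assembled on demand, writing to it cannot reach the stored columns, which is why changes have to go through a dedicated method.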

Iterating over rows:

Note that each row has a unique identifier assigned to it called `_id` in a separate column. Do not attempt to change that value unless you're feeling destructive. That identifier is used for caching purposes and changing it may make your data inaccessible through the API.

Row by Position:

If you're trying to get the Nth row, you can do so as follows:

ds.rowByPosition(5);

Note: this will return a row object that will not be a direct reference to your data. This will be a copy.

Row by id:

If you're trying to get a row with a specific id, you can do so as follows:

ds.rowById(423);

Note: this will return a row object that will not be a direct reference to your data. This will be a copy.

Events

Dataset has a very rich event system that allows you to bind to a variety of events on your dataset. By default, this functionality is NOT ENABLED. This is because event bindings are created automatically in certain cases (see more about selection and filtering) and unless that functionality is needed, there's no reason to create the bindings.

To enable evented behavior, set the sync property to true when initializing your dataset.

var ds = new Miso.Dataset({
  data : [
    { one : 12,  two : 40,  three : 40 }
  ],
  sync : true
});

Default Events

Presently, dataset fires the following events:

| Event | Fired When | Precedence |
| --- | --- | --- |
| add | Fired when adding a row to the dataset by calling .add | Primary |
| remove | Fired when removing a row from the dataset by calling .remove | Primary |
| update | Fired when updating a row in the dataset by calling .update | Primary |
| change | Fired when calling .add, .remove or .update | Secondary |
| sort | Fired when a dataset has been sorted | Primary |
| reset | Fired when a dataset has been reset | Primary |

Any of the default events can be prevented by passing the { silent : true } flag. See the appropriate methods for further instructions.

Binding

To bind to an event, call bind like so:

ds.bind("add", callback);

Event Object

When any of the default events trigger (except for sort), an event object gets created and passed down to the callbacks. The event object is structured as follows:

Deltas

An event consists of one or more deltas. Each delta can represent a different operation, allowing a single event to represent many modifications.

Each delta can look as follows:

{
  // the set of attributes that changed (an object, or a single value)
  changed : { },

  // the old values of those attributes (an object, or a single value)
  old : { }
}

Detecting Delta Types:

You can always check what type of delta you've received by calling any of the following helper methods:

Miso.Event.isRemove(delta);
Miso.Event.isAdd(delta);
Miso.Event.isUpdate(delta);

For example:
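
The following self-contained sketch classifies a list of deltas by inspecting their `changed` and `old` parts. It does not call Miso.Event: the `isAdd`, `isRemove` and `isUpdate` functions are hypothetical stand-ins, and the rules they use (an add carries no old values, a remove carries no changed values) are assumptions made for illustration.

```javascript
// Hypothetical stand-ins for the helpers above. The classification rules
// are assumptions, not the library's documented behavior.
function isEmpty(part) {
  return part === undefined || part === null ||
    (typeof part === "object" && Object.keys(part).length === 0);
}
function isAdd(delta)    { return isEmpty(delta.old) && !isEmpty(delta.changed); }
function isRemove(delta) { return isEmpty(delta.changed) && !isEmpty(delta.old); }
function isUpdate(delta) { return !isEmpty(delta.changed) && !isEmpty(delta.old); }

// Map a delta to a readable label.
function describe(delta) {
  if (isAdd(delta)) { return "add"; }
  if (isRemove(delta)) { return "remove"; }
  if (isUpdate(delta)) { return "update"; }
  return "unknown";
}

var deltas = [
  { changed: { one: 12 } },                   // nothing old: a new row
  { old: { one: 12 } },                       // nothing changed: a removed row
  { changed: { one: 13 }, old: { one: 12 } }  // both present: an updated row
];
var kinds = deltas.map(describe);
console.log(kinds);
```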

Custom Events

You can trigger your own custom events on dataset by just calling trigger when needed like so:

ds.trigger("myCustomEvent", arguments...);

Sorting

All data modifications should happen through the following add, remove and update methods.

Adding a row:

To add a row to your dataset, use the add method.

ds.add(rowObject, options);

Adding a row will trigger the following events in this order:

Pass { silent : true } as the optional options parameter to suppress events.

Removing a Row:

To remove a row from your dataset, use the remove method. There are two ways to remove rows:

Either remove call will trigger the following events in this order:

Pass { silent : true } as the optional options parameter to suppress events.

Updating Rows:

To update a row, pass the _id of the row along with the changed attributes like so:

ds.update(rowId, changedAttributes, options);

For example:

ds.update(rowId, {
  col1 : newVal, col2 : newVal ...
});

A call to update will trigger the following events in this order:

Pass { silent : true } as the optional options parameter to suppress events.

Selection

Dataset makes it easy to select sections of your columns and rows based on either static or function-based criteria. Creating a selection will return a subset of your data that will function and be queryable in the same way your original dataset is. We call this subset a View. A view is almost identical to a dataset, except that it is immutable (you can continue selecting subsets, but you can't modify the data).

Columns:

You can create a subset containing only some of the original columns from your dataset like so:

ds.columns(["one", "two"]);

Note: selecting a single column this way will create a new dataset-like interface. If you're trying to just get a reference to a specific column in your dataset, just call ds.column("one");

Rows:

A more likely selection is a subset of rows that matches particular criteria. You can select a row subset like so:

ds.rows(filter);

Syncing:

If you enabled syncing on your dataset by setting { sync : true } during instantiation, your views will automatically update to reflect changes in your data. For example:

Computed Values

A common requirement is to compute some basic statistics about your data. Most of the time those calculations happen on all the values in a specific column or a collection of columns, which is part of why we arrange our data in a column-wise manner. We call a computation that results in a single value a Miso.Product.

There are several default computations built in:

Max

Note that the max can be computed on numeric columns but also time columns!

Min

Sum

Note you can't add up dates, so don't try that one.

Mean
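
To make the four computations concrete, here is what each one yields on a plain array of values. This is ordinary JavaScript rather than Dataset's product methods, and native `Date` objects stand in for time values purely as an assumption for the example.

```javascript
// Plain-JavaScript illustration only; these are not Dataset's product methods.
var columnValues = [130000, 420, 2900, 4];

var columnMax = Math.max.apply(null, columnValues);
var columnMin = Math.min.apply(null, columnValues);
var columnSum = columnValues.reduce(function (total, v) { return total + v; }, 0);
var columnMean = columnSum / columnValues.length;

// Dates can be ordered, so max and min are meaningful, but a sum is not.
// Native Date objects are used here as a stand-in for time values.
var dates = [new Date(2012, 0, 15), new Date(2012, 5, 1), new Date(2011, 11, 31)];
var latest = new Date(Math.max.apply(null, dates));

console.log(columnMax, columnMin, columnSum, columnMean);
console.log(latest.getFullYear(), latest.getMonth());
```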

Syncing

If you have enabled syncing on your dataset by setting { sync : true } during your dataset initialization, you can also subscribe to changes in your computed product.

A big change to note here is that if you plan to subscribe to a specific computation, it is no longer a simple result (like a number or a date). It now becomes a Miso.Product object that you can retrieve the value from like so:

ds.max(["one", "two"]).val();

For example:

Adding your own

If you want to add your own computations to your dataset, take a look at src/products.js for some examples (like max and min.)

For example, if we wanted to implement a product that returned a random value from the dataset, we could do it like so:

Note that this form will also support an actual subscribable product if your dataset is syncable.

Derived Datasets

Another common operation is to transform the dataset into another dataset according to a method of some kind, such as grouping rows according to some criteria. When a dataset undergoes a transformation like that, we call the result a derivative. There are currently only two basic derivatives available in Dataset, with the plan to add more as they are needed: groupBy and movingAverage.

GroupBy

A groupBy operation involves splitting the data into groups based on a specific column, applying a function to the rows in each group and combining the results into a single dataset.

For example, when grouping the following dataset by the "state" column:

state value
MA 130000
MA 420
AZ 2900
AZ 4

The call:

ds.groupBy("state", ["value"]);

produces:

state value
MA 130420
AZ 2904

By default the groupBy will sum up the values in the rows, but you can pass any method as an options argument like so:
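
As a plain-JavaScript illustration of these semantics (not the library's implementation), the sketch below groups rows by a key column and combines each group with a configurable method that defaults to a sum. The `groupRows` helper and its argument order are hypothetical.

```javascript
// Hypothetical helper, not Dataset's groupBy. It groups row objects by one
// column and combines another column per group (sum by default).
function groupRows(rows, byColumn, valueColumn, method) {
  var combine = method || function (values) {
    return values.reduce(function (total, v) { return total + v; }, 0);
  };
  var order = [];   // group keys in first-seen order
  var groups = {};  // key -> array of values
  rows.forEach(function (row) {
    var key = row[byColumn];
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
      groups[key] = [];
      order.push(key);
    }
    groups[key].push(row[valueColumn]);
  });
  return order.map(function (key) {
    var result = {};
    result[byColumn] = key;
    result[valueColumn] = combine(groups[key]);
    return result;
  });
}

var stateRows = [
  { state: "MA", value: 130000 },
  { state: "MA", value: 420 },
  { state: "AZ", value: 2900 },
  { state: "AZ", value: 4 }
];

var summed = groupRows(stateRows, "state", "value");
var maxed = groupRows(stateRows, "state", "value", function (values) {
  return Math.max.apply(null, values);
});
console.log(summed);
console.log(maxed);
```

With the default method, the output matches the grouped table above (MA 130420, AZ 2904); swapping in a maximum instead yields MA 130000 and AZ 2900.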

Moving Average

A moving average of size N is a new sequence that is computed by taking the mean (or any other method) of the subsequences of N terms. For example, taking the moving average with a window size of 3 of the following dataset:

key value
A 130000
B 420
C 1000
D 200
E 2900
F 4

Like so:

ds.movingAverage("value");

Will result in the following table (values are shown truncated to whole numbers):

key value (explanation - NOT IN TABLE)
C 43806 (130000 + 420 + 1000)/3
D 540 (420 + 1000 + 200)/3
E 1366 (1000 + 200 + 2900)/3
F 1034 (200 + 2900 + 4)/3

Note that you can also specify multiple columns like so:

ds.movingAverage(["A", "B", "C"]);

And an alternate method like so:

ds.movingAverage(["A", "B", "C"], { method : _.sum });
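
The windowing arithmetic can be checked with a short plain-JavaScript sketch. The `movingWindow` helper is hypothetical and is not how Dataset computes its derivative; it slides a window of the given size over an array and applies a method (the mean by default) to each window.

```javascript
// Hypothetical helper, not Dataset's movingAverage.
function movingWindow(values, size, method) {
  var combine = method || function (chunk) {
    return chunk.reduce(function (total, v) { return total + v; }, 0) / chunk.length;
  };
  var result = [];
  // Each window covers the `size` values ending just before index `end`.
  for (var end = size; end <= values.length; end++) {
    result.push(combine(values.slice(end - size, end)));
  }
  return result;
}

var series = [130000, 420, 1000, 200, 2900, 4];
var averages = movingWindow(series, 3);
console.log(averages.map(Math.floor)); // [43806, 540, 1366, 1034]
```

Six input values with a window of 3 give four results, one for each of the keys C through F in the table above.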

Syncing Behavior

If you are creating a derived dataset from a dataset that is syncable, you can subscribe to the derived dataset's change event.

Because of the inherent nature of a derived dataset, even the smallest change in your original data can cause many changes in your derived dataset. At the moment, those changes are not encompassed in a set of deltas. Instead, the derived dataset gets recomputed. This is an expensive operation, but it reduces the code complexity substantially. We are open to discussing a better way of handling this situation, but for now, this works.
