Derived Dataset

Another pretty common operation is to transform the dataset into another dataset according to a method of some kind, such as grouping rows according to some criteria. When a dataset undergoes a transoformation like that we call it a derivative. There are currently only two basic derivatives available in Dataset with the plan to add more as they are needed: groupBy and movingAverage.

GroupBy

A groupBy operation involves splitting the data into groups based on a specific column, applying a function to the rows in each group and combining the results into a single dataset.

For example, when grouping the following dataset by the "state" column:

state value
MA 130000
MA 420
AZ 2900
AZ 4

The result of the call:

ds.groupBy("state", ["value"]);
state value
MA 130420
AZ 2904

By default the groupBy will sum up the values in the rows, but you can pass any method as an options argument like so:

You can edit the code in this block and rerun it.

CountBy

A countBy operation involves counting the number of occurances of each unique value in a specified column. Those values are then set as another column called "count."

For example, when counting the following dataset by the "state" column:

state value
MA 130000
MA 420
AZ 2900
AZ 4

The result of the call:

ds.countBy("state");
state count
MA 2
AZ 2

Moving Average

A moving average of size N is a new sequence that is computed by taking the mean (or any other method) of the subsequences of N terms. For example, taking the moving average with a window size of 3 of the following dataset:

key value
A 130000
B 420
C 1000
D 200
E 2900
F 4

Like so:

ds.movingAverage("value");
Will result in the following table:
key value (explanation - NOT IN TABLE)
C 43806 (130000 + 420 + 1000)/3
D 540 (420 + 1000 + 200)/3
E 1366 (1000 + 200 + 2900)/3
F 1034 (200 + 2900 + 4)/3

Note that you can also specify multiple columns like so:

ds.movingAverage(["A", "B", "C"]);
And an alternate method like so:
ds.movingAverage(["A", "B", "C"], { method : _.sum });

Syncing Behavior

If you are creating a derived dataset from a dataset that is syncable, you can subscribe to derived dataset'schange event.

Because of the inherent nature of a derived dataset, even the smallest change in your original data can cause many changes in your derived dataset. At the moment, those changes are not encompased in a set of deltas. Instead, the derived dataset gets recomputed. This is an expensive operation, but it reduces the code complexity substantially. We are open to discussing a better way of handling this situation, but for now, this works.

« Computed Values