Monoid aggregations
This is not at all done.
Using DiffVector we can aggregate SUM and COUNT in a more efficient, "parallel" way.
The idea is to port all aggregations to this scheme.
What does not work yet is the correct rearrangement of input and output variables.
We can use Differential's monoids to track different aggregates in the diff.
We explode() the value vector into a DiffVector and maintain the monoid corresponding to the given aggregation. Using count we resurface the values into the data part.
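To make the explode-and-resurface idea concrete, here is a minimal sketch. It is not the code from this PR: `DiffVec`, `explode`, `accumulate` and `resurface` are hypothetical names, and the layout (one running sum per value plus a trailing count column) is an assumption made for illustration.

```rust
use std::ops::AddAssign;

/// Hypothetical stand-in for the DiffVector: one running sum per aggregated
/// value, plus a trailing count column.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DiffVec {
    pub sums: Vec<i64>,
}

impl<'a> AddAssign<&'a DiffVec> for DiffVec {
    /// Component-wise addition: this is the Sum monoid applied per column.
    fn add_assign(&mut self, rhs: &'a DiffVec) {
        assert_eq!(self.sums.len(), rhs.sums.len());
        for (a, b) in self.sums.iter_mut().zip(&rhs.sums) {
            *a += *b;
        }
    }
}

/// Move a row's values into the diff, scaled by the update's multiplicity
/// (+1 for an insertion, -1 for a retraction), and append the count.
pub fn explode(values: &[i64], multiplicity: i64) -> DiffVec {
    let mut sums: Vec<i64> = values.iter().map(|v| v * multiplicity).collect();
    sums.push(multiplicity);
    DiffVec { sums }
}

/// Fold a batch of (values, multiplicity) updates for one group.
pub fn accumulate(width: usize, updates: &[(Vec<i64>, i64)]) -> DiffVec {
    let mut acc = DiffVec { sums: vec![0; width + 1] };
    for (values, multiplicity) in updates {
        acc += &explode(values, *multiplicity);
    }
    acc
}

/// Use the count to decide whether the group still exists; if it does,
/// move the accumulated sums back into the data part.
pub fn resurface(diff: &DiffVec) -> Option<Vec<i64>> {
    let (count, values) = diff.sums.split_last()?;
    if *count == 0 {
        None
    } else {
        Some(values.to_vec())
    }
}
```

With this layout a retraction is just the negated diff, so a group whose rows have all been retracted ends up with count 0 and produces no output.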
A few minor caveats:
- Median is not a monoid operation (there might still be a way that is technically not correct, but at least morally ;) )
- Different aggregations: currently every element in the DiffVector is a Sum monoid. If we want to use Min or Max, we need to wrap them in an enum, e.g. Diff <- doesn't look too nice though
- Implementing AVG or VAR requires some post-processing that is currently not there.
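A rough sketch of what the enum wrapping could look like, under the assumption that each slot in the vector carries its own aggregation kind. `AggDiff` and `combine` are hypothetical names, not types from this PR or from Differential Dataflow.

```rust
/// Hypothetical per-slot diff: each variant is a monoid over i64.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AggDiff {
    Sum(i64),
    Min(i64),
    Max(i64),
}

impl AggDiff {
    /// Combine two diffs of the same kind; mixing kinds is an error,
    /// reported here as `None`.
    pub fn combine(self, other: AggDiff) -> Option<AggDiff> {
        match (self, other) {
            (AggDiff::Sum(a), AggDiff::Sum(b)) => Some(AggDiff::Sum(a + b)),
            (AggDiff::Min(a), AggDiff::Min(b)) => Some(AggDiff::Min(a.min(b))),
            (AggDiff::Max(a), AggDiff::Max(b)) => Some(AggDiff::Max(a.max(b))),
            _ => None,
        }
    }
}
```

One thing this makes visible: Sum has inverses, so a retraction can be expressed as a negated diff, whereas Min and Max are monoids without inverses, so retractions would need separate handling.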
Corresponding PR in Differential Dataflow: https://github.com/TimelyDataflow/differential-dataflow/pull/156
Exciting stuff!
Yay for separate module, nay for killing the old aggregate test ;)
-
What would that look like?
-
But seems worth it? Or am I missing something?
-
I kind of want to get rid of any aggregation that clients could easily derive from a lower-level one, on which Differential can do the heavy lifting. So AVG and VARIANCE would be kicked out. What do you think?
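For reference, deriving both on the client side could look roughly like this. It is only a sketch: `Moments` is a hypothetical struct, and it assumes the client also asks for a SUM over the squared values alongside SUM and COUNT. The variance here is the population variance, computed as E[x^2] - E[x]^2.

```rust
/// Hypothetical result row holding the lower-level aggregates.
pub struct Moments {
    pub count: i64,
    pub sum: i64,
    /// SUM over x * x; assumed to be requested as a separate aggregate.
    pub sum_sq: i64,
}

/// AVG = SUM / COUNT; undefined for an empty group.
pub fn avg(m: &Moments) -> Option<f64> {
    if m.count == 0 {
        None
    } else {
        Some(m.sum as f64 / m.count as f64)
    }
}

/// Population VARIANCE = E[x^2] - E[x]^2; undefined for an empty group.
pub fn variance(m: &Moments) -> Option<f64> {
    let mean = avg(m)?;
    Some(m.sum_sq as f64 / m.count as f64 - mean * mean)
}
```

This formula is known to lose precision when the mean is large relative to the spread, which might be one argument for keeping VARIANCE server-side after all.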