
Calculate roll-up cumulatively

Open sopel39 opened this issue 2 years ago • 1 comments

Currently, a rollup on x, y, z is translated into the following plan:

FinalAggregation[$id, x, y, z](aggr)
  RemoteExchange
    PartialAggregation[$id, x, y, z](aggr)
      GroupId

The GroupId operator replicates the unaggregated input data 4 times (once for each of the groups [], [x], [x,y], [x,y,z]).
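To make the replication concrete, here is a minimal Python sketch of what a GroupId-style expansion does for a rollup. It is an illustration only, not Trino's implementation: the function name `group_id_expand`, the `(x, y, z, value)` row layout, and the use of `None` for masked keys are assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple


def group_id_expand(rows: Iterable[Tuple], num_keys: int = 3) -> Iterator[Tuple]:
    """Emit one copy of every input row per rollup grouping set.

    Hypothetical row layout: (x, y, z, value). Grouping set `group_id`
    keeps the first `group_id` keys and masks the remaining ones with None,
    so group_id=0 is [] and group_id=3 is [x, y, z].
    """
    for row in rows:
        keys, value = tuple(row[:num_keys]), row[num_keys]
        for group_id in range(num_keys + 1):
            masked = keys[:group_id] + (None,) * (num_keys - group_id)
            yield (group_id, *masked, value)


rows = [("a", "b", "c", 1), ("a", "b", "d", 2)]
expanded = list(group_id_expand(rows))
# Two input rows become eight rows: one per row per grouping set.
```

Every copy still carries an unaggregated value, which is why the aggregation that follows has to process four times the original row count.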

PartialAggregation then consumes the unaggregated input once for each grouping set. However, the aggregations could be calculated in a cascade, e.g.:

PartialAggregation[$id=0][source: $id=1](aggr)
  PartialAggregation[$id=1, x][source: $id=2](aggr)
    PartialAggregation[$id=2, x, y][source: $id=3](aggr)
      PartialAggregation[$id=3, x, y, z](aggr)
        GroupId

Each downstream PartialAggregation would consume the already partially aggregated results of the wider grouping set, while passing its input rows through.

This would reduce CPU usage and also improve PartialAggregation efficiency, as each grouping set would be aggregated by a separate operator.
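The cascade idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the proposed operator: it uses SUM (whose partial results can be merged), a hypothetical `(x, y, z, value)` row layout, and hypothetical helper names `aggregate` and `cascaded_rollup`. It also leaves out the `$id` column and the pass-through of input rows that the plan above relies on.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def aggregate(rows: Sequence[Tuple], num_keys: int) -> List[Tuple]:
    """Sum the last column, grouped by the first `num_keys` columns."""
    totals: Dict[Tuple, int] = defaultdict(int)
    for *keys, value in rows:
        totals[tuple(keys[:num_keys])] += value
    return [(*key, total) for key, total in totals.items()]


def cascaded_rollup(rows: Sequence[Tuple], num_keys: int = 3) -> Dict[int, List[Tuple]]:
    """Compute each rollup level from the results of the next wider level.

    Level 3 groups by [x, y, z] and reads the raw rows; level 2 groups by
    [x, y] and reads level 3's output; and so on down to level 0 ([]).
    """
    results: Dict[int, List[Tuple]] = {}
    current = rows
    for level in range(num_keys, -1, -1):
        current = aggregate(current, level)
        results[level] = current
    return results


rows = [("a", "b", "c", 1), ("a", "b", "d", 2), ("a", "e", "c", 4), ("f", "b", "c", 8)]
rollup = cascaded_rollup(rows)
```

Only the widest level reads the raw rows; each narrower level reads the usually much smaller output of the level above it. The same trick does not apply directly to aggregates whose partial results cannot be merged.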

sopel39 avatar Sep 20 '22 12:09 sopel39

Initial branch https://github.com/starburstdata/trino/tree/ks/rollup

sopel39 avatar Sep 20 '22 12:09 sopel39