Implement aggregation of map columns.
Background
Following on from #4344, we would like to support aggregation of map columns in DataFusion. For example, given the table:
| Key | Total | Items |
|---|---|---|
| a | 6 | { 'k1' = 2, 'k2' = 1, 'k3' = 4} |
| a | 3 | { 'k2' = 2, 'k4' = 1} |
| b | 1 | { 'k1' = 1 } |
| b | 2 | { 'k1' = 1, 'k3' = 1 } |
| .... |
The query `SELECT Key, sum(Total), map_sum(Items) FROM table GROUP BY Key` would return:
| Key | Total | Items |
|---|---|---|
| a | 9 | { 'k1' = 2, 'k2' = 3, 'k3' = 4, 'k4' = 1} |
| b | 3 | { 'k1' = 2, 'k3' = 1 } |
| .... |
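To make the intended semantics concrete, here is a minimal sketch in plain Rust using only the standard library. It is not DataFusion code: the `Row` alias and the `map_sum_into` and `aggregate` helpers are hypothetical names chosen for illustration, and `i64` values with `String` keys are an assumption about the column types. It groups rows by `Key`, sums `Total`, and sums map values key by key.

```rust
use std::collections::{BTreeMap, HashMap};

/// One input row: (Key, Total, Items). Hypothetical type for illustration.
type Row = (String, i64, HashMap<String, i64>);

/// Add every entry of `src` into `dst`, summing values for keys present in both.
fn map_sum_into(dst: &mut BTreeMap<String, i64>, src: &HashMap<String, i64>) {
    for (k, v) in src {
        *dst.entry(k.clone()).or_insert(0) += *v;
    }
}

/// Group rows by Key, producing (sum of Total, key-wise sum of Items) per group.
fn aggregate(rows: &[Row]) -> BTreeMap<String, (i64, BTreeMap<String, i64>)> {
    let mut out: BTreeMap<String, (i64, BTreeMap<String, i64>)> = BTreeMap::new();
    for (key, total, items) in rows {
        // A missing group starts at (0, empty map).
        let entry = out.entry(key.clone()).or_default();
        entry.0 += *total;
        map_sum_into(&mut entry.1, items);
    }
    out
}
```

Keys that appear in only one input map are carried through unchanged, which is why `'k4' = 1` survives in the result for key `a`.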
Description
We'd like to be able to call a function similar to "sum" or "count" for aggregating maps of different types.
Analysis
For best performance, this should be implemented as a new user-defined aggregate function (UDAF) in DataFusion. We should implement the Accumulator trait as well as the GroupsAccumulator trait to give the best aggregation performance.
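As a rough illustration of the state such an accumulator might keep, the sketch below uses only the Rust standard library. It does not implement the real DataFusion traits, which operate on Arrow arrays rather than Rust maps; `MapSumState` and its `update`, `merge`, and `evaluate` methods are hypothetical names that only loosely mirror the update / merge / evaluate shape of an accumulator, and `i64` values are an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-group state for a map-sum aggregate (std-only sketch).
#[derive(Debug, Default, Clone, PartialEq)]
struct MapSumState {
    sums: BTreeMap<String, i64>,
}

impl MapSumState {
    /// Fold the entries of one input map into the running sums.
    fn update<'a, I>(&mut self, entries: I)
    where
        I: IntoIterator<Item = (&'a str, i64)>,
    {
        for (k, v) in entries {
            *self.sums.entry(k.to_string()).or_insert(0) += v;
        }
    }

    /// Combine a partial state computed elsewhere (e.g. on another partition).
    fn merge(&mut self, other: &MapSumState) {
        for (k, v) in &other.sums {
            *self.sums.entry(k.clone()).or_insert(0) += *v;
        }
    }

    /// Final aggregated map for this group.
    fn evaluate(&self) -> &BTreeMap<String, i64> {
        &self.sums
    }
}
```

Because addition is associative and commutative, partial states can be merged in any order and still yield the same result, which is what makes this aggregate suitable for partitioned execution.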