Implement aggregation of map columns.

Open m09526 opened this issue 10 months ago • 0 comments

Background

Following on from #4344, we would like to support aggregation of map columns in DataFusion. For example, given

Key Total Items
a 6 { 'k1' = 2, 'k2' = 1, 'k3' = 4}
a 3 { 'k2' = 2, 'k4' = 1}
b 1 { 'k1' = 1 }
b 2 { 'k1' = 1, 'k3' = 1 }
....

then the query SELECT Key, sum(Total), map_sum(Items) FROM table GROUP BY Key would return:

Key Total Items
a 9 { 'k1' = 2, 'k2' = 3, 'k3' = 4, 'k4' = 1}
b 3 { 'k1' = 2, 'k3' = 1 }
....
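
The intended semantics can be sketched in plain Rust, independent of DataFusion. This is only an illustration of the expected result for the example rows above: the names `items`, `map_sum_into` and `aggregate` are hypothetical, and the choice of string keys with `i64` values is an assumption made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper for the example: builds a map from (key, value) pairs.
fn items(pairs: &[(&str, i64)]) -> BTreeMap<String, i64> {
    pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}

/// Hypothetical helper (not a DataFusion API): adds every entry of `row`
/// into `acc`, summing the values of keys that are already present.
fn map_sum_into(acc: &mut BTreeMap<String, i64>, row: &BTreeMap<String, i64>) {
    for (key, value) in row {
        *acc.entry(key.clone()).or_insert(0) += value;
    }
}

/// Groups `(Key, Total, Items)` rows by `Key`, summing `Total` and
/// merging the `Items` maps, mirroring `sum(Total), map_sum(Items)`.
fn aggregate(
    rows: &[(&str, i64, BTreeMap<String, i64>)],
) -> BTreeMap<String, (i64, BTreeMap<String, i64>)> {
    let mut out: BTreeMap<String, (i64, BTreeMap<String, i64>)> = BTreeMap::new();
    for (key, total, row_items) in rows {
        let entry = out.entry(key.to_string()).or_default();
        entry.0 += total;
        map_sum_into(&mut entry.1, row_items);
    }
    out
}
```

Applied to the four example rows, `aggregate` yields a total of 9 for key a and 3 for key b, with the merged maps shown in the result table.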

Description

We'd like to be able to call an aggregate function, similar to "sum" or "count", that aggregates map columns of different types.

Analysis

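This should be implemented as a new User Defined Aggregate Function in DataFusion. We should implement both the Accumulator trait and the GroupsAccumulator trait: the former provides the basic per-group implementation, while the latter lets DataFusion update the state of many groups in a single call, which gives the best aggregation performance.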
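
As a rough sketch of the state such an aggregate would need, the plain Rust below mirrors the two roles without depending on DataFusion. It is not the DataFusion API: the real traits operate on Arrow arrays, and the names `MapSumState`, `GroupedMapSum` and `to_map`, as well as the `String` to `i64` map type, are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper for the example: builds a map from (key, value) pairs.
fn to_map(pairs: &[(&str, i64)]) -> HashMap<String, i64> {
    pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}

/// Hypothetical state for a single group: a running sum per map key.
/// This plays the role an `Accumulator` implementation would play.
#[derive(Default, Debug, Clone, PartialEq)]
struct MapSumState {
    sums: HashMap<String, i64>,
}

impl MapSumState {
    /// Folds one input map into the running sums (the "update" step).
    fn update(&mut self, row: &HashMap<String, i64>) {
        for (key, value) in row {
            *self.sums.entry(key.clone()).or_insert(0) += value;
        }
    }

    /// Combines partial state computed elsewhere (the "merge" step).
    fn merge(&mut self, other: &MapSumState) {
        self.update(&other.sums);
    }
}

/// Hypothetical grouped variant: one state per group index, so a whole
/// batch of rows is processed in one call. This plays the role a
/// `GroupsAccumulator` implementation would play.
#[derive(Default)]
struct GroupedMapSum {
    groups: Vec<MapSumState>,
}

impl GroupedMapSum {
    /// `group_indices[i]` is the group that `rows[i]` belongs to.
    fn update_batch(
        &mut self,
        rows: &[HashMap<String, i64>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        // Grow the state vector if new groups have appeared.
        if self.groups.len() < total_num_groups {
            self.groups.resize_with(total_num_groups, MapSumState::default);
        }
        for (row, &group) in rows.iter().zip(group_indices) {
            self.groups[group].update(row);
        }
    }
}
```

The grouped variant avoids creating and dispatching to a separate accumulator object per group, which is the main reason to provide both forms.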

m09526, Mar 07 '25 15:03