
[EPIC] Improvements to GroupColumn multi-column aggregation performance

Open alamb opened this issue 1 year ago • 0 comments

Is your feature request related to a problem or challenge?

In https://github.com/apache/datafusion/pull/12269, @jayzhan211 made significant improvements to how group values are stored in multi-column aggregations. There are a few follow-ups to this work that I want to track here.

Describe the solution you'd like

  • [x] https://github.com/apache/datafusion/pull/12620
  • [x] https://github.com/apache/datafusion/pull/12619
  • [x] https://github.com/apache/datafusion/pull/12617
  • [x] https://github.com/apache/datafusion/pull/12623
  • [x] https://github.com/apache/datafusion/pull/12703
  • [x] https://github.com/apache/datafusion/pull/12681
  • [x] https://github.com/apache/datafusion/pull/12758
  • [x] https://github.com/apache/datafusion/pull/12770
  • [ ] Potentially optimize take_n for the boolean builder using arrow-rs (would help streaming aggregates)
  • [x] https://github.com/apache/datafusion/issues/12771
  • [ ] Implement GroupColumn for other primitive-based types (e.g. DecimalArray)
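
For readers unfamiliar with the take_n item above, here is a minimal sketch of the kind of operation it refers to, assuming take_n means "emit the first n stored group values and keep the rest". `BoolGroupValues` is a hypothetical stand-in written for illustration; it is not DataFusion's GroupColumn implementation or an arrow-rs builder, and a real version would work on Arrow's bit-packed buffers rather than a `Vec<bool>`.

```rust
/// Hypothetical stand-in for a boolean group-value builder.
/// This is NOT DataFusion's or arrow-rs's actual type.
#[derive(Debug, Default)]
struct BoolGroupValues {
    values: Vec<bool>,
}

impl BoolGroupValues {
    /// Append one group value.
    fn push(&mut self, v: bool) {
        self.values.push(v);
    }

    /// Number of group values currently stored.
    fn len(&self) -> usize {
        self.values.len()
    }

    /// Remove and return the first `n` values, keeping the remainder
    /// (assumed semantics of the take_n operation named in the list).
    fn take_n(&mut self, n: usize) -> Vec<bool> {
        assert!(n <= self.values.len(), "cannot take more values than stored");
        // `split_off(n)` returns elements [n, len) and leaves [0, n) behind.
        let rest = self.values.split_off(n);
        // Swap so that `self` keeps the tail and the head is returned.
        std::mem::replace(&mut self.values, rest)
    }
}
```

With values `true, false, true, true` pushed, `take_n(3)` would return the first three and leave one value stored. A streaming aggregate could call something like this repeatedly to emit completed groups early, which is presumably why a faster version would help there.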

Describe alternatives you've considered

No response

Additional context

No response

alamb, Sep 30 '24 09:09