[EPIC] Improvements to GroupColumn multi-column aggregation performance
Is your feature request related to a problem or challenge?
In https://github.com/apache/datafusion/pull/12269, @jayzhan211 made significant improvements to how group values are stored in multi-column aggregations. There are a few follow-ups to this work that I wanted to track here.
Describe the solution you'd like
- [x] https://github.com/apache/datafusion/pull/12620
- [x] https://github.com/apache/datafusion/pull/12619
- [x] https://github.com/apache/datafusion/pull/12617
- [x] https://github.com/apache/datafusion/pull/12623
- [x] https://github.com/apache/datafusion/pull/12703
- [x] https://github.com/apache/datafusion/pull/12681
- [x] https://github.com/apache/datafusion/pull/12758
- [x] https://github.com/apache/datafusion/pull/12770
- [ ] Potentially optimize take_n for the boolean builder using arrow-rs (would help streaming aggregates)
- [x] https://github.com/apache/datafusion/issues/12771
- [ ] Implement GroupColumn for other primitive-based types (e.g. DecimalArray)
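
For readers unfamiliar with the take_n item above, here is a minimal sketch of the semantics I have in mind: emit the first `n` accumulated group values and keep the rest for later batches, which is what streaming aggregates need. `ToyBooleanGroupBuilder` and its methods are hypothetical names made up for illustration. This is not the DataFusion `GroupColumn` implementation or any arrow-rs API, and it uses a plain `Vec<Option<bool>>` rather than bit-packed buffers.

```rust
/// Toy stand-in for a boolean group-value builder.
/// Hypothetical type for illustration only; not a DataFusion or arrow-rs API.
#[derive(Debug, Default)]
struct ToyBooleanGroupBuilder {
    /// One entry per group; `None` models a null group value.
    values: Vec<Option<bool>>,
}

impl ToyBooleanGroupBuilder {
    /// Append the group value for a newly seen group.
    fn append(&mut self, value: Option<bool>) {
        self.values.push(value);
    }

    /// Number of group values currently stored.
    fn len(&self) -> usize {
        self.values.len()
    }

    /// Remove and return the first `n` group values.
    /// The remaining values shift to the front, so the group that was at
    /// index `n` is at index 0 afterwards.
    fn take_n(&mut self, n: usize) -> Vec<Option<bool>> {
        assert!(n <= self.values.len(), "cannot take more values than stored");
        self.values.drain(..n).collect()
    }
}
```

In this toy version `drain(..n)` copies the remaining values down on every call. A bit-packed builder would have to shift bits instead, which is presumably where an arrow-rs level optimization could help.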
Describe alternatives you've considered
No response
Additional context
No response