[EPIC] A list of performance improvement tickets
This has a list of performance improvements:
- [x] https://github.com/apache/arrow-datafusion/issues/5230
- [x] https://github.com/apache/arrow-datafusion/issues/4973
- [ ] https://github.com/apache/arrow-datafusion/issues/4904
- [x] https://github.com/apache/arrow-datafusion/issues/2427
- [x] https://github.com/apache/arrow-datafusion/issues/5061
- [x] https://github.com/apache/arrow-datafusion/issues/956
- [x] https://github.com/apache/arrow-datafusion/issues/850
- [x] https://github.com/apache/arrow-datafusion/issues/846
- [ ] https://github.com/apache/arrow-datafusion/issues/258
- [x] https://github.com/apache/arrow-datafusion/issues/145
- [x] https://github.com/apache/arrow-datafusion/issues/88
- [x] https://github.com/apache/arrow-datafusion/issues/5547
- [ ] https://github.com/apache/arrow-datafusion/issues/5436
- [x] https://github.com/apache/arrow-datafusion/issues/5942
- [x] https://github.com/apache/arrow-datafusion/issues/5995
- [ ] https://github.com/apache/arrow-datafusion/issues/5944
- [x] https://github.com/apache/arrow-datafusion/issues/6002
- [ ] https://github.com/apache/arrow-datafusion/issues/5504
- [x] https://github.com/apache/arrow-datafusion/issues/5646
- [x] https://github.com/apache/arrow-datafusion/issues/6768
- [ ] https://github.com/apache/arrow-datafusion/issues/6822
- [ ] https://github.com/apache/arrow-datafusion/issues/7571
- [ ] https://github.com/apache/arrow-datafusion/issues/7647
- [ ] https://github.com/apache/arrow-datafusion/issues/7000
- [x] https://github.com/apache/arrow-datafusion/issues/7950
- [x] https://github.com/apache/arrow-datafusion/issues/7949
- [ ] https://github.com/apache/arrow-datafusion/issues/7955
- [ ] https://github.com/apache/arrow-datafusion/issues/7957
- [x] https://github.com/apache/arrow-datafusion/issues/9148
I'd be interested in picking up one of these... is #846 currently being worked on? If not you could assign me, @alamb ? Otherwise, they all look pretty interesting to me so feel free to assign me to something else on the list
Thanks @jaylmiller !
I don't think https://github.com/apache/arrow-datafusion/issues/846 is being worked on, but given that the GroupByHash now uses the row format, I am not sure how relevant it is.
Please do feel free to comment on any ticket that is interesting -- no need to have it assigned to work on something!
Thanks for all the help so far on making Sort faster
Sounds good! #846 was kind of arbitrary to be honest 😅, I'll read through them more closely and pick one that seems interesting.
Awesome -- thanks @jaylmiller
I think in general the "make aggregation faster" https://github.com/apache/arrow-datafusion/issues/4973 and high cardinality groups https://github.com/apache/arrow-datafusion/issues/5547 are the most pressing things from a performance perspective.
However, they are also the ones with the most active thought / work on them, so they probably need some more coordination, which you may or may not be interested in doing.
I moved all items not yet completed to https://github.com/apache/datafusion/issues/14482 so we could have a fresher list