[EPIC] A list of performance improvement tickets

Open alamb opened this issue 2 years ago • 4 comments

This issue tracks a list of performance improvement tickets:

  • [x] https://github.com/apache/arrow-datafusion/issues/5230
  • [x] https://github.com/apache/arrow-datafusion/issues/4973
  • [ ] https://github.com/apache/arrow-datafusion/issues/4904
  • [x] https://github.com/apache/arrow-datafusion/issues/2427
  • [x] https://github.com/apache/arrow-datafusion/issues/5061
  • [x] https://github.com/apache/arrow-datafusion/issues/956
  • [x] https://github.com/apache/arrow-datafusion/issues/850
  • [x] https://github.com/apache/arrow-datafusion/issues/846
  • [ ] https://github.com/apache/arrow-datafusion/issues/258
  • [x] https://github.com/apache/arrow-datafusion/issues/145
  • [x] https://github.com/apache/arrow-datafusion/issues/88
  • [x] https://github.com/apache/arrow-datafusion/issues/5547
  • [ ] https://github.com/apache/arrow-datafusion/issues/5436
  • [x] https://github.com/apache/arrow-datafusion/issues/5942
  • [x] https://github.com/apache/arrow-datafusion/issues/5995
  • [ ] https://github.com/apache/arrow-datafusion/issues/5944
  • [x] https://github.com/apache/arrow-datafusion/issues/6002
  • [ ] https://github.com/apache/arrow-datafusion/issues/5504
  • [x] https://github.com/apache/arrow-datafusion/issues/5646
  • [x] https://github.com/apache/arrow-datafusion/issues/6768
  • [ ] https://github.com/apache/arrow-datafusion/issues/6822
  • [ ] https://github.com/apache/arrow-datafusion/issues/7571
  • [ ] https://github.com/apache/arrow-datafusion/issues/7647
  • [ ] https://github.com/apache/arrow-datafusion/issues/7000
  • [x] https://github.com/apache/arrow-datafusion/issues/7950
  • [x] https://github.com/apache/arrow-datafusion/issues/7949
  • [ ] https://github.com/apache/arrow-datafusion/issues/7955
  • [ ] https://github.com/apache/arrow-datafusion/issues/7957
  • [x] https://github.com/apache/arrow-datafusion/issues/9148

alamb, Mar 10 '23 12:03

I'd be interested in picking up one of these... is #846 currently being worked on? If not you could assign me, @alamb ? Otherwise, they all look pretty interesting to me so feel free to assign me to something else on the list

jaylmiller, Mar 10 '23 16:03

Thanks @jaylmiller !

I don't think https://github.com/apache/arrow-datafusion/issues/846 is being worked on, but given that the GroupByHash now uses the row format, I am not sure how relevant it is.

Please do feel free to comment on any ticket that is interesting -- no need to have it assigned to work on something!

Thanks for all the help so far on making Sort faster

alamb, Mar 10 '23 22:03

Sounds good! #846 was kind of arbitrary to be honest 😅, I'll read through them more closely and pick one that seems interesting.

jaylmiller, Mar 10 '23 23:03

Awesome -- thanks @jaylmiller

I think in general the "make aggregation faster" https://github.com/apache/arrow-datafusion/issues/4973 and high cardinality groups https://github.com/apache/arrow-datafusion/issues/5547 are the most pressing things from a performance perspective.

However, they are also the ones with the most active thought / work on them, so they probably need some more coordination, which you may or may not be interested in doing.

alamb, Mar 11 '23 01:03

I moved all items not yet completed to https://github.com/apache/datafusion/issues/14482 so we could have a fresher list

alamb, Feb 04 '25 12:02