Implement spilling for PartialSortExec
Is your feature request related to a problem or challenge?
PartialSortExec was added in https://github.com/apache/arrow-datafusion/issues/7456 / https://github.com/apache/arrow-datafusion/pull/9125
While one of the major benefits of this operator is reducing the memory required when sorting data (as it can emit early), we should also handle the case where it still cannot fit everything in memory.
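To illustrate the "emit early" behaviour mentioned above, here is a minimal sketch in plain Rust. It is not DataFusion code: the function name `partial_sort`, the `(i32, i32)` row type and the `emit` callback are all made up for illustration. It assumes the input is already sorted on the first column, so only rows sharing the current first-column value need to be buffered before they can be sorted on the second column and emitted.

```rust
/// Toy partial sort (hypothetical, not the DataFusion operator).
/// Assumes `input` is already sorted by the first tuple field; produces
/// batches sorted by (first, second) while buffering one group at a time.
fn partial_sort<I>(input: I, mut emit: impl FnMut(Vec<(i32, i32)>))
where
    I: IntoIterator<Item = (i32, i32)>,
{
    let mut group: Vec<(i32, i32)> = Vec::new();
    for row in input {
        // A change in the prefix column means the buffered group is complete.
        let prefix_changed = group.first().map_or(false, |first| first.0 != row.0);
        if prefix_changed {
            group.sort_by_key(|r| r.1);
            // Hand the finished group downstream and start an empty buffer.
            emit(std::mem::take(&mut group));
        }
        group.push(row);
    }
    // Flush the last group once the input is exhausted.
    if !group.is_empty() {
        group.sort_by_key(|r| r.1);
        emit(group);
    }
}
```

Memory use in this sketch is bounded by the largest group rather than by the whole input, which is also why a single very large group can still exceed the available memory.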
Describe the solution you'd like
Add spilling support to PartialSortExec so that if it runs out of memory it spills to disk rather than returning an error.
Describe alternatives you've considered
No response
Additional context
https://github.com/apache/arrow-datafusion/issues/9153 tracks enabling PartialSort for more queries
I'd like to help with this.
It doesn't look like a small project, but I think there's a spilling implementation in SortExec that I can learn from.
Thanks @yyy1000 -- I would definitely recommend:
- Studying the existing implementation in Sort
- Creating a test case that shows the sort being invoked (i.e. set the memory manager limit low and create a partial sort plan)
- Trying to refactor / adapt the parts used in sort so they can also be used in partial sort
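For anyone new to the idea, the general spill-then-merge pattern can be sketched in plain Rust using only the standard library. This is a toy under stated assumptions, not the SortExec implementation: `SpillingSorter`, its `budget` (counted in values, not bytes), the text-file run format and the `spill_count` helper are all hypothetical. The real operator works on Arrow record batches and DataFusion's memory accounting, none of which is modeled here.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::PathBuf;

/// Toy sorter (hypothetical) that buffers at most `budget` values in memory.
struct SpillingSorter {
    budget: usize,
    buffer: Vec<i64>,
    spill_files: Vec<PathBuf>,
    spill_dir: PathBuf,
}

impl SpillingSorter {
    fn new(budget: usize, spill_dir: PathBuf) -> Self {
        Self { budget, buffer: Vec::new(), spill_files: Vec::new(), spill_dir }
    }

    /// Buffer a value, spilling first if the budget is already used up.
    fn push(&mut self, v: i64) -> io::Result<()> {
        if self.buffer.len() >= self.budget {
            self.spill()?;
        }
        self.buffer.push(v);
        Ok(())
    }

    /// Sort the buffered values and write them to disk as one sorted run.
    fn spill(&mut self) -> io::Result<()> {
        self.buffer.sort_unstable();
        let path = self.spill_dir.join(format!("run-{}.txt", self.spill_files.len()));
        let mut w = BufWriter::new(File::create(&path)?);
        for v in self.buffer.drain(..) {
            writeln!(w, "{}", v)?;
        }
        w.flush()?;
        self.spill_files.push(path);
        Ok(())
    }

    fn spill_count(&self) -> usize {
        self.spill_files.len()
    }

    /// K-way merge of the in-memory remainder and all spilled runs.
    /// The output is collected into a Vec only to keep the sketch short.
    fn finish(mut self) -> io::Result<Vec<i64>> {
        self.buffer.sort_unstable();
        let mut sources: Vec<Box<dyn Iterator<Item = io::Result<i64>>>> = Vec::new();
        let in_memory = std::mem::take(&mut self.buffer);
        sources.push(Box::new(in_memory.into_iter().map(|v| -> io::Result<i64> { Ok(v) })));
        for path in &self.spill_files {
            let lines = BufReader::new(File::open(path)?).lines();
            sources.push(Box::new(lines.map(|line| -> io::Result<i64> {
                line?
                    .parse::<i64>()
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
            })));
        }
        // Min-heap of (next value, source index), one entry per non-empty source.
        let mut heap = BinaryHeap::new();
        for (i, src) in sources.iter_mut().enumerate() {
            if let Some(v) = src.next() {
                heap.push(Reverse((v?, i)));
            }
        }
        let mut out = Vec::new();
        while let Some(Reverse((v, i))) = heap.pop() {
            out.push(v);
            if let Some(next) = sources[i].next() {
                heap.push(Reverse((next?, i)));
            }
        }
        // Best-effort cleanup of the run files.
        for path in &self.spill_files {
            let _ = std::fs::remove_file(path);
        }
        Ok(out)
    }
}
```

A test in the spirit of the second bullet would construct the sorter with a deliberately small budget, feed it more values than the budget allows, and assert both that at least one spill happened and that the final output is fully sorted.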