
Prevent over-allocations (and spills) on sorts with a fixed limit

Open. isidentical opened this issue 3 years ago (0 comments).

Which issue does this PR close?

Part of #3579. More context is available in this comment: https://github.com/apache/arrow-datafusion/issues/3579#issuecomment-1255596028

Rationale for this change

During sorting, each time a new record batch arrives we try to reserve memory for it. This assumes that the whole batch stays buffered until the sort completes, and the reservation is what keeps us from accidentally exceeding the memory limit. After #3510, however, this assumption no longer holds in every case, particularly when a fetch limit is set on the sort: each partial sort can then discard rows beyond the limit, so we may be reserving far more memory than we actually hold and spilling repeatedly for no good reason.
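To make the over-allocation concrete, here is a toy Rust model of the behavior described above. `MemoryBudget` and `spills_without_shrinking` are hypothetical names invented for illustration, not DataFusion's memory-manager API; memory is counted in abstract bytes, and a spill is assumed to release the whole reservation.

```rust
/// Hypothetical memory budget for one sort operator (illustration only,
/// not DataFusion's real memory-manager API).
struct MemoryBudget {
    limit: usize,
    used: usize,
}

impl MemoryBudget {
    /// Reserve `bytes` more; returns false if the limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> bool {
        if self.used + bytes > self.limit {
            return false;
        }
        self.used += bytes;
        true
    }

    /// Release the whole reservation (what a spill does in this model).
    fn free_all(&mut self) {
        self.used = 0;
    }
}

/// Counts spills when every incoming batch is reserved at full size and the
/// reservation is only ever released by spilling.
fn spills_without_shrinking(batch_sizes: &[usize], limit: usize) -> usize {
    let mut budget = MemoryBudget { limit, used: 0 };
    let mut spills = 0;
    for &size in batch_sizes {
        if !budget.try_grow(size) {
            // Out of budget: write buffered batches to disk and start over.
            budget.free_all();
            spills += 1;
            // Assumption: a single batch always fits within the limit.
            let ok = budget.try_grow(size);
            assert!(ok, "a single batch must fit within the limit");
        }
    }
    spills
}
```

With ten 40-byte batches and a 100-byte budget, `spills_without_shrinking(&[40; 10], 100)` returns 4, because the reservation only grows between spills.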

What changes are included in this PR?

This PR avoids these over-allocations by instructing the memory manager to shrink the sort's reservation after each partial sort when a limit is set.
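A minimal sketch of the fix, under similar toy assumptions: after each partial sort with a fetch limit, at most `fetch` rows survive, so the reservation is shrunk to the size of those rows. `Reservation`, `shrink_to`, and `spills_with_limit` are hypothetical names; the actual change goes through DataFusion's memory manager, whose API is not reproduced here. Each row is assumed to cost a fixed `bytes_per_row`.

```rust
/// Hypothetical per-operator memory reservation (illustration only).
struct Reservation {
    limit: usize,
    used: usize,
}

impl Reservation {
    /// Try to reserve `bytes` more; fails if the limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> bool {
        if self.used + bytes > self.limit {
            return false;
        }
        self.used += bytes;
        true
    }

    /// Shrink the reservation down to `bytes` (never grows it).
    fn shrink_to(&mut self, bytes: usize) {
        self.used = self.used.min(bytes);
    }
}

/// Returns how many times a limited sort spills. Each entry of `batch_rows`
/// is the row count of one incoming batch; `fetch` is the sort's limit.
fn spills_with_limit(
    batch_rows: &[usize],
    bytes_per_row: usize,
    fetch: usize,
    limit: usize,
    shrink_after_sort: bool,
) -> usize {
    let mut reservation = Reservation { limit, used: 0 };
    let mut buffered_rows = 0;
    let mut spills = 0;
    for &rows in batch_rows {
        let bytes = rows * bytes_per_row;
        if !reservation.try_grow(bytes) {
            // Out of memory: spill buffered rows to disk and free the reservation.
            reservation.used = 0;
            buffered_rows = 0;
            spills += 1;
            let ok = reservation.try_grow(bytes);
            assert!(ok, "a single batch must fit within the limit");
        }
        // A partial sort with a fetch limit keeps at most `fetch` rows.
        buffered_rows = (buffered_rows + rows).min(fetch);
        if shrink_after_sort {
            // The fix: give back memory held for rows the limit discarded.
            reservation.shrink_to(buffered_rows * bytes_per_row);
        }
    }
    spills
}
```

With ten 100-row batches, 1 byte per row, `fetch = 10`, and a 250-byte budget, this model spills 4 times without shrinking and never with it, since the reservation drops back to 10 bytes after every partial sort.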

Are there any user-facing changes?

No. This is intended purely as an optimization.

Posted by isidentical on Sep 22 '22, 22:09.