
Grouped aggregations with many distinct groups do not respect memory limit when input is sorted

Open pepijnve opened this issue 4 weeks ago • 2 comments

Describe the bug

In GroupedHashAggregateStream::spill_previous_if_necessary, when the group_ordering is not GroupOrdering::None, spilling is currently not supported.

In GroupedHashAggregateStream::group_aggregate_batch, there is code that ignores out of memory errors under the assumption that spilling will kick in the next time spill_previous_if_necessary is called.

The optimistic check in group_aggregate_batch is out of sync with spill_previous_if_necessary, though, so out-of-memory errors are ignored even when spilling is not possible.
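The mismatch can be illustrated with a toy model. This is only a sketch, not DataFusion code: `AggState`, `Outcome`, `spill_enabled`, `can_spill`, and both `on_alloc_failure_*` methods are hypothetical names, and `GroupOrdering` here is a simplified stand-in for the real enum. The only behaviour taken from the description above is that spilling is unsupported whenever the ordering is not `GroupOrdering::None`.

```rust
/// Simplified stand-in for DataFusion's `GroupOrdering` (the real enum carries state).
#[derive(Clone, Copy, Debug, PartialEq)]
enum GroupOrdering {
    None,
    Partial,
    Full,
}

/// Hypothetical result of handling a failed memory reservation.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The error is swallowed on the assumption that a later spill frees memory.
    DeferredToSpill,
    /// The error is returned to the caller, so the pool limit is respected.
    ResourcesExhausted,
}

/// Hypothetical, stripped-down aggregation state.
struct AggState {
    ordering: GroupOrdering,
    /// Assumption: some general "spilling is allowed" switch.
    spill_enabled: bool,
}

impl AggState {
    /// Models the spill path: spilling only happens for unordered input.
    fn can_spill(&self) -> bool {
        self.spill_enabled && self.ordering == GroupOrdering::None
    }

    /// Models an optimistic check that does not look at the ordering,
    /// and is therefore out of sync with `can_spill`.
    fn on_alloc_failure_out_of_sync(&self) -> Outcome {
        if self.spill_enabled {
            Outcome::DeferredToSpill
        } else {
            Outcome::ResourcesExhausted
        }
    }

    /// Models a check that reuses the spill predicate, so an error is only
    /// ignored when a spill can really follow.
    fn on_alloc_failure_in_sync(&self) -> Outcome {
        if self.can_spill() {
            Outcome::DeferredToSpill
        } else {
            Outcome::ResourcesExhausted
        }
    }
}

fn main() {
    for ordering in [GroupOrdering::None, GroupOrdering::Partial, GroupOrdering::Full] {
        let state = AggState { ordering, spill_enabled: true };
        println!(
            "{:?}: out_of_sync={:?}, in_sync={:?}",
            ordering,
            state.on_alloc_failure_out_of_sync(),
            state.on_alloc_failure_in_sync()
        );
    }
}
```

In this model the two checks agree for unordered input and diverge for sorted input, which is the case where the error gets dropped with no spill to follow it.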

To Reproduce

Test case added in #19287

Expected behavior

When spilling is not possible, the memory pool size is respected.

Additional context

No response

pepijnve avatar Dec 11 '25 15:12 pepijnve

take

carpecodeum avatar Dec 11 '25 15:12 carpecodeum

@carpecodeum sorry, I should have self-assigned. I made a PR for this already. Input on that one would be appreciated of course.

pepijnve avatar Dec 11 '25 16:12 pepijnve