datafusion-comet

Reported OOM with high-cardinality distinct aggregates

Open andygrove opened this issue 1 year ago • 4 comments

Describe the bug

We have a user report that they are unable to get Comet to run certain aggregate queries that work fine in Spark.

This issue tracks the effort to create a repro case so that we can understand the root cause.

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

andygrove, Jul 17 '24 15:07

The OOM is happening in native code in the Comet shuffle write processor.

andygrove, Jul 18 '24 19:07

Hmm, is there any stack trace or other hint?

viirya, Jul 18 '24 19:07

Btw, the shuffle write processor pulls batches from the current stage of execution, so the OOM doesn't have to be in the shuffle code itself (during shuffling). For example, if the write processor pulls batches from an aggregation, the OOM can happen during that aggregation.

viirya, Jul 18 '24 19:07

They also said they turned shuffle off and got an OOM in Java code. I don't think we have enough info.

viirya, Jul 18 '24 19:07

Closing this issue since it is vague and cannot be reproduced. Native shuffle has been re-implemented to be more memory efficient since this issue was filed.

andygrove, Jun 16 '25 19:06