Reported OOM with high cardinality distinct aggregates
Describe the bug
We have a user report that they are unable to get Comet to run certain aggregate queries that work fine in Spark.
This issue is to track the effort in creating a repro case so that we can understand the root cause.
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response
The OOM is happening in native code in the Comet shuffle write processor
Hmm, is there any stack trace or other hint?
Btw, the shuffle write processor pulls batches from the current stage of execution, so the OOM doesn't have to be in shuffle code (during shuffling). E.g., if the write processor pulls batches from an aggregation, the OOM can happen during the aggregation and still show up under the shuffle write processor.
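To illustrate the point about pull-based execution, here is a toy sketch in plain Python. It is not Comet code: `distinct_aggregate`, `shuffle_write`, `run`, and the `max_groups` budget are hypothetical names made up for this example. It only shows that when a downstream consumer drives an upstream operator, a failure in the upstream operator surfaces while the consumer is running.

```python
def distinct_aggregate(rows, max_groups):
    """Hypothetical upstream operator: tracks distinct keys under a toy budget."""
    seen = set()
    for key in rows:
        seen.add(key)
        # Stand-in for a memory limit: fail once the state grows too large.
        if len(seen) > max_groups:
            raise MemoryError("aggregate state exceeded budget")
    yield len(seen)


def shuffle_write(upstream):
    """Hypothetical shuffle writer: does no heavy work, only pulls from upstream."""
    return [batch for batch in upstream]


def run(rows, max_groups):
    # The aggregate is lazy, so its body only executes once shuffle_write pulls.
    try:
        return shuffle_write(distinct_aggregate(rows, max_groups))
    except MemoryError as exc:
        return f"failed inside shuffle_write: {exc}"
```

With low cardinality the pipeline completes. With high cardinality the error is raised by the aggregate, but only while `shuffle_write` is pulling from it, which is why a stack trace alone can point at the writer.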
They also said they turned shuffle off and got an OOM in Java code. I don't think we have enough info.
Closing this issue since it is vague and cannot be reproduced. Native shuffle has also been re-implemented to be more memory efficient since this issue was filed.