incubator-gluten
ShuffleExternalSorter occupies too much memory and causes OOM
Backend
VL (Velox)
Bug description
Current config settings:
spark.gluten.memory.offHeap.size.in.bytes=15.0 GiB
spark.gluten.memory.task.offHeap.size.in.bytes=3.8 GiB
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=1920.0 MiB
spark.memory.offHeap.enabled=true
spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
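The three size settings above are numerically consistent with an even split of the 15 GiB off-heap pool across 4 task slots, with the conservative limit at half of the per-task share. The slot count and the halving rule are inferred from the numbers, not stated in the report, so treat this as a sketch under those assumptions (`per_task_budget` is a hypothetical helper):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def per_task_budget(off_heap_bytes, task_slots):
    """Split the executor off-heap pool evenly across task slots.

    Assumption: the conservative limit is half of the per-task share.
    """
    task_bytes = off_heap_bytes // task_slots
    conservative_bytes = task_bytes // 2
    return task_bytes, conservative_bytes

# Assumption: 4 task slots per executor (not stated in the report).
task_bytes, conservative_bytes = per_task_budget(15 * GIB, task_slots=4)
print(f"per-task: {task_bytes / GIB:.2f} GiB")              # 3.75, shown as 3.8 GiB in the log
print(f"conservative: {conservative_bytes / MIB:.1f} MiB")  # 1920.0
```

Under these assumptions a single task can hold at most about 3.75 GiB, which is roughly what the task in the stats below has already consumed.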
Memory consumer stats:
Task.30233: Current used bytes: 3.7 GiB, peak bytes: N/A
+- org.apache.spark.shuffle.sort.ShuffleExternalSorter@73e80a06: Current used bytes: 3.7 GiB, peak bytes: N/A
\- Gluten.Tree.202: Current used bytes: 25.0 MiB, peak bytes: 112.0 MiB
   \- root.202: Current used bytes: 25.0 MiB, peak bytes: 112.0 MiB
      +- ColumnarToRow.201: Current used bytes: 16.0 MiB, peak bytes: 16.0 MiB
      |  \- single: Current used bytes: 16.0 MiB, peak bytes: 16.0 MiB
      |     +- root: Current used bytes: 16.0 MiB, peak bytes: 16.0 MiB
      |     |  \- default_leaf: Current used bytes: 16.0 MiB, peak bytes: 16.0 MiB
      |     \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
      +- WholeStageIterator.201: Current used bytes: 9.0 MiB, peak bytes: 66.0 MiB
      |  \- single: Current used bytes: 9.0 MiB, peak bytes: 64.0 MiB
Why does ShuffleExternalSorter use so much memory?
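To put the imbalance in numbers, the two direct children of the task in the stats above can be parsed and compared. This is a minimal sketch: the regular expression and the `parse_used` helper are hypothetical, written against the line format shown in this report only.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}

# Skip tree-drawing characters, then capture the consumer name and its current usage.
LINE = re.compile(
    r"^[\s|+\\-]*(?P<name>.+?): Current used bytes: (?P<num>[\d.]+) (?P<unit>\w+),"
)

def parse_used(line):
    """Return (consumer name, used bytes), or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    return m.group("name"), float(m.group("num")) * UNITS[m.group("unit")]

# Top-level lines copied from the memory consumer stats in the report.
lines = [
    r"Task.30233: Current used bytes: 3.7 GiB, peak bytes: N/A",
    r"+- org.apache.spark.shuffle.sort.ShuffleExternalSorter@73e80a06: Current used bytes: 3.7 GiB, peak bytes: N/A",
    r"\- Gluten.Tree.202: Current used bytes: 25.0 MiB, peak bytes: 112.0 MiB",
]

used = {}
for line in lines:
    parsed = parse_used(line)
    if parsed:
        name, nbytes = parsed
        used[name] = nbytes

sorter = next(v for k, v in used.items() if "ShuffleExternalSorter" in k)
native = used["Gluten.Tree.202"]
share = sorter / (sorter + native)
print(f"ShuffleExternalSorter share of task memory: {share:.1%}")
```

The logged values are rounded, so the result is approximate, but it shows that nearly all of the task's memory is held by the vanilla Spark sorter rather than by the Gluten native tree.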
Spark version
None
Does the shuffle fall back?
yes @FelixYBW
== Fallback Summary ==
(8) Exchange: Gluten does not touch it or does not support it
== Physical Plan ==
Exchange (8)
+- VeloxColumnarToRow (7)
   +- ^ ProjectExecTransformer (5)
      +- ^ GenerateExecTransformer (4)
         +- ^ ProjectExecTransformer (3)
            +- ^ FilterExecTransformer (2)
               +- ^ IcebergIcebergScanTransformer (1)
I see. A fallen-back Exchange operator does use a lot of on-heap memory. The only thing you can do is increase the on-heap memory config.
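As a rough sketch of that suggestion, the helper below assembles `spark-submit` `--conf` flags. `spark.executor.memory` is Spark's executor heap setting; the `16g` value and the helper name `spark_submit_args` are hypothetical, and the right size depends on the workload.

```python
import shlex

def spark_submit_args(conf):
    """Turn a dict of Spark settings into spark-submit --conf arguments."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

conf = {
    # Hypothetical value: raise the executor heap for the fallen-back shuffle.
    "spark.executor.memory": "16g",
    # Copied from the report's settings.
    "spark.memory.offHeap.enabled": "true",
}
print(shlex.join(spark_submit_args(conf)))
```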