incubator-gluten

ShuffleExternalSorter occupies too much memory and causes OOM

Open j7nhai opened this issue 1 year ago • 1 comments

Backend

VL (Velox)

Bug description

Current config settings:
	spark.gluten.memory.offHeap.size.in.bytes=15.0 GiB
	spark.gluten.memory.task.offHeap.size.in.bytes=3.8 GiB
	spark.gluten.memory.conservative.task.offHeap.size.in.bytes=1920.0 MiB
	spark.memory.offHeap.enabled=true
	spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats:
	Task.30233:                                                      Current used bytes:   3.7 GiB, peak bytes:        N/A
	+- org.apache.spark.shuffle.sort.ShuffleExternalSorter@73e80a06: Current used bytes:   3.7 GiB, peak bytes:        N/A
	\- Gluten.Tree.202:                                              Current used bytes:  25.0 MiB, peak bytes:  112.0 MiB
	   \- root.202:                                                  Current used bytes:  25.0 MiB, peak bytes:  112.0 MiB
	      +- ColumnarToRow.201:                                      Current used bytes:  16.0 MiB, peak bytes:   16.0 MiB
	      |  \- single:                                              Current used bytes:  16.0 MiB, peak bytes:   16.0 MiB
	      |     +- root:                                             Current used bytes:  16.0 MiB, peak bytes:   16.0 MiB
	      |     |  \- default_leaf:                                  Current used bytes:  16.0 MiB, peak bytes:   16.0 MiB
	      |     \- gluten::MemoryAllocator:                          Current used bytes:     0.0 B, peak bytes:      0.0 B
	      +- WholeStageIterator.201:                                 Current used bytes:   9.0 MiB, peak bytes:   66.0 MiB
	      |  \- single:                                              Current used bytes:   9.0 MiB, peak bytes:   64.0 MiB

Why does ShuffleExternalSorter use so much memory?
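To put a number on the dump above, here is a minimal Python sketch that parses a few of the stats lines and compares the sorter's usage with the per-task budget (`spark.gluten.memory.task.offHeap.size.in.bytes`, reported as 3.8 GiB). The helper names (`parse_stats`, `to_bytes`) are hypothetical, the excerpt has its padding collapsed, and the log values are already rounded, so the result is only approximate.

```python
import re

# Excerpt of the "Memory consumer stats" dump from this report (padding collapsed).
STATS = r"""
Task.30233: Current used bytes: 3.7 GiB, peak bytes: N/A
+- org.apache.spark.shuffle.sort.ShuffleExternalSorter@73e80a06: Current used bytes: 3.7 GiB, peak bytes: N/A
\- Gluten.Tree.202: Current used bytes: 25.0 MiB, peak bytes: 112.0 MiB
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# Skip the tree-drawing prefix, then capture consumer name, value and unit.
LINE_RE = re.compile(
    r"^[\s|+\\-]*(?P<name>.+?):\s+Current used bytes:\s+"
    r"(?P<val>[\d.]+)\s+(?P<unit>[KMG]iB|B)"
)


def to_bytes(value, unit):
    """Convert a value such as ('3.7', 'GiB') to a byte count."""
    return float(value) * UNITS[unit]


def parse_stats(dump):
    """Return (consumer name, current used bytes) for every stats line."""
    consumers = []
    for line in dump.splitlines():
        match = LINE_RE.match(line)
        if match:
            consumers.append(
                (match["name"], to_bytes(match["val"], match["unit"]))
            )
    return consumers


consumers = parse_stats(STATS)
task_budget = to_bytes("3.8", "GiB")  # per-task off-heap size from the report
sorter_used = next(b for name, b in consumers if "ShuffleExternalSorter" in name)
share = sorter_used / task_budget
print(f"ShuffleExternalSorter holds about {share:.0%} of the task budget")
```

With the values in this report the sorter accounts for roughly 97% of the task's budget, while the whole Gluten tree holds only 25 MiB.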

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

j7nhai avatar Oct 23 '24 09:10 j7nhai

Does the shuffle fall back?

FelixYBW avatar Oct 25 '24 07:10 FelixYBW

> Does the shuffle fall back?

Yes, @FelixYBW:

== Fallback Summary ==
(8) Exchange: Gluten does not touch it or does not support it

== Physical Plan ==
Exchange (8)
+- VeloxColumnarToRow (7)
   +- ^ ProjectExecTransformer (5)
      +- ^ GenerateExecTransformer (4)
         +- ^ ProjectExecTransformer (3)
            +- ^ FilterExecTransformer (2)
               +- ^ IcebergIcebergScanTransformer (1)
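For anyone who wants to check this programmatically, a minimal Python sketch that pulls the entries out of the `== Fallback Summary ==` section of such a plan dump. The line format is inferred only from the paste above, so treat it as an assumption; `fallback_entries` is a hypothetical helper, not a Gluten API.

```python
import re

# Shortened copy of the plan output pasted above.
PLAN_TEXT = """
== Fallback Summary ==
(8) Exchange: Gluten does not touch it or does not support it

== Physical Plan ==
Exchange (8)
+- VeloxColumnarToRow (7)
   +- ^ ProjectExecTransformer (5)
"""

# Matches lines such as "(8) Exchange: <reason>".
ENTRY_RE = re.compile(r"^\((?P<id>\d+)\)\s+(?P<op>[^:]+):\s*(?P<reason>.+)$")


def fallback_entries(text):
    """Return (node id, operator, reason) for each fallback summary line."""
    entries = []
    in_summary = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("=="):
            # Section header: track whether we are inside the summary.
            in_summary = line == "== Fallback Summary =="
            continue
        if in_summary:
            match = ENTRY_RE.match(line)
            if match:
                entries.append(
                    (int(match["id"]), match["op"], match["reason"])
                )
    return entries


entries = fallback_entries(PLAN_TEXT)
for node_id, operator, reason in entries:
    print(f"node {node_id}: {operator} fell back ({reason})")
```

Here the only entry is node 8, the Exchange, which matches the plan: everything below it runs in Velox and is converted to rows by `VeloxColumnarToRow` before the shuffle.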

j7nhai avatar Oct 28 '24 04:10 j7nhai

I see. A fallen-back Exchange operator does use a lot of on-heap memory. The only thing you can do is increase the on-heap memory config.
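As a rough sketch of what that could look like, the Python snippet below renders a settings dict as `--conf` arguments for `spark-submit`. `spark.executor.memory` is the standard Spark key for executor heap size; the value `16g` is a made-up placeholder, not a recommendation, and `conf_flags` is a hypothetical helper.

```python
def conf_flags(confs):
    """Render a dict of Spark settings as a flat list of --conf arguments."""
    return [
        arg
        for key, value in confs.items()
        for arg in ("--conf", f"{key}={value}")
    ]


# Hypothetical value: size it to your own executors and workload.
suggested = {
    "spark.executor.memory": "16g",  # executor JVM heap (on-heap)
}

flags = conf_flags(suggested)
print("spark-submit " + " ".join(flags) + " ...")
```

The dump in the report tracks the sorter against the task's off-heap budget, so `spark.memory.offHeap.size` may be worth reviewing as well; that is an assumption, not something confirmed in this thread.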

FelixYBW avatar Oct 28 '24 21:10 FelixYBW