incubator-gluten
[VL] system hang during spill of hashagg
Backend
VL (Velox)
Bug description
Error message:
W20240621 15:14:01.929342 114227 Operator.cpp:641] Can't reclaim from memory pool op.5.0.0.Aggregation which is under non-reclaimable section, memory usage: 231.99MB, reservation: 232.00MB
W20240621 15:14:01.930936 101314 Operator.cpp:641] Can't reclaim from memory pool op.5.0.0.Aggregation which is under non-reclaimable section, memory usage: 128.00MB, reservation: 128.00MB
W20240621 15:14:01.931005 101314 Operator.cpp:641] Can't reclaim from memory pool op.5.0.0.Aggregation which is under non-reclaimable section, memory usage: 128.00MB, reservation: 128.00MB
W20240621 15:14:01.934880 114227 HashAggregation.cpp:408] Can't reclaim from aggregation operator which has spilled and is under output processing, pool op.5.0.0.Aggregation, memory usage: 236.76MB, reservation: 240.00MB
24/06/21 15:14:01 ERROR [Executor task launch worker for task 2259.0 in stage 2.0 (TID 14859)] nmm.ManagedReservationListener: Error reserving memory from target
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at java.util.Optional.<init>(Optional.java:96)
at java.util.Optional.of(Optional.java:108)
at org.apache.gluten.memory.nmm.NativeMemoryManagers$1.spill(NativeMemoryManagers.java:79)
at org.apache.gluten.memory.memtarget.Spillers$WithMinSpillSize.spill(Spillers.java:57)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:90)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:61)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:80)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:61)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:80)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:61)
at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer.spill(TreeMemoryConsumer.java:120)
at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:213)
at org.apache.spark.memory.MemoryConsumer.acquireMemory(MemoryConsumer.java:136)
at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer.borrow(TreeMemoryConsumer.java:70)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets$Node.borrow0(TreeMemoryTargets.java:137)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets$Node.borrow(TreeMemoryTargets.java:129)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets$Node.borrow0(TreeMemoryTargets.java:137)
at org.apache.gluten.memory.memtarget.TreeMemoryTargets$Node.borrow(TreeMemoryTargets.java:129)
at org.apache.gluten.memory.memtarget.OverAcquire.borrow(OverAcquire.java:56)
at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:35)
at org.apache.gluten.memory.nmm.ManagedReservationListener.reserve(ManagedReservationListener.java:43)
at org.apache.gluten.memory.nmm.NativeMemoryManager.create(Native Method)
at org.apache.gluten.memory.nmm.NativeMemoryManager.create(NativeMemoryManager.java:49)
at org.apache.gluten.memory.nmm.NativeMemoryManagers.createNativeMemoryManager(NativeMemoryManagers.java:155)
at org.apache.gluten.memory.nmm.NativeMemoryManagers.create(NativeMemoryManagers.java:56)
at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:159)
at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:242)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1471)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
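The top frames of the trace show `Optional.of` calling `Objects.requireNonNull`, which means the value handed to `Optional.of` inside `NativeMemoryManagers$1.spill` was null. The trace also shows that the spill was requested while `NativeMemoryManager.create` was still on the stack, so one possible reading is that the spiller ran before the object it wraps had been assigned. The sketch below only reproduces that JDK behavior in isolation. The class `SpillOptionalSketch` and the methods `lookupSpiller`, `strictWrapThrows`, and `spillOrZero` are hypothetical names for illustration and are not taken from the Gluten code base.

```java
import java.util.Optional;

public class SpillOptionalSketch {

    // Hypothetical stand-in for a reference that may still be null,
    // e.g. because the spill callback fires before initialization finishes.
    static String lookupSpiller(boolean initialized) {
        return initialized ? "spiller" : null;
    }

    // Mirrors the failing pattern in the trace: Optional.of rejects null
    // by throwing NullPointerException from Objects.requireNonNull.
    static boolean strictWrapThrows(boolean initialized) {
        try {
            Optional.of(lookupSpiller(initialized));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Null-tolerant alternative (an assumption, not the project's fix):
    // Optional.ofNullable yields an empty Optional for null, so the caller
    // can report that 0 bytes were spilled instead of failing.
    static long spillOrZero(boolean initialized, long requested) {
        return Optional.ofNullable(lookupSpiller(initialized))
                .map(s -> requested)
                .orElse(0L);
    }

    public static void main(String[] args) {
        System.out.println(strictWrapThrows(false));   // null value: Optional.of throws
        System.out.println(spillOrZero(false, 1024L)); // null value: nothing spilled
        System.out.println(spillOrZero(true, 1024L));  // value present: request honored
    }
}
```

Whether returning zero is the right behavior for a spiller that is not ready yet is a design question for the maintainers. The sketch only illustrates why the `NullPointerException` surfaces at `Optional.of` rather than deeper in the spill path.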
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response