incubator-gluten

[VL] One task writes too many hive partitions causing OOM

Open • wForget opened this issue 1 year ago • 1 comment

Backend

VL (Velox)

Bug description

It seems that when a single task writes too many Hive partitions, it fails with "Not enough spark off-heap execution memory".

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

24/10/23 10:05:16 ERROR Utils: Aborting task
org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::addInput failed for [operator: TableWrite, plan node ID: 2]: Error during calling Java code from native code: org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 7.0 MiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled). 
Current config settings: 
	spark.gluten.memory.offHeap.size.in.bytes=2.0 GiB
	spark.gluten.memory.task.offHeap.size.in.bytes=2.0 GiB
	spark.gluten.memory.conservative.task.offHeap.size.in.bytes=1024.0 MiB
	spark.memory.offHeap.enabled=true
	spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats: 
	Task.6165:                                                                  Current used bytes: 2041.0 MiB, peak bytes:        N/A
	\- Gluten.Tree.7:                                                           Current used bytes: 2041.0 MiB, peak bytes:    2.0 GiB
	   \- root.7:                                                               Current used bytes: 2041.0 MiB, peak bytes:    2.0 GiB
	      +- WholeStageIterator.7:                                              Current used bytes: 2016.0 MiB, peak bytes: 2023.0 MiB
	      |  \- single:                                                         Current used bytes: 2016.0 MiB, peak bytes: 2016.0 MiB
	      |     +- root:                                                        Current used bytes: 1867.6 MiB, peak bytes: 2016.0 MiB
	      |     |  +- task.Gluten_Stage_135_TID_6165_VTID_7:                    Current used bytes: 1867.6 MiB, peak bytes: 2016.0 MiB
	      |     |  |  +- node.2:                                                Current used bytes: 1867.6 MiB, peak bytes: 2016.0 MiB
	      |     |  |  |  +- op.2.0.0.TableWrite.test-hive:                      Current used bytes: 1862.5 MiB, peak bytes: 2010.0 MiB
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[34]:          Current used bytes:   27.1 MiB, peak bytes:   28.0 MiB
	      |     |  |  |  |  |  +- writer_node_16495590765074372487:             Current used bytes:   27.1 MiB, peak bytes:   28.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   27.1 MiB, peak bytes:   27.2 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[34].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[55]:          Current used bytes:   24.2 MiB, peak bytes:   28.0 MiB
	      |     |  |  |  |  |  +- writer_node_3034760001384861262:              Current used bytes:   24.2 MiB, peak bytes:   28.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   24.2 MiB, peak bytes:   24.3 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[55].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[126]:         Current used bytes:   22.9 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  +- writer_node_10989164195300702660:             Current used bytes:   22.9 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   22.9 MiB, peak bytes:   22.9 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[126].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[190]:         Current used bytes:   22.9 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  +- writer_node_2462400773501287974:              Current used bytes:   22.9 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   22.9 MiB, peak bytes:   22.9 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[190].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[15]:          Current used bytes:   22.6 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  +- writer_node_10962424750642795316:             Current used bytes:   22.6 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   22.6 MiB, peak bytes:   22.7 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[15].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[110]:         Current used bytes:   22.5 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  +- writer_node_17871452156508439970:             Current used bytes:   22.5 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   22.5 MiB, peak bytes:   22.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[110].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[63]:          Current used bytes:   21.9 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  +- writer_node_11605731064327424053:             Current used bytes:   21.9 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   21.9 MiB, peak bytes:   22.0 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[63].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[174]:         Current used bytes:   20.2 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  +- writer_node_1465030665100865752:              Current used bytes:   20.2 MiB, peak bytes:   24.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   20.2 MiB, peak bytes:   20.2 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[174].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[172]:         Current used bytes:   18.9 MiB, peak bytes:   20.0 MiB
	      |     |  |  |  |  |  +- writer_node_7515311972563278385:              Current used bytes:   18.9 MiB, peak bytes:   20.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   18.9 MiB, peak bytes:   18.9 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[172].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[113]:         Current used bytes:   18.5 MiB, peak bytes:   20.0 MiB
	      |     |  |  |  |  |  +- writer_node_13120283611592924997:             Current used bytes:   18.5 MiB, peak bytes:   20.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   18.5 MiB, peak bytes:   18.6 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[113].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[42]:          Current used bytes:   18.5 MiB, peak bytes:   20.0 MiB
	      |     |  |  |  |  |  +- writer_node_9372597259039499552:              Current used bytes:   18.5 MiB, peak bytes:   20.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   18.5 MiB, peak bytes:   18.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[42].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[131]:         Current used bytes:   18.2 MiB, peak bytes:   20.0 MiB
	      |     |  |  |  |  |  +- writer_node_4488967983352450288:              Current used bytes:   18.2 MiB, peak bytes:   20.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   18.2 MiB, peak bytes:   18.2 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[131].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[162]:         Current used bytes:   15.4 MiB, peak bytes:   16.0 MiB
	      |     |  |  |  |  |  +- writer_node_17438843217648573119:             Current used bytes:   15.4 MiB, peak bytes:   16.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   15.4 MiB, peak bytes:   15.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[162].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[65]:          Current used bytes:   15.1 MiB, peak bytes:   16.0 MiB
	      |     |  |  |  |  |  +- writer_node_15452059679315520385:             Current used bytes:   15.1 MiB, peak bytes:   16.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   15.1 MiB, peak bytes:   15.1 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[65].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[210]:         Current used bytes:   15.0 MiB, peak bytes:   16.0 MiB
	      |     |  |  |  |  |  +- writer_node_9266752813434529133:              Current used bytes:   15.0 MiB, peak bytes:   16.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   15.0 MiB, peak bytes:   15.1 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[210].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[67]:          Current used bytes:   14.4 MiB, peak bytes:   15.0 MiB
	      |     |  |  |  |  |  +- writer_node_5157661972043366947:              Current used bytes:   14.4 MiB, peak bytes:   15.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   14.4 MiB, peak bytes:   14.4 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[67].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[72]:          Current used bytes:   14.0 MiB, peak bytes:   15.0 MiB
	      |     |  |  |  |  |  +- writer_node_14514910955711142343:             Current used bytes:   14.0 MiB, peak bytes:   15.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   14.0 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[72].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[195]:         Current used bytes:   13.4 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  +- writer_node_1485441638280894625:              Current used bytes:   13.4 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   13.4 MiB, peak bytes:   13.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[195].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[123]:         Current used bytes:   13.4 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  +- writer_node_6634720992138079112:              Current used bytes:   13.4 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   13.4 MiB, peak bytes:   13.4 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[123].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[185]:         Current used bytes:   13.3 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  +- writer_node_13535610051813534755:             Current used bytes:   13.3 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   13.3 MiB, peak bytes:   13.4 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[185].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[200]:         Current used bytes:   13.3 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  +- writer_node_3875021975498280443:              Current used bytes:   13.3 MiB, peak bytes:   14.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   13.3 MiB, peak bytes:   13.3 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[200].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[59]:          Current used bytes:   12.9 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_16272544847912354319:             Current used bytes:   12.9 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.9 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[59].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[166]:         Current used bytes:   12.6 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_15485796053569832754:             Current used bytes:   12.6 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.6 MiB, peak bytes:   12.7 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[166].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[157]:         Current used bytes:   12.6 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_8295349848501177671:              Current used bytes:   12.6 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.6 MiB, peak bytes:   12.7 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[157].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[64]:          Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_3964347345554411107:              Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.6 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[64].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[88]:          Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_17501627176929250602:             Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[88].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[165]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_5103532593787709678:              Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[165].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[111]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_913615336661173562:               Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[111].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[116]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_16007420479186856992:             Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[116].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[209]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_13664721293374663391:             Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[209].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[178]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_7213323637378777817:              Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[178].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[121]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_5750783937089881792:              Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[121].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[182]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_12913595327245071541:             Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[182].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[120]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_8901286779273775421:              Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[120].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  |  +- op.2.0.0.TableWrite.test-hive.part[155]:         Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  +- writer_node_9259036891116703451:              Current used bytes:   12.5 MiB, peak bytes:   13.0 MiB
	      |     |  |  |  |  |  |  \- .general:                                  Current used bytes:   12.5 MiB, peak bytes:   12.5 MiB
	      |     |  |  |  |  |  \- op.2.0.0.TableWrite.test-hive.part[155].sink: Current used bytes:      0.0 B, peak bytes:      0.0 B
......
	      |     |  |  |  \- op.2.0.0.TableWrite:                                Current used bytes:    5.1 MiB, peak bytes:    5.1 MiB
	      |     |  |  +- node.0:                                                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  \- op.0.0.0.ValueStream:                               Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  \- node.1:                                                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.1.0.0.FilterProject:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- ShuffleReader.7:                                                   Current used bytes:   17.0 MiB, peak bytes:   24.0 MiB
	      |  \- single:                                                         Current used bytes:   17.0 MiB, peak bytes:   24.0 MiB
	      |     +- gluten::MemoryAllocator:                                     Current used bytes:   10.0 MiB, peak bytes:   10.4 MiB
	      |     \- root:                                                        Current used bytes:  384.0 KiB, peak bytes: 1024.0 KiB
	      |        \- default_leaf:                                             Current used bytes:  384.0 KiB, peak bytes:  384.0 KiB
	      +- ArrowContextInstance.14:                                           Current used bytes:    8.0 MiB, peak bytes:    8.0 MiB
	      +- IndicatorVectorBase#init.7:                                        Current used bytes:      0.0 B, peak bytes:    8.0 MiB
	      |  \- single:                                                         Current used bytes:      0.0 B, peak bytes:    8.0 MiB
	      |     +- root:                                                        Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- OverAcquire.DummyTarget.37:                                        Current used bytes:      0.0 B, peak bytes:    2.4 MiB
	      +- OverAcquire.DummyTarget.36:                                        Current used bytes:      0.0 B, peak bytes:    7.2 MiB
	      +- OverAcquire.DummyTarget.35:                                        Current used bytes:      0.0 B, peak bytes:  460.8 MiB
	      \- ArrowContextInstance.15:                                           Current used bytes:      0.0 B, peak bytes:      0.0 B

	at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:105)
	at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:43)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:61)
	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.utils.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
	at org.apache.gluten.utils.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
	at org.apache.gluten.utils.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
	at org.apache.gluten.utils.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
	at org.apache.spark.sql.execution.VeloxColumnarWriteFilesRDD.$anonfun$compute$2(VeloxColumnarWriteFilesExec.scala:208)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1428)
	at org.apache.spark.sql.execution.VeloxColumnarWriteFilesRDD.compute(VeloxColumnarWriteFilesExec.scala:203)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Retriable: False
Function: runInternal
File: /data/workspace/gluten-deploy-dist/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp
Line: 677
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE.cold
# 4  _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 5  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 6  _ZN6gluten24WholeStageResultIterator4nextEv
# 7  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 8  0x00007f9fcb812ce8

	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.utils.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
	at org.apache.gluten.utils.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
	at org.apache.gluten.utils.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
	at org.apache.gluten.utils.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
	at org.apache.spark.sql.execution.VeloxColumnarWriteFilesRDD.$anonfun$compute$2(VeloxColumnarWriteFilesExec.scala:208)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1428)
	at org.apache.spark.sql.execution.VeloxColumnarWriteFilesRDD.compute(VeloxColumnarWriteFilesExec.scala:203)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

wForget • Oct 23 '24 07:10

So each writer node takes more than 10 MiB of memory, and with this many partition writers open at once that caused the OOM. Should we have one write node and keep it open for each partition? @JkSelf

FelixYBW • Oct 25 '24 01:10
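
The per-writer estimate in the comment above can be checked against the memory consumer stats in the report. The following is a minimal sketch, not part of Gluten: the helper names (partition_usage, max_open_writers) are hypothetical, the sample lines are copied from the log, and the 12.5 MiB per-writer figure is simply the smallest value visible in the truncated tree.

```python
import re

# A few lines copied from the "Memory consumer stats" tree in the report.
SAMPLE = """\
+- op.2.0.0.TableWrite.test-hive.part[34]:          Current used bytes:   27.1 MiB, peak bytes:   28.0 MiB
\\- op.2.0.0.TableWrite.test-hive.part[34].sink:  Current used bytes:      0.0 B, peak bytes:      0.0 B
+- op.2.0.0.TableWrite.test-hive.part[55]:          Current used bytes:   24.2 MiB, peak bytes:   28.0 MiB
+- op.2.0.0.TableWrite.test-hive.part[126]:         Current used bytes:   22.9 MiB, peak bytes:   24.0 MiB
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# Matches only the per-partition pool lines ("...part[N]:"), not the ".sink" children.
LINE = re.compile(r"\.part\[(\d+)\]:\s+Current used bytes:\s+([\d.]+) (B|KiB|MiB|GiB)")


def partition_usage(text):
    """Return {partition index: current used bytes} for each per-partition writer pool."""
    return {
        int(m.group(1)): float(m.group(2)) * UNITS[m.group(3)]
        for m in LINE.finditer(text)
    }


def max_open_writers(budget_mib, per_writer_mib):
    """How many writers of a given size fit into a task's off-heap budget."""
    return int(budget_mib // per_writer_mib)


usage = partition_usage(SAMPLE)
total_mib = sum(usage.values()) / 1024**2
print(f"{len(usage)} partition writers, {total_mib:.1f} MiB in use")

# Assumption: about 12.5 MiB per open writer, against the 2.0 GiB task budget in the log.
print(max_open_writers(2048, 12.5))
```

Feeding the full log into partition_usage instead of SAMPLE gives the number of writers open when the task failed. Under the 12.5 MiB assumption, a 2.0 GiB task budget holds roughly 163 open writers, fewer than the partition indices seen in the log (up to 210).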

The query uses the Velox writer to write Parquet data, and we are hitting this error.

Spark version: 3.3

Executor configs (1 executor per node):
memoryOverhead: [amount: 1024]
cores: [amount: 16]
memory: [amount: 13312]
offHeap: [amount: 80896]

This is what the plan looks like for the stage. Many scan and filter/project operators fall back to Spark because from_json is not supported yet. There are 23000 tasks spawned for this stage. It is a non-partitioned write. Any suggestions on how the configuration can be tweaked to make this job work?

image
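
For reference, the per-task limits printed in the executor log that follows (75.0 GiB total, 4.7 GiB per task, 2.3 GiB conservative) are consistent with splitting the off-heap pool evenly across the 16 cores, with the conservative value being half of that. The sketch below only reproduces that arithmetic. The formula and the 0.5 storage fraction are assumptions inferred from the logged numbers, and per_task_budget is a hypothetical helper, not a Gluten API.

```python
GIB = 1024**3


def per_task_budget(total_offheap_bytes, task_slots, storage_fraction=0.5):
    """Return (per-task bytes, conservative per-task bytes).

    Assumption: the pool is divided evenly across task slots, and the
    conservative figure excludes the share set aside by storage_fraction.
    """
    per_task = total_offheap_bytes // task_slots
    conservative = int(total_offheap_bytes * (1.0 - storage_fraction)) // task_slots
    return per_task, conservative


# Values from the log: 75.0 GiB off-heap pool, 16 cores on the executor.
per_task, conservative = per_task_budget(75 * GIB, 16)
print(f"per task: {per_task / GIB:.1f} GiB, conservative: {conservative / GIB:.1f} GiB")
```

Under these assumptions, each task is capped at about 4.7 GiB regardless of how much of the 75 GiB pool is idle, which matches the "Task.28642 ... 4.7 GiB" line at the point of failure.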

Executor logs:

I20241216 17:07:15.881199 292073 WholeStageResultIterator.cc:234] Spill[root/root]:  successfully reclaimed total 0B with shrunken 0B and spilled 0B.
I20241216 17:07:15.888181 292073 WholeStageResultIterator.cc:230] Spill[root/root]:  trying to request spill for 8.00MB.
I20241216 17:07:15.888319 292073 WholeStageResultIterator.cc:234] Spill[root/root]:  successfully reclaimed total 0B with shrunken 0B and spilled 0B.
I20241216 17:07:15.888373 292073 WholeStageResultIterator.cc:230] Spill[root/root]:  trying to request spill for 8.00MB.
I20241216 17:07:15.888398 292073 WholeStageResultIterator.cc:234] Spill[root/root]:  successfully reclaimed total 0B with shrunken 0B and spilled 0B.
I20241216 17:07:15.888561 292073 WholeStageResultIterator.cc:230] Spill[root/root]:  trying to request spill for 509.40MB.
I20241216 17:07:15.888585 292073 WholeStageResultIterator.cc:234] Spill[root/root]:  successfully reclaimed total 0B with shrunken 0B and spilled 0B.
I20241216 17:07:15.888605 292073 WholeStageResultIterator.cc:230] Spill[root/root]:  trying to request spill for 509.40MB.
I20241216 17:07:15.888621 292073 WholeStageResultIterator.cc:234] Spill[root/root]:  successfully reclaimed total 0B with shrunken 0B and spilled 0B.
24/12/16 17:07:15 ERROR ManagedReservationListener: Error reserving memory from target
org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 2.0 MiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled). 
Current config settings: 
	spark.gluten.memory.offHeap.size.in.bytes=75.0 GiB
	spark.gluten.memory.task.offHeap.size.in.bytes=4.7 GiB
	spark.gluten.memory.conservative.task.offHeap.size.in.bytes=2.3 GiB
	spark.memory.offHeap.enabled=true
	spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats: 
	Task.28642:                                                 Current used bytes:    4.7 GiB, peak bytes:        N/A
	\- Gluten.Tree.1251:                                        Current used bytes:    4.7 GiB, peak bytes:    4.7 GiB
	   \- root.1251:                                            Current used bytes:    4.7 GiB, peak bytes:    4.7 GiB
	      +- ArrowContextInstance.272:                          Current used bytes:    2.9 GiB, peak bytes:    4.3 GiB
	      +- RowToColumnar.272:                                 Current used bytes: 1696.0 MiB, peak bytes: 1698.0 MiB
	      |  \- single:                                         Current used bytes: 1696.0 MiB, peak bytes: 1696.0 MiB
	      |     +- root:                                        Current used bytes: 1695.8 MiB, peak bytes: 1696.0 MiB
	      |     |  \- default_leaf:                             Current used bytes: 1695.8 MiB, peak bytes: 1695.8 MiB
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- VeloxWriter.272:                                   Current used bytes:  112.0 MiB, peak bytes:  184.0 MiB
	      |  \- single:                                         Current used bytes:  112.0 MiB, peak bytes:  184.0 MiB
	      |     +- root:                                        Current used bytes:  111.5 MiB, peak bytes:  184.0 MiB
	      |     |  +- datasource.272:                           Current used bytes:  111.5 MiB, peak bytes:  184.0 MiB
	      |     |  |  \- .general:                              Current used bytes:  111.5 MiB, peak bytes:  176.7 MiB
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- ColumnarToRow.393:                                 Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |  \- single:                                         Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     +- root:                                        Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     |  \- default_leaf:                             Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-1332.0:                        Current used bytes:    4.0 MiB, peak bytes:   16.0 MiB
	      |  \- single:                                         Current used bytes:    4.0 MiB, peak bytes:   16.0 MiB
	      |     +- root:                                        Current used bytes:  900.9 KiB, peak bytes:   15.0 MiB
	      |     |  +- task.Gluten_Stage_26_TID_28642_VTID_1332: Current used bytes:  899.4 KiB, peak bytes:   14.0 MiB
	      |     |  |  +- node.1:                                Current used bytes:  482.5 KiB, peak bytes:    2.0 MiB
	      |     |  |  |  \- op.1.0.0.FilterProject:             Current used bytes:  482.5 KiB, peak bytes: 1443.3 KiB
	      |     |  |  +- node.3:                                Current used bytes:  294.4 KiB, peak bytes:   11.0 MiB
	      |     |  |  |  \- op.3.0.0.FilterProject:             Current used bytes:  294.4 KiB, peak bytes:   10.9 MiB
	      |     |  |  +- node.2:                                Current used bytes:  122.5 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.2.0.0.FilterProject:             Current used bytes:  122.5 KiB, peak bytes:  380.0 KiB
	      |     |  |  \- node.0:                                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.0.0.0.ValueStream:               Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:   1536.0 B, peak bytes:   1664.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-1331.0:                        Current used bytes:    2.0 MiB, peak bytes:    8.0 MiB
	      |  \- single:                                         Current used bytes:    2.0 MiB, peak bytes:    8.0 MiB
	      |     +- root:                                        Current used bytes:  120.0 KiB, peak bytes:    2.0 MiB
	      |     |  +- task.Gluten_Stage_26_TID_28642_VTID_1331: Current used bytes:  120.0 KiB, peak bytes:    2.0 MiB
	      |     |  |  +- node.1:                                Current used bytes:   96.0 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.1.0.0.Unnest:                    Current used bytes:   96.0 KiB, peak bytes:   96.0 KiB
	      |     |  |  +- node.2:                                Current used bytes:   24.0 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.2.0.0.FilterProject:             Current used bytes:   24.0 KiB, peak bytes:   24.0 KiB
	      |     |  |  \- node.0:                                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.0.0.0.ValueStream:               Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- IteratorMetrics.1155.OverAcquire.0:                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- VeloxWriter.272.OverAcquire.0:                     Current used bytes:      0.0 B, peak bytes:   55.2 MiB
	      +- RowToColumnar.272.OverAcquire.0:                   Current used bytes:      0.0 B, peak bytes:  391.2 MiB
	      +- NativePlanEvaluator-1331.0.OverAcquire.0:          Current used bytes:      0.0 B, peak bytes:    2.4 MiB
	      +- IndicatorVectorBase#init.1251.OverAcquire.0:       Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- ColumnarToRow.393.OverAcquire.0:                   Current used bytes:      0.0 B, peak bytes:   19.2 MiB
	      +- IteratorMetrics.1155:                              Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |  \- single:                                         Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     +- root:                                        Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-1332.0.OverAcquire.0:          Current used bytes:      0.0 B, peak bytes:    4.8 MiB
	      \- IndicatorVectorBase#init.1251:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	         \- single:                                         Current used bytes:      0.0 B, peak bytes:      0.0 B
	            +- root:                                        Current used bytes:      0.0 B, peak bytes:      0.0 B
	            |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	            \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B

	at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:105) 
	at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:49) 
	at org.apache.gluten.vectorized.NativeRowToColumnarJniWrapper.nativeConvertRowToColumnar(Native Method) 
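
A dump like the "Memory consumer stats" tree above is easier to compare across tasks once it is reduced to a ranked list of the direct children of `root.N`. The following is a minimal sketch of one way to do that. The helpers `parse_consumers` and `top_consumers` are hypothetical names written for this note, not part of Gluten or Velox, and the code assumes the exact layout shown in these logs: one leading tab, three characters of tree drawing per level, and sizes printed as `B`, `KiB`, `MiB` or `GiB`.

```python
import re

# Multipliers for the binary units printed in the dump.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# One tree line: optional "|" guides and a "+- " or "\- " marker, the
# consumer name, then the "Current used bytes" value and its unit.
LINE_RE = re.compile(
    r"^(?P<prefix>[ |]*(?:[+\\]-\s)?)"
    r"(?P<name>\S.*?):\s+Current used bytes:\s+"
    r"(?P<used>[\d.]+) (?P<unit>[KMG]iB|B)"
)


def parse_consumers(dump):
    """Return (depth, name, used_bytes) for every consumer line in the dump."""
    rows = []
    for raw in dump.splitlines():
        # Assumption: each tree line starts with a single tab, as in the logs above.
        line = raw.lstrip("\t")
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a consumer line (config, stack frame, blank, ...)
        # Assumption: each nesting level adds three characters of tree drawing.
        depth = len(m.group("prefix")) // 3
        used = round(float(m.group("used")) * UNITS[m.group("unit")])
        rows.append((depth, m.group("name"), used))
    return rows


def top_consumers(dump, depth=3, limit=5):
    """Rank consumers at one depth (3 = direct children of root.N) by used bytes."""
    rows = [(name, used) for d, name, used in parse_consumers(dump) if d == depth]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:limit]
```

Applied to the dump for Task.28642, this would rank `ArrowContextInstance.272` (2.9 GiB) and `RowToColumnar.272` (1696.0 MiB) first, which together account for most of the 4.7 GiB the task holds.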

Task logs:
org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:654)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:448)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$22(FileFormatWriter.scala:346)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1505)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 3.9 GiB, granted: 2.7 GiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled). 
Current config settings: 
	spark.gluten.memory.offHeap.size.in.bytes=79.0 GiB
	spark.gluten.memory.task.offHeap.size.in.bytes=4.9 GiB
	spark.gluten.memory.conservative.task.offHeap.size.in.bytes=2.5 GiB
	spark.memory.offHeap.enabled=true
	spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats: 
	Task.28426:                                                 Current used bytes:    2.3 GiB, peak bytes:        N/A
	\- Gluten.Tree.1304:                                        Current used bytes:    2.3 GiB, peak bytes:    4.9 GiB
	   \- root.1304:                                            Current used bytes:    2.3 GiB, peak bytes:    4.9 GiB
	      +- ArrowContextInstance.276:                          Current used bytes: 2000.0 MiB, peak bytes:    4.6 GiB
	      +- RowToColumnar.276:                                 Current used bytes:  152.0 MiB, peak bytes: 1952.0 MiB
	      |  \- single:                                         Current used bytes:  152.0 MiB, peak bytes: 1952.0 MiB
	      |     +- root:                                        Current used bytes:  151.6 MiB, peak bytes: 1952.0 MiB
	      |     |  \- default_leaf:                             Current used bytes:  151.6 MiB, peak bytes: 1950.3 MiB
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- VeloxWriter.276:                                   Current used bytes:   88.0 MiB, peak bytes:  176.0 MiB
	      |  \- single:                                         Current used bytes:   88.0 MiB, peak bytes:  176.0 MiB
	      |     +- root:                                        Current used bytes:   82.2 MiB, peak bytes:  176.0 MiB
	      |     |  +- datasource.276:                           Current used bytes:   82.2 MiB, peak bytes:  176.0 MiB
	      |     |  |  \- .general:                              Current used bytes:   82.2 MiB, peak bytes:  175.8 MiB
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- ColumnarToRow.413:                                 Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |  \- single:                                         Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     +- root:                                        Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     |  \- default_leaf:                             Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-1381.0:                        Current used bytes:    4.0 MiB, peak bytes:   16.0 MiB
	      |  \- single:                                         Current used bytes:    4.0 MiB, peak bytes:   16.0 MiB
	      |     +- root:                                        Current used bytes:  722.1 KiB, peak bytes:   16.0 MiB
	      |     |  +- task.Gluten_Stage_26_TID_28426_VTID_1381: Current used bytes:  720.6 KiB, peak bytes:   15.0 MiB
	      |     |  |  +- node.1:                                Current used bytes:  369.0 KiB, peak bytes:    2.0 MiB
	      |     |  |  |  \- op.1.0.0.FilterProject:             Current used bytes:  369.0 KiB, peak bytes: 1523.0 KiB
	      |     |  |  +- node.3:                                Current used bytes:  229.1 KiB, peak bytes:   12.0 MiB
	      |     |  |  |  \- op.3.0.0.FilterProject:             Current used bytes:  229.1 KiB, peak bytes:   11.2 MiB
	      |     |  |  +- node.2:                                Current used bytes:  122.5 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.2.0.0.FilterProject:             Current used bytes:  122.5 KiB, peak bytes:  380.0 KiB
	      |     |  |  \- node.0:                                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.0.0.0.ValueStream:               Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:   1536.0 B, peak bytes:   1664.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-1380.0:                        Current used bytes:    2.0 MiB, peak bytes:    8.0 MiB
	      |  \- single:                                         Current used bytes:    2.0 MiB, peak bytes:    8.0 MiB
	      |     +- root:                                        Current used bytes:  120.0 KiB, peak bytes:    2.0 MiB
	      |     |  +- task.Gluten_Stage_26_TID_28426_VTID_1380: Current used bytes:  120.0 KiB, peak bytes:    2.0 MiB
	      |     |  |  +- node.1:                                Current used bytes:   96.0 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.1.0.0.Unnest:                    Current used bytes:   96.0 KiB, peak bytes:   96.0 KiB
	      |     |  |  +- node.2:                                Current used bytes:   24.0 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.2.0.0.FilterProject:             Current used bytes:   24.0 KiB, peak bytes:   24.0 KiB
	      |     |  |  \- node.0:                                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.0.0.0.ValueStream:               Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- IndicatorVectorBase#init.1304:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |  \- single:                                         Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     +- root:                                        Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- VeloxWriter.276.OverAcquire.0:                     Current used bytes:      0.0 B, peak bytes:   52.8 MiB
	      +- NativePlanEvaluator-1381.0.OverAcquire.0:          Current used bytes:      0.0 B, peak bytes:    4.8 MiB
	      +- IteratorMetrics.1204.OverAcquire.0:                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- ColumnarToRow.413.OverAcquire.0:                   Current used bytes:      0.0 B, peak bytes:   19.2 MiB
	      +- RowToColumnar.276.OverAcquire.0:                   Current used bytes:      0.0 B, peak bytes:  585.6 MiB
	      +- IteratorMetrics.1204:                              Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |  \- single:                                         Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     +- root:                                        Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- IndicatorVectorBase#init.1304.OverAcquire.0:       Current used bytes:      0.0 B, peak bytes:      0.0 B
	      \- NativePlanEvaluator-1380.0.OverAcquire.0:          Current used bytes:      0.0 B, peak bytes:    2.4 MiB

	at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:105)
	at org.apache.gluten.memory.arrow.alloc.ManagedAllocationListener.onPreAllocation(ManagedAllocationListener.java:61)
	at org.apache.gluten.shaded.org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:300)
	at org.apache.gluten.shaded.org.apache.arrow.memory.RootAllocator.buffer(RootAllocator.java:29)
	at org.apache.gluten.shaded.org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:280)
	at org.apache.gluten.shaded.org.apache.arrow.memory.RootAllocator.buffer(RootAllocator.java:29)
	at org.apache.gluten.execution.RowToVeloxColumnarExec$$anon$1.next(RowToVeloxColumnarExec.scala:200)
	at org.apache.gluten.execution.RowToVeloxColumnarExec$$anon$1.next(RowToVeloxColumnarExec.scala:138)
	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.next(IteratorsV1.scala:178)
	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.next(IteratorsV1.scala:79)
	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.next(IteratorsV1.scala:41)
	at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:33)
	at org.apache.gluten.vectorized.ColumnarBatchInIterator.next(ColumnarBatchInIterator.java:39)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
	at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:95)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:429)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1539)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:438)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$22(FileFormatWriter.scala:346)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1505)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Retriable: False
Function: operator()
File: /home/abc/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp
Line: 601
Stack trace:
 0  _ZN8facebook5velox7process10StackTraceC1Ei
 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
 3  _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv.cold
 4  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
 5  _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEE
 6  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
 7  _ZN6gluten24WholeStageResultIterator4nextEv
 8  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
 9  0x00007f36695f9a7b

	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:41)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
	at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:95)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:429)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1539)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:438)
	... 9 more
Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 3.9 GiB, granted: 2.7 GiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled). 
Current config settings: 
	spark.gluten.memory.offHeap.size.in.bytes=79.0 GiB
	spark.gluten.memory.task.offHeap.size.in.bytes=4.9 GiB
	spark.gluten.memory.conservative.task.offHeap.size.in.bytes=2.5 GiB
	spark.memory.offHeap.enabled=true
	spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats: 
	Task.28426:                                                 Current used bytes:    2.3 GiB, peak bytes:        N/A
	\- Gluten.Tree.1304:                                        Current used bytes:    2.3 GiB, peak bytes:    4.9 GiB
	   \- root.1304:                                            Current used bytes:    2.3 GiB, peak bytes:    4.9 GiB
	      +- ArrowContextInstance.276:                          Current used bytes: 2000.0 MiB, peak bytes:    4.6 GiB
	      +- RowToColumnar.276:                                 Current used bytes:  152.0 MiB, peak bytes: 1952.0 MiB
	      |  \- single:                                         Current used bytes:  152.0 MiB, peak bytes: 1952.0 MiB
	      |     +- root:                                        Current used bytes:  151.6 MiB, peak bytes: 1952.0 MiB
	      |     |  \- default_leaf:                             Current used bytes:  151.6 MiB, peak bytes: 1950.3 MiB
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- VeloxWriter.276:                                   Current used bytes:   88.0 MiB, peak bytes:  176.0 MiB
	      |  \- single:                                         Current used bytes:   88.0 MiB, peak bytes:  176.0 MiB
	      |     +- root:                                        Current used bytes:   82.2 MiB, peak bytes:  176.0 MiB
	      |     |  +- datasource.276:                           Current used bytes:   82.2 MiB, peak bytes:  176.0 MiB
	      |     |  |  \- .general:                              Current used bytes:   82.2 MiB, peak bytes:  175.8 MiB
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- ColumnarToRow.413:                                 Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |  \- single:                                         Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     +- root:                                        Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     |  \- default_leaf:                             Current used bytes:   64.0 MiB, peak bytes:   64.0 MiB
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-1381.0:                        Current used bytes:    4.0 MiB, peak bytes:   16.0 MiB
	      |  \- single:                                         Current used bytes:    4.0 MiB, peak bytes:   16.0 MiB
	      |     +- root:                                        Current used bytes:  722.1 KiB, peak bytes:   16.0 MiB
	      |     |  +- task.Gluten_Stage_26_TID_28426_VTID_1381: Current used bytes:  720.6 KiB, peak bytes:   15.0 MiB
	      |     |  |  +- node.1:                                Current used bytes:  369.0 KiB, peak bytes:    2.0 MiB
	      |     |  |  |  \- op.1.0.0.FilterProject:             Current used bytes:  369.0 KiB, peak bytes: 1523.0 KiB
	      |     |  |  +- node.3:                                Current used bytes:  229.1 KiB, peak bytes:   12.0 MiB
	      |     |  |  |  \- op.3.0.0.FilterProject:             Current used bytes:  229.1 KiB, peak bytes:   11.2 MiB
	      |     |  |  +- node.2:                                Current used bytes:  122.5 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.2.0.0.FilterProject:             Current used bytes:  122.5 KiB, peak bytes:  380.0 KiB
	      |     |  |  \- node.0:                                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.0.0.0.ValueStream:               Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:   1536.0 B, peak bytes:   1664.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-1380.0:                        Current used bytes:    2.0 MiB, peak bytes:    8.0 MiB
	      |  \- single:                                         Current used bytes:    2.0 MiB, peak bytes:    8.0 MiB
	      |     +- root:                                        Current used bytes:  120.0 KiB, peak bytes:    2.0 MiB
	      |     |  +- task.Gluten_Stage_26_TID_28426_VTID_1380: Current used bytes:  120.0 KiB, peak bytes:    2.0 MiB
	      |     |  |  +- node.1:                                Current used bytes:   96.0 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.1.0.0.Unnest:                    Current used bytes:   96.0 KiB, peak bytes:   96.0 KiB
	      |     |  |  +- node.2:                                Current used bytes:   24.0 KiB, peak bytes: 1024.0 KiB
	      |     |  |  |  \- op.2.0.0.FilterProject:             Current used bytes:   24.0 KiB, peak bytes:   24.0 KiB
	      |     |  |  \- node.0:                                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.0.0.0.ValueStream:               Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- IndicatorVectorBase#init.1304:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |  \- single:                                         Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     +- root:                                        Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- VeloxWriter.276.OverAcquire.0:                     Current used bytes:      0.0 B, peak bytes:   52.8 MiB
	      +- NativePlanEvaluator-1381.0.OverAcquire.0:          Current used bytes:      0.0 B, peak bytes:    4.8 MiB
	      +- IteratorMetrics.1204.OverAcquire.0:                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- ColumnarToRow.413.OverAcquire.0:                   Current used bytes:      0.0 B, peak bytes:   19.2 MiB
	      +- RowToColumnar.276.OverAcquire.0:                   Current used bytes:      0.0 B, peak bytes:  585.6 MiB
	      +- IteratorMetrics.1204:                              Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |  \- single:                                         Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     +- root:                                        Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                             Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                     Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- IndicatorVectorBase#init.1304.OverAcquire.0:       Current used bytes:      0.0 B, peak bytes:      0.0 B
	      \- NativePlanEvaluator-1380.0.OverAcquire.0:          Current used bytes:      0.0 B, peak bytes:    2.4 MiB

	at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:105)
	at org.apache.gluten.memory.arrow.alloc.ManagedAllocationListener.onPreAllocation(ManagedAllocationListener.java:61)
	at org.apache.gluten.shaded.org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:300)
	at org.apache.gluten.shaded.org.apache.arrow.memory.RootAllocator.buffer(RootAllocator.java:29)
	at org.apache.gluten.shaded.org.apache.arrow.memory.BaseAllocator.buffer(BaseAllocator.java:280)
	at org.apache.gluten.shaded.org.apache.arrow.memory.RootAllocator.buffer(RootAllocator.java:29)
	at org.apache.gluten.execution.RowToVeloxColumnarExec$$anon$1.next(RowToVeloxColumnarExec.scala:200)
	at org.apache.gluten.execution.RowToVeloxColumnarExec$$anon$1.next(RowToVeloxColumnarExec.scala:138)
	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.next(IteratorsV1.scala:178)
	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.next(IteratorsV1.scala:79)
	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.next(IteratorsV1.scala:41)
	at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:33)
	at org.apache.gluten.vectorized.ColumnarBatchInIterator.next(ColumnarBatchInIterator.java:39)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
	at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:95)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:429)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1539)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:438)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$22(FileFormatWriter.scala:346)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1505)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Retriable: False
Function: operator()
File: /home/abc/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp
Line: 601
Stack trace:
 0  _ZN8facebook5velox7process10StackTraceC1Ei
 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
 3  _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv.cold
 4  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
 5  _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEE
 6  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
 7  _ZN6gluten24WholeStageResultIterator4nextEv
 8  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
 9  0x00007f36695f9a7b

	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
	... 19 more

ayushi-agarwal avatar Dec 16 '24 17:12 ayushi-agarwal

It's not the same issue. Here your memory is allocated by ArrowContextInstance and R2C. What's your reducer#?

FelixYBW avatar Dec 16 '24 19:12 FelixYBW

> It's not the same issue. Here your memory is allocated by ArrowContextInstance and R2C. What's your reducer#?

Sorry, I didn't understand your question. Are you asking for the number of reducers? This query had a single stage with 23663 tasks, where each task does a union of data from 7 different branches and writes the results to a storage location.

ayushi-agarwal avatar Dec 23 '24 04:12 ayushi-agarwal

> Sorry, I didn't understand your question. Are you asking for the number of reducers? This query had a single stage with 23663 tasks, where each task does a union of data from 7 different branches and writes the results to a storage location.

Your error message shows the memory is occupied by ArrowContextInstance, which is used by shuffle and by the Velox-to-Arrow converter in the Parquet writer. If it's in shuffle, you may try sort-based shuffle. If it's in the Parquet write, the Arrow batch may be too large, either because there are too many rows in the batch or because each row holds too much data. You can check the batch size in the UI.
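
To make the two causes above concrete, here is a rough back-of-the-envelope sketch. It assumes a batch's footprint is approximately rows times average row width; the function name and all numbers are hypothetical and are not taken from Gluten or Velox.

```python
# Rough estimate only: real Arrow batches also carry validity bitmaps,
# offsets buffers, and allocator padding. All numbers are hypothetical.
MIB = 1024 * 1024


def estimated_batch_bytes(rows_per_batch: int, avg_row_bytes: int) -> int:
    """Approximate one columnar batch as rows x average row width."""
    return rows_per_batch * avg_row_bytes


# Same row count, very different footprint depending on row width.
narrow = estimated_batch_bytes(4096, 256)        # 256-byte rows
wide = estimated_batch_bytes(4096, 64 * 1024)    # 64 KiB rows

print(f"narrow rows: {narrow / MIB:.0f} MiB per batch")
print(f"wide rows:   {wide / MIB:.0f} MiB per batch")
```

With the row count fixed, the footprint scales linearly with row width, so a few very wide columns can push a single batch into hundreds of MiB.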

FelixYBW avatar Jan 04 '25 00:01 FelixYBW

@wForget Is the issue still there on your side? It looks like it's not fixed.

FelixYBW avatar Jan 04 '25 00:01 FelixYBW

> @wForget Is the issue still there on your side? It looks like it's not fixed.

Yes, this issue still exists, but we can avoid it with the Kyuubi Spark SQL extension (the InsertRebalanceBeforeWrite and FinalStageConfigIsolation optimizers).

InsertRebalanceBeforeWrite produces an optimized plan like this:

    QueryPlans
         |
  RebalanceByColumn(part columns)
         |
   WriteFileExec

Then, we disable coalescePartitions for the final write stage:

    set spark.sql.finalStage.adaptive.coalescePartitions.enabled=false;

After that, different hive partitions are distributed across different tasks, which avoids the OOM caused by one Velox task writing too many hive partitions.
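
The effect can be illustrated with a small simulation. This is only a sketch under assumed inputs: the row layout, task count, and assignment functions are hypothetical, and Spark's actual rebalance uses hash partitioning with AQE rather than the modulo used here.

```python
from collections import defaultdict


def max_partitions_per_task(rows, assign):
    """Return the largest number of distinct hive partitions any task writes.

    rows   -- partition value of each input row
    assign -- function (row_index, partition_value) -> task id
    """
    seen = defaultdict(set)
    for index, part in enumerate(rows):
        seen[assign(index, part)].add(part)
    return max(len(parts) for parts in seen.values())


# Hypothetical input: 1000 rows spread evenly over 100 partition values.
num_partitions, num_tasks = 100, 10
rows = [i % num_partitions for i in range(1000)]
chunk = len(rows) // num_tasks

# Without rebalancing: rows reach tasks by input position, so every task
# sees every partition value and keeps a writer open for each.
unbalanced = max_partitions_per_task(rows, lambda index, part: index // chunk)

# With rebalancing by partition column: rows with the same partition value
# always go to the same task.
rebalanced = max_partitions_per_task(rows, lambda index, part: part % num_tasks)

print(unbalanced, rebalanced)
```

In this toy setup each task writes 100 partitions without rebalancing and 10 with it. Since each open partition writer holds its own buffers, fewer partitions per task means lower peak memory per task.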

wForget avatar Apr 28 '25 07:04 wForget