
A metrics-related error running DEBUG build

zhztheplayer opened this issue 3 years ago · 3 comments

Got the following error while running a DEBUG build of Gluten:

java.lang.ArrayIndexOutOfBoundsException: 1
        at io.glutenproject.vectorized.Metrics.getOperatorMetrics(Metrics.java:64)
        at io.glutenproject.execution.WholeStageTransformerExec.$anonfun$updateTransformerMetrics$1(WholeStageTransformerExec.scala:440)
        at java.util.ArrayList.forEach(ArrayList.java:1259)
        at io.glutenproject.execution.WholeStageTransformerExec.updateTransformerMetrics(WholeStageTransformerExec.scala:439)
        at io.glutenproject.execution.WholeStageTransformerExec.updateNativeMetrics(WholeStageTransformerExec.scala:541)
        at io.glutenproject.execution.WholeStageTransformerExec.$anonfun$doExecuteColumnar$6(WholeStageTransformerExec.scala:290)
        at io.glutenproject.execution.WholeStageTransformerExec.$anonfun$doExecuteColumnar$6$adapted(WholeStageTransformerExec.scala:285)
        at io.glutenproject.backendsapi.velox.VeloxIteratorApi$$anon$2.hasNext(VeloxIteratorApi.scala:241)
        at io.glutenproject.vectorized.CloseableColumnBatchIterator.hasNext(CloseableColumnBatchIterator.scala:40)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at io.glutenproject.backendsapi.velox.VeloxSparkPlanExecApi.$anonfun$createBroadcastRelation$1(VeloxSparkPlanExecApi.scala:181)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
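
The ArrayIndexOutOfBoundsException above is consistent with the JVM side indexing a metrics array whose length (reported by the native side) is smaller than the number of operators it expects. As an illustration only (this is a hypothetical sketch, not Gluten's actual Metrics.java; the class and field names here are invented), a defensive accessor can turn the bare out-of-bounds exception into a diagnosable error message:

```java
// Hypothetical sketch of a per-operator metrics accessor with a bounds
// check. Assumes metric values arrive as a flat array from native code.
public class MetricsSketch {
    private final long[] outputRows; // hypothetical per-operator metric array

    public MetricsSketch(long[] outputRows) {
        this.outputRows = outputRows;
    }

    public long getOperatorMetrics(int operatorIdx) {
        // Validate the index before access instead of letting an
        // ArrayIndexOutOfBoundsException surface with no context.
        if (operatorIdx < 0 || operatorIdx >= outputRows.length) {
            throw new IllegalArgumentException(
                "Operator index " + operatorIdx + " out of range; native side "
                    + "reported metrics for only " + outputRows.length
                    + " operator(s)");
        }
        return outputRows[operatorIdx];
    }
}
```

A mismatch like this usually means the JVM-side plan has more (or fewer) operators than the native-side plan that produced the metrics.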

@rui-mo Would you please help me figure out the reason? Thanks.

zhztheplayer · Jul 28 '22 09:07

Hi Hongze, which query are you testing?

rui-mo · Jul 28 '22 10:07

Just ran Q1. It seems a "pure virtual function call" error was also reported in this case. Not sure whether the two are related.

zhztheplayer · Jul 28 '22 12:07

Hi Hongze, I tried the debug build, but only found the error below for Q1. For the metrics issue, I'm not sure whether this PR can solve the error you met.

22/07/29 10:58:31,987 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 8.1 in stage 0.0 (TID 385) (sr245 executor 7): java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Write past Buffer capacity() 0
Retriable: False
Context: Split [file file:///mnt/DP_disk1/tpch_sf1t_dwrf/lineitem/part-00042-c280fa50-dcc6-43d4-be8b-01bf72f57471-c000.snappy.parquet 0 - 884489986] Task gluten task 17
Top-Level Context: Same as context.
Function: checkEndGuardImpl
File: /home/sparkuser/workspace/rui/gluten/tools/build/velox_ep/velox/buffer/Buffer.h
Line: 501
Stack trace:

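The "Write past Buffer capacity()" failure above comes from checkEndGuardImpl, a debug-build check in Velox's Buffer.h. The general technique it represents is an end guard: a sentinel byte pattern is planted just past the buffer's usable capacity, and any write that runs past capacity corrupts the pattern, so a later check fails fast. As a rough illustration of that idea only (Velox's actual implementation is C++ and differs in detail; all names and the sentinel value here are invented), a sketch in Java:

```java
import java.util.Arrays;

// Hypothetical sketch of a debug-build "end guard" on a byte buffer.
// Writes past capacity land in a sentinel region; checkEndGuard()
// detects the corruption afterwards.
public class GuardedBuffer {
    private static final byte GUARD = (byte) 0xA5; // invented sentinel value
    private static final int GUARD_SIZE = 8;

    private final byte[] data;
    private final int capacity;

    public GuardedBuffer(int capacity) {
        this.capacity = capacity;
        this.data = new byte[capacity + GUARD_SIZE];
        // Plant the guard pattern just past the usable capacity.
        Arrays.fill(data, capacity, data.length, GUARD);
    }

    public void put(int index, byte value) {
        // Deliberately unchecked: an out-of-range index silently
        // corrupts the guard region instead of failing immediately.
        data[index] = value;
    }

    // Analogous in spirit to checkEndGuardImpl: fail if the guard
    // region was overwritten by a write past capacity().
    public void checkEndGuard() {
        for (int i = capacity; i < data.length; i++) {
            if (data[i] != GUARD) {
                throw new IllegalStateException(
                    "Write past Buffer capacity() " + capacity);
            }
        }
    }
}
```

Note the log reports capacity() 0, i.e. something wrote into a buffer that had no usable space at all, which is why the guard check fired.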
rui-mo · Jul 29 '22 03:07