
[Bug] Error in Executor setup After Permanent UDF Deletion

Status: Open · hzxiongyinke opened this issue 1 year ago · 3 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the bug

Hello everyone,

I've encountered an issue with Kyuubi that I'm hoping the community can help with.

I created a permanent UDF in a Kyuubi instance. Later, because our requirements changed, I dropped this UDF through another driver. Since then, every SQL statement executed through the previously started driver fails with an error saying that the UDF's resource file cannot be found (see the logs below). The only workaround I have found so far is to restart the Kyuubi engine.
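To make the reported sequence concrete, here is a hypothetical Java model of it. It assumes that (a) the function definition lives in shared catalog storage such as the Hive metastore, (b) each engine (one Spark application) registers the UDF's jar with its own SparkContext when the function is first resolved, and (c) the jar file was removed from OSS along with the function, which the `FileNotFoundException` in the logs suggests. `UdfLifecycleModel`, `SharedCatalog`, `Engine`, and the jar URI are made up for illustration; none of them is a Kyuubi or Spark API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the reported steps. Not Kyuubi or Spark code.
 */
public class UdfLifecycleModel {

    /** State visible to every engine: catalog entries and files in object storage. */
    static class SharedCatalog {
        final Map<String, String> functions = new HashMap<>(); // function name -> jar URI
        final Set<String> storedFiles = new HashSet<>();        // URIs still present in storage
    }

    /** One long-running engine (one Spark application with its own driver). */
    static class Engine {
        final SharedCatalog catalog;
        final Map<String, Long> addedJars = new HashMap<>(); // jar URI -> registration time

        Engine(SharedCatalog catalog) {
            this.catalog = catalog;
        }

        /** Models CREATE FUNCTION ... USING JAR: only the shared catalog changes. */
        void createFunction(String name, String jarUri) {
            catalog.functions.put(name, jarUri);
        }

        /** Resolving the function registers its jar in this engine; there is no "unregister" step. */
        void useFunction(String name, long now) {
            String jarUri = catalog.functions.get(name);
            if (jarUri == null) {
                throw new IllegalStateException("Undefined function: " + name);
            }
            addedJars.putIfAbsent(jarUri, now);
        }

        /** Models DROP FUNCTION plus deleting the jar: only shared state changes. */
        void dropFunctionAndDeleteJar(String name) {
            String jarUri = catalog.functions.remove(name);
            if (jarUri != null) {
                catalog.storedFiles.remove(jarUri);
            }
        }
    }

    public static void main(String[] args) {
        SharedCatalog catalog = new SharedCatalog();
        Engine engineA = new Engine(catalog);
        Engine engineB = new Engine(catalog);

        String jar = "oss://bucket/udfs/my_udf.jar"; // hypothetical URI
        catalog.storedFiles.add(jar);                // jar uploaded to object storage
        engineA.createFunction("my_udf", jar);
        engineA.useFunction("my_udf", 1L);
        engineB.dropFunctionAndDeleteJar("my_udf");

        // prints: oss://bucket/udfs/my_udf.jar registered in engine A, exists in storage: false
        for (String registered : engineA.addedJars.keySet()) {
            System.out.println(registered + " registered in engine A, exists in storage: "
                    + catalog.storedFiles.contains(registered));
        }
    }
}
```

The model's point is that dropping the function in engine B changes only shared state. As far as we know, Spark offers no public way to remove a jar once it has been added to a SparkContext, so engine A keeps a reference to a file that no longer exists.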

Affects Version(s)

1.8.0

Kyuubi Server Log Output

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 279.0 failed 4 times, most recent failure: Lost task 0.3 in stage 279.0 (TID 251) (core-xxxx.cn-shanghai.emr.aliyuncs.com executor 29): java.io.FileNotFoundException:  [ErrorMessage]: File not found: .GalaxyResource/bigdata_emr_sh/xxx in bucket xxx
        at com.aliyun.jindodata.api.spec.JdoNativeResult.get(JdoNativeResult.java:54)
        at com.aliyun.jindodata.api.spec.protos.coder.JdolistDirectoryReplyDecoder.decode(JdolistDirectoryReplyDecoder.java:23)
        at com.aliyun.jindodata.api.JindoCommonApis.listDirectory(JindoCommonApis.java:112)
        at com.aliyun.jindodata.call.JindoListCall.execute(JindoListCall.java:65)
        at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:665)
        at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:60)
        at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:851)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:820)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:544)
        at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:1010)
        at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:1002)
        at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
        at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
        at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
        at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:1002)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:506)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2673)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2609)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2608)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2608)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2861)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2803)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2792)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2241)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:269)

Kyuubi Engine Log Output

Spark executor error log:
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Running task 0.0 in stage 283.0 (TID 264)
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Fetching oss://xxx/xxxx/xxx with timestamp 1731900662592
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] HadoopLoginUserInfo: TOKEN: YARN_AM_RM_TOKEN
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] HadoopLoginUserInfo: User: xxxx, authMethod: SIMPLE, ugi: xxxx (auth:SIMPLE)
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] JindoHadoopSystem: Initialized native file system: 
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] FsStats: cmd=getFileStatus, src=oss://xxxx/.xxxx/xxx/xxxx, dst=null, size=0, parameter=null, time-in-ms=77, version=6.2.0
24/11/18 16:06:18 INFO [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] FsStats: cmd=list, src=oss://xxxx/.xxxx/xxxx/xxxx, dst=null, size=0, parameter=null, time-in-ms=26, version=6.2.0
24/11/18 16:06:18 ERROR [Executor task launch worker for task 0.0 in stage 283.0 (TID 264)] Executor: Exception in task 0.0 in stage 283.0 (TID 264)
java.io.FileNotFoundException:  [ErrorMessage]: File not found: .xxxx/xxxx/xxxx in bucket xxxx
	at com.aliyun.jindodata.api.spec.JdoNativeResult.get(JdoNativeResult.java:54) ~[jindo-core-6.2.0.jar:?]
	at com.aliyun.jindodata.api.spec.protos.coder.JdolistDirectoryReplyDecoder.decode(JdolistDirectoryReplyDecoder.java:23) ~[jindo-core-6.2.0.jar:?]
	at com.aliyun.jindodata.api.JindoCommonApis.listDirectory(JindoCommonApis.java:112) ~[jindo-core-6.2.0.jar:?]
	at com.aliyun.jindodata.call.JindoListCall.execute(JindoListCall.java:65) ~[jindo-sdk-6.2.0.jar:?]
	at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:665) ~[jindo-sdk-6.2.0.jar:?]
	at com.aliyun.jindodata.common.JindoHadoopSystem.listStatus(JindoHadoopSystem.java:60) ~[jindo-sdk-6.2.0.jar:?]
	at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:851) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:820) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.util.Utils$.fetchFile(Utils.scala:544) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:1010) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:1002) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:1002) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:506) ~[spark-core_2.12-3.3.1-dw1.2.10.jar:3.3.1-dw1.2.10]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_392]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_392]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_392]
24/11/18 16:06:18 INFO [dispatcher-Executor] YarnCoarseGrainedExecutorBackend: Got assigned task 265
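The executor log shows `Executor: Fetching oss://xxx/xxxx/xxx with timestamp ...` right before the failure, and the stack passes through `Executor.updateDependencies`. As far as we understand Spark 3.3, that method compares the jars registered on the driver with the executor's local copies and fetches any that are missing or older, regardless of what the task actually computes. A plausible but unconfirmed reading is that executors started after the jar was deleted fail even for queries that never call the UDF, while executors that already cached the jar keep working. The Java sketch below is a hypothetical model of that behavior, not Spark's implementation; `DependencyFetchModel`, its `updateDependencies` method, the URI, and the in-memory storage set are illustrative.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Simplified, hypothetical model of an executor refreshing task dependencies.
 * It mirrors only the idea that every jar registered on the driver is fetched
 * when the executor's local copy is missing or older.
 */
public class DependencyFetchModel {

    /** Fetches each registered jar that is missing or stale locally; returns how many were fetched. */
    static int updateDependencies(Map<String, Long> addedJars,
                                  Map<String, Long> currentJars,
                                  Set<String> storedFiles) throws FileNotFoundException {
        int fetched = 0;
        for (Map.Entry<String, Long> jar : addedJars.entrySet()) {
            long cached = currentJars.getOrDefault(jar.getKey(), -1L);
            if (cached < jar.getValue()) {
                // This check runs for every registered jar, whether or not the task calls the UDF.
                if (!storedFiles.contains(jar.getKey())) {
                    throw new FileNotFoundException("File not found: " + jar.getKey());
                }
                currentJars.put(jar.getKey(), jar.getValue());
                fetched++;
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        String udfJar = "oss://bucket/udfs/my_udf.jar"; // hypothetical URI
        Map<String, Long> addedJars = new HashMap<>();
        addedJars.put(udfJar, 1731900662592L);
        Set<String> storedFiles = new HashSet<>();
        storedFiles.add(udfJar);

        Map<String, Long> oldExecutor = new HashMap<>();
        try {
            // An executor that started while the jar existed caches a copy.
            System.out.println("old executor, before delete: fetched "
                    + updateDependencies(addedJars, oldExecutor, storedFiles));
            storedFiles.remove(udfJar); // the jar is deleted from object storage
            // Its cached timestamp is current, so nothing is fetched and the task runs.
            System.out.println("old executor, after delete: fetched "
                    + updateDependencies(addedJars, oldExecutor, storedFiles));
            // A newly allocated executor has an empty cache and must fetch the missing jar.
            updateDependencies(addedJars, new HashMap<>(), storedFiles);
        } catch (FileNotFoundException e) {
            System.out.println("new executor failed: " + e.getMessage());
        }
    }
}
```

Running `main` prints `fetched 1`, then `fetched 0`, then `new executor failed: File not found: oss://bucket/udfs/my_udf.jar`. If this model matches what happens, it would explain why a query that does not touch the UDF can still fail on some executors, and why restarting the engine (which drops the stale registration) helps.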

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • [X] No. I cannot submit a PR at this time.

hzxiongyinke, Nov 18 '24 10:11

Hello @hzxiongyinke, thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.

github-actions[bot], Nov 18 '24 10:11

cc @yaooqinn @pan3793

hzxiongyinke, Nov 18 '24 12:11

The failed Spark application didn't even access the missing jar file, did it?

yaooqinn, Nov 19 '24 03:11