[BUG]:
Describe the bug
There are some exceptions after running spark-submit.
[Error] [JvmBridge] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID 8) (192.168.1.12 executor driver): org.apache.spark.api.python.PythonException: System.IO.FileNotFoundException: Assembly 'Microsoft.Spark.CSharp.Examples, Version=2.1.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51' file not found 'Microsoft.Spark.CSharp.Examples.dll'
at Microsoft.Spark.Utils.UdfSerDe.<>c.<DeserializeType>b__10_0(TypeData td) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/UdfSerDe.cs:line 277
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
at Microsoft.Spark.Utils.UdfSerDe.DeserializeType(TypeData typeData) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/UdfSerDe.cs:line 261
at Microsoft.Spark.Utils.UdfSerDe.Deserialize(UdfData udfData) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/UdfSerDe.cs:line 160
at Microsoft.Spark.Utils.CommandSerDe.DeserializeUdfs[T](UdfWrapperData data, Int32& nodeIndex, Int32& udfIndex) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/CommandSerDe.cs:line 335
at Microsoft.Spark.Utils.CommandSerDe.Deserialize[T](Stream stream, SerializedMode& serializerMode, SerializedMode& deserializerMode, String& runMode) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/CommandSerDe.cs:line 313
at Microsoft.Spark.Worker.Processor.CommandProcessor.ReadSqlCommands(PythonEvalType evalType, Stream stream) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/CommandProcessor.cs:line 196
at Microsoft.Spark.Worker.Processor.CommandProcessor.SqlCommandProcessorV2_4_X.Process(PythonEvalType evalType, Stream stream) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/CommandProcessor.cs:line 246
at Microsoft.Spark.Worker.Processor.CommandProcessor.ReadSqlCommands(PythonEvalType evalType, Stream stream, Version version) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/CommandProcessor.cs:line 101
at Microsoft.Spark.Worker.Processor.CommandProcessor.Process(Stream stream) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/CommandProcessor.cs:line 43
at Microsoft.Spark.Worker.Processor.PayloadProcessor.Process(Stream stream) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/PayloadProcessor.cs:line 83
at Microsoft.Spark.Worker.TaskRunner.ProcessStream(Stream inputStream, Stream outputStream, Version version, Boolean& readComplete) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/TaskRunner.cs:line 144
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:86)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:68)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2454)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2403)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2402)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2402)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2642)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2584)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2573)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2214)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2235)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2254)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:429)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3715)
at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2728)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3706)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2728)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2935)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:287)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:326)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.api.dotnet.DotnetBackendHandler.handleMethodCall(DotnetBackendHandler.scala:165)
at org.apache.spark.api.dotnet.DotnetBackendHandler.$anonfun$handleBackendRequest$2(DotnetBackendHandler.scala:105)
at org.apache.spark.api.dotnet.ThreadPool$$anon$1.run(ThreadPool.scala:34)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: System.IO.FileNotFoundException: Assembly 'Microsoft.Spark.CSharp.Examples, Version=2.1.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51' file not found 'Microsoft.Spark.CSharp.Examples.dll'
at Microsoft.Spark.Utils.UdfSerDe.<>c.<DeserializeType>b__10_0(TypeData td) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/UdfSerDe.cs:line 277
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
at Microsoft.Spark.Utils.UdfSerDe.DeserializeType(TypeData typeData) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/UdfSerDe.cs:line 261
at Microsoft.Spark.Utils.UdfSerDe.Deserialize(UdfData udfData) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/UdfSerDe.cs:line 160
at Microsoft.Spark.Utils.CommandSerDe.DeserializeUdfs[T](UdfWrapperData data, Int32& nodeIndex, Int32& udfIndex) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/CommandSerDe.cs:line 335
at Microsoft.Spark.Utils.CommandSerDe.Deserialize[T](Stream stream, SerializedMode& serializerMode, SerializedMode& deserializerMode, String& runMode) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark/Utils/CommandSerDe.cs:line 313
at Microsoft.Spark.Worker.Processor.CommandProcessor.ReadSqlCommands(PythonEvalType evalType, Stream stream) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/CommandProcessor.cs:line 196
at Microsoft.Spark.Worker.Processor.CommandProcessor.SqlCommandProcessorV2_4_X.Process(PythonEvalType evalType, Stream stream) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/CommandProcessor.cs:line 246
at Microsoft.Spark.Worker.Processor.CommandProcessor.ReadSqlCommands(PythonEvalType evalType, Stream stream, Version version) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/CommandProcessor.cs:line 101
at Microsoft.Spark.Worker.Processor.CommandProcessor.Process(Stream stream) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/CommandProcessor.cs:line 43
at Microsoft.Spark.Worker.Processor.PayloadProcessor.Process(Stream stream) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Processor/PayloadProcessor.cs:line 83
at Microsoft.Spark.Worker.TaskRunner.ProcessStream(Stream inputStream, Stream outputStream, Version version, Boolean& readComplete) in /home/lovelake/dotnet.spark/src/csharp/Microsoft.Spark.Worker/TaskRunner.cs:line 144
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:86)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:68)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
To Reproduce
Steps to reproduce the behavior: in a terminal, run:
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local ~/dotnet.spark/src/scala/microsoft-spark-3-2/target/microsoft-spark-3-2_2.12-2.1.0.jar /home/lovelake/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp3.1/ubuntu.21.04-x64/publish/Microsoft.Spark.CSharp.Examples Sql.Batch.Basic $SPARK_HOME/examples/src/main/resources/people.json
Expected behavior
The example project runs successfully.
Desktop (please complete the following information):
- OS: Lubuntu 21.04
Additional context
bashrc:
export SPARK_HOME=~/bin/spark-3.2.1-bin-hadoop3.2
export PATH="$SPARK_HOME/bin:$PATH"
export HADOOP_HOME=~/bin/hadoop-3.2.2
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH"
export DOTNET_WORKER_DIR=~/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/netcoreapp3.1/ubuntu.21.04-x64/publish
export DOTNET_ASSEMBLY_SEARCH_PATHS="/home/lovelake/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp3.1/ubuntu.21.04-x64/publish:$DOTNET_ASSEMBLY_SEARCH_PATHS"
The first warning in the output is:
[Warn] [AssemblyLoader] Assembly 'Microsoft.Spark.CSharp.Examples, Version=2.1.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51' file not found 'Microsoft.Spark.CSharp.Examples.dll' in '/home/lovelake/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp3.1/ubuntu.21.04-x64/publish:,/tmp/spark-be403162-ca20-4eb3-ad76-db04636fe228/userFiles-840fbcda-37be-4422-928f-26a00d495e2a,/home/lovelake,/home/lovelake/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/netcoreapp3.1/ubuntu.21.04-x64/publish/'
I do not understand why the ':' character appears there after /publish.
@aslot in your Additional context section you listed that you ran
export DOTNET_ASSEMBLY_SEARCH_PATHS="/home/lovelake/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp3.1/ubuntu.21.04-x64/publish:$DOTNET_ASSEMBLY_SEARCH_PATHS"
That is where the ':' was set. The search paths in DOTNET_ASSEMBLY_SEARCH_PATHS should be separated by ',' rather than ':'. Were you following any official guide that suggested the ':' should be present? If so, we should probably get that changed.
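For reference, a corrected setting would plausibly look like the sketch below. It reuses the publish path from the report above; the ',' separator follows the convention described in this reply and is a suggestion, not something verified against the worker source or official docs here.

# Sketch: point the worker's assembly search path at the example's publish directory,
# assuming DOTNET_ASSEMBLY_SEARCH_PATHS is split on ',' as noted above.
export DOTNET_ASSEMBLY_SEARCH_PATHS="/home/lovelake/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp3.1/ubuntu.21.04-x64/publish"
# Hypothetical variant if a pre-existing value must be kept: join with ',' instead of ':'
# export DOTNET_ASSEMBLY_SEARCH_PATHS="/home/lovelake/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp3.1/ubuntu.21.04-x64/publish,$DOTNET_ASSEMBLY_SEARCH_PATHS"

With the original ':' separator the whole value, trailing ':' included, appears to be treated as a single search path, which would explain the '.../publish:' entry in the AssemblyLoader warning quoted above.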