[Bug] Spark on YARN throws NoClassDefFoundError
### Code of Conduct
- [X] I agree to follow this project's Code of Conduct
### Search before asking
- [X] I have searched in the issues and found no similar issues.
### Describe the bug
Information from @zhengchenyu
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/uniffle/shaded/javax/xml/stream/XMLStreamException
	at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
	at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.uniffle.shaded.javax.xml.stream.XMLStreamException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 11 more
```
### Affects Version(s)
master
### Uniffle Server Log Output
No response
### Uniffle Engine Log Output
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/uniffle/shaded/javax/xml/stream/XMLStreamException
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:858)
	at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:921)
	at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.uniffle.shaded.javax.xml.stream.XMLStreamException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 3 more
```
### Uniffle Server Configurations
_No response_
### Uniffle Engine Configurations
_No response_
### Additional context
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
The `Configuration` class in hadoop-2.8 does not import `XMLStreamException`, but the one in hadoop-3.2 does, so it seems this problem only occurs when using hadoop-3.2. When I remove the relocation for javax, the job works.
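For context, the failing references match a shade-plugin relocation of roughly this shape (a sketch; the exact entry in Uniffle's `pom.xml` may differ):

```xml
<!-- maven-shade-plugin relocation sketch: every bytecode reference to
     javax.xml.stream.* inside the shaded jar is rewritten to the shaded
     name. If the javax.xml.stream classes themselves are NOT bundled
     into the fat jar, the rewritten references point at classes that
     do not exist anywhere on the classpath. -->
<relocation>
  <pattern>javax.xml.stream</pattern>
  <shadedPattern>org.apache.uniffle.shaded.javax.xml.stream</shadedPattern>
</relocation>
```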
@zhengchenyu Thanks for the extra information. I decompiled the Spark RSS shaded client jar and found that there are no `javax/xml/stream/**` classes inside the fat jar, yet references to them were still relocated by the shade plugin.
I believe this is the root cause of this issue.
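The mismatch can also be confirmed from the JVM side. The helper below is an illustrative sketch (the class and method names are mine, not Uniffle's): it probes whether a class name resolves on the current classpath, which is exactly the difference between the original `javax.xml.stream` name (provided by the JDK) and the relocated name (only resolvable if the shade plugin actually bundled the relocated classes).

```java
// Sketch: probe whether a fully qualified class name resolves on the
// current classpath. The relocated name below illustrates what the
// shade plugin rewrites references to; it only resolves if the
// relocated classes were actually bundled into the fat jar.
public class ShadeCheck {
    public static boolean isPresent(String className) {
        try {
            // initialize=false: we only care whether the class can be found
            Class.forName(className, false, ShadeCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Provided by the JDK (java.xml module), so this resolves.
        System.out.println(isPresent("javax.xml.stream.XMLStreamException"));
        // The relocated name resolves only when the shaded jar bundles it.
        System.out.println(isPresent("org.apache.uniffle.shaded.javax.xml.stream.XMLStreamException"));
    }
}
```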
When I remove the relocation for javax, the job works. But I see the Docker CI still fails for branch-0.9. When I revert PR 1878, the Docker CI passes.
I am now setting up a Docker CI environment on my server to see why the Docker CI fails.
Docker CI failure link: https://github.com/apache/incubator-uniffle/actions/runs/10314743478/job/28631204182
I have set up a Docker CI environment and added some debug logging, and found this error:
```
java.lang.NoClassDefFoundError: org/apache/uniffle/shaded/org/apache/commons/collections/CollectionUtils
	at org.apache.uniffle.client.impl.ShuffleWriteClientImpl.genServerToBlocks(ShuffleWriteClientImpl.java:281)
	at org.apache.uniffle.client.impl.ShuffleWriteClientImpl.sendShuffleData(ShuffleWriteClientImpl.java:339)
	at org.apache.spark.shuffle.writer.DataPusher.lambda$send$3(DataPusher.java:96)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.uniffle.shaded.org.apache.commons.collections.CollectionUtils
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
	at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
	... 7 more
```
The reason is that branch-0.9 does not upgrade commons-collections to 4.4, so it uses commons-collections 3 pulled in by the Spark dependency. But PR 1878 relocates `org.apache` to `${rss.shade.packageName}.org.apache`. commons-collections 3 is provided by the Spark package and is not relocated into the shaded jar, so a `java.lang.NoClassDefFoundError` is thrown.
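The problematic relocation would look roughly like the sketch below. The `<excludes>` entry is my illustration of one possible fix (leave packages that the host framework provides unrelocated), not necessarily what PR 1878 or its eventual fix actually contains:

```xml
<!-- Sketch of the broad relocation described above: every org.apache
     reference is rewritten, including references to commons-collections 3
     classes that come from Spark's classpath and are not in the fat jar. -->
<relocation>
  <pattern>org.apache</pattern>
  <shadedPattern>${rss.shade.packageName}.org.apache</shadedPattern>
  <excludes>
    <!-- Illustrative fix: keep host-framework-provided classes unrelocated. -->
    <exclude>org.apache.commons.collections.**</exclude>
  </excludes>
</relocation>
```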
If Uniffle calls a class that is not explicitly declared in the Uniffle pom file, that class may come from the host framework (Spark, MR, or Tez). In that case, if the reference to the class is relocated, the relocated class cannot be found at runtime.
For me, reverting PR 1878 on branch-0.9 is required at the very least.