
[Bug] spark on yarn throws NoClassDefFoundError

Open maobaolong opened this issue 1 year ago • 4 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the bug

Information from @zhengchenyu

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/uniffle/shaded/javax/xml/stream/XMLStreamException
	at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
	at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.uniffle.shaded.javax.xml.stream.XMLStreamException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 11 more

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/uniffle/shaded/javax/xml/stream/XMLStreamException
  at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:858)
  at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:921)
  at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.uniffle.shaded.javax.xml.stream.XMLStreamException
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 3 more


Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

maobaolong avatar Aug 10 '24 00:08 maobaolong

The Configuration class in hadoop-2.8 does not reference XMLStreamException, but the one in hadoop-3.2 does. So it seems this problem only occurs when building against hadoop-3.2. When I remove the relocation for javax, the job works.
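For reference, the relocation in question presumably looks something like the following in the client's shade-plugin configuration (an illustrative sketch, not the exact pom):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrites every javax.* reference in the bytecode to the shaded
           package name, even when the javax classes themselves are never
           bundled into the fat jar. -->
      <relocation>
        <pattern>javax</pattern>
        <shadedPattern>${rss.shade.packageName}.javax</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```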

zhengchenyu avatar Aug 10 '24 01:08 zhengchenyu

@zhengchenyu Thanks for the extra information. I decompiled the Spark RSS shaded client jar and found that there are no javax/xml/stream/** classes inside the fat jar, yet the references to them were relocated by the shade plugin.

(screenshot of the decompiled jar)

I believe this is the root cause of this issue.
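One quick way to reproduce the mismatch outside of Spark is to probe both the original and the relocated class name with the application class loader (a minimal sketch; the relocated name is taken from the stack trace above):

```java
// Probes whether a class name is resolvable on the current classpath.
public class RelocationCheck {
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: we only care about resolution, not static init.
            Class.forName(className, false, RelocationCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Ships with the JDK (java.xml module), so this resolves fine.
        System.out.println(isLoadable("javax.xml.stream.XMLStreamException"));
        // The shade plugin rewrote references to this name, but no class
        // was actually copied there, so resolution fails.
        System.out.println(isLoadable(
            "org.apache.uniffle.shaded.javax.xml.stream.XMLStreamException"));
    }
}
```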

maobaolong avatar Aug 10 '24 02:08 maobaolong

When I remove the relocation for javax, the job works. But the Docker CI still fails for branch-0.9; when I revert PR 1878, the Docker CI passes.

I am now setting up a Docker CI environment on my server to investigate why it fails.

Docker CI failure link: https://github.com/apache/incubator-uniffle/actions/runs/10314743478/job/28631204182

zhengchenyu avatar Aug 12 '24 06:08 zhengchenyu

I have set up a Docker CI environment and added some debug logging, and found this error:

java.lang.NoClassDefFoundError: org/apache/uniffle/shaded/org/apache/commons/collections/CollectionUtils
 at org.apache.uniffle.client.impl.ShuffleWriteClientImpl.genServerToBlocks(ShuffleWriteClientImpl.java:281)
 at org.apache.uniffle.client.impl.ShuffleWriteClientImpl.sendShuffleData(ShuffleWriteClientImpl.java:339)
 at org.apache.spark.shuffle.writer.DataPusher.lambda$send$3(DataPusher.java:96)
 at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.uniffle.shaded.org.apache.commons.collections.CollectionUtils
 at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
 at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
 at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
 ... 7 more

The reason is that branch-0.9 does not upgrade commons-collections to 4.4, so it uses commons-collections 3 from Spark's dependencies. But PR 1878 relocates "org.apache" to "${rss.shade.packageName}.org.apache". commons-collections 3 is provided by the Spark package and is not relocated, so java.lang.NoClassDefFoundError is thrown.

If Uniffle calls a class that is not explicitly declared in the Uniffle pom, that class may come from the host framework (Spark, MR, or Tez). In that case, if the reference to the class is relocated, the relocated class cannot be found at runtime.
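One possible mitigation (an illustrative sketch, not a tested patch) is to carve engine-provided packages out of the broad org.apache relocation using the shade plugin's `<excludes>` support inside a `<relocation>`:

```xml
<relocation>
  <pattern>org.apache</pattern>
  <shadedPattern>${rss.shade.packageName}.org.apache</shadedPattern>
  <excludes>
    <!-- Provided by Spark's classpath at runtime; rewriting references
         to this package would point at classes that were never shaded in. -->
    <exclude>org.apache.commons.collections.**</exclude>
  </excludes>
</relocation>
```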

For me, at least reverting PR 1878 on branch-0.9 is required.

zhengchenyu avatar Aug 12 '24 08:08 zhengchenyu