Spark loader has dependency conflicts
Bug Type
None
Before submit
- [X] I have searched the issues and found no similar ones.
Environment
- Server Version: v1.0.0
- Toolchain Version: v1.0.0
Expected & Actual behavior
When running spark-loader, I hit a lot of dependency conflicts. I eventually traced them to the jersey, jakarta, and hk2 packages: the versions in Spark's jars directory are not the same as the jersey/jakarta/hk2 versions in the loader's lib directory after packaging. For example, lib mainly contains jersey 3.x while Spark ships jersey 2.x jars. My workaround was to remove the conflicting jars from Spark's jars directory and use the loader as-is, after which the loader could run. Screenshot of the error message: (screenshot not included)
Simply removing the jars worked for me, but it is not a general solution. I'm trying spark-submit --exclude-jars, and would like to know if the community has a better solution. Thanks.
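Not something tried in this thread, but one build-time way to surface version disagreements inside the loader's own dependency tree is the maven-enforcer-plugin's dependencyConvergence rule (a minimal sketch; the plugin version is illustrative). Note it cannot see jars that come from the Spark distribution's jars/ directory, which is the clash described above, but it helps keep the packaged lib/ directory self-consistent:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <version>3.3.0</version>
    <executions>
        <execution>
            <id>enforce-convergence</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <!-- Fail the build when two transitive paths pull
                         different versions of the same artifact -->
                    <dependencyConvergence/>
                </rules>
            </configuration>
        </execution>
    </executions>
</plugin>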
Vertex/Edge example
No response
Schema [VertexLabel, EdgeLabel, IndexLabel]
No response
It seems our Spark loader hits some exceptions now (#404); consider fixing/verifying them as a tiny task? @simon824
Solve this issue by shade? What do you think? @haohao0103 @JackyYangPassion @imbajin @liuxiaocs7
Thanks. I tried to give the user classpath higher priority, but I was blocked by many other issues. You mean using shade plugin relocation to solve this problem, right? There are quite a few conflicting jars, but it's a good idea and I can try it.
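For reference, a minimal maven-shade-plugin relocation sketch along those lines; the relocated packages, shaded prefix, and plugin version are illustrative assumptions, not the configuration of the eventual PR:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.4.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- Move the loader's jersey 3.x / hk2 / jakarta classes into
                         a private namespace so they cannot clash with the jersey
                         2.x (javax) jars shipped in Spark's jars/ directory -->
                    <relocation>
                        <pattern>org.glassfish.jersey</pattern>
                        <shadedPattern>org.apache.hugegraph.shaded.jersey</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>org.glassfish.hk2</pattern>
                        <shadedPattern>org.apache.hugegraph.shaded.hk2</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>jakarta</pattern>
                        <shadedPattern>org.apache.hugegraph.shaded.jakarta</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>

Relocation rewrites the loader's own classes and the bytecode that references them, so Spark's jersey 2.x (javax) stack keeps winning on its side of the classpath without removing any jars by hand.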
Any update on this issue?
@imbajin Hello, I am following up on this matter. The local test has basically passed; I will submit a PR as soon as possible. Thanks.
Hi @haohao0103, may I ask whether you have solved this problem? There are many dependency conflicts between hugegraph-common and Spark, mainly jakarta vs. javax version conflicts; the two cannot be imported at the same time. Also, how do you run this in IDEA?
With the following dependencies in the loader:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>org.glassfish.jersey.core</groupId>
                <artifactId>jersey-client</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.glassfish.jersey.media</groupId>
                <artifactId>jersey-media-json-jackson</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.glassfish.jersey.core</groupId>
                <artifactId>jersey-common</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.glassfish.jersey.containers</groupId>
                <artifactId>jersey-container-servlet</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.glassfish.jersey.containers</groupId>
                <artifactId>jersey-container-servlet-core</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.glassfish.jersey.inject</groupId>
                <artifactId>jersey-hk2</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.glassfish.jersey.core</groupId>
                <artifactId>jersey-server</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>org.antlr</groupId>
                <artifactId>antlr4-runtime</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hugegraph</groupId>
        <artifactId>hugegraph-client</artifactId>
        <version>1.0.0</version>
        <exclusions>
            <!-- Note: jackson version should < 2.13 with scala 2.12 -->
            <exclusion>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.jaxrs</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>
logs:
23/07/26 12:05:49 INFO SparkEnv: Registering OutputCommitCoordinator
Exception in thread "main" java.lang.NoClassDefFoundError: jakarta/servlet/Filter
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.status.api.v1.ApiRootResource$.getServletHandler(ApiRootResource.scala:63)
at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:68)
at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:81)
at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:480)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at org.apache.hugegraph.spark.HelloWorld1.main(HelloWorld1.java:17)
Caused by: java.lang.ClassNotFoundException: jakarta.servlet.Filter
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 19 more
23/07/26 12:05:49 INFO DiskBlockManager: Shutdown hook called
23/07/26 12:05:49 INFO ShutdownHookManager: Shutdown hook called
Jersey 3.0.3, which hugegraph-common uses, is built on the jakarta namespace, but the Spark UI uses javax (jersey 2.x), so the namespaces conflict.
@liuxiaocs7
Hello, the conflict I'm trying to resolve is exactly the one you described! After the shaded jar is packaged successfully, you can run bin/hugegraph-spark-loader.sh to test it.
As for how to run in the IDE: I understand that since we set the Spark dependencies to provided scope, we can't run in the IDE. Before shading, we could run temporary tests in the IDE by changing the scope of the Spark dependencies, but now I think that is not possible. If we change the scope of the Spark dependencies, we need to adjust the shade strategy to match whether the Spark dependencies are external or exist in the project.
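One conventional workaround for the provided-scope problem in the IDE (an assumption about workflow, not something this project adopted): drive the Spark scope from a Maven property and flip it with a profile for local runs.

<!-- Hypothetical sketch: default to provided (spark-submit supplies Spark);
     activate -Pspark-local so the IDE puts Spark on the classpath -->
<properties>
    <spark.scope>provided</spark.scope>
</properties>

<profiles>
    <profile>
        <id>spark-local</id>
        <properties>
            <spark.scope>compile</spark.scope>
        </properties>
    </profile>
</profiles>

<!-- Each Spark dependency then references the property: -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.2.2</version>
    <scope>${spark.scope}</scope>
</dependency>

As noted above, if Spark ends up bundled this way, the shade strategy would also need adjusting so the Spark jars are not packaged twice.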
@haohao0103 @liuxiaocs7
- Can we refer to how the Spark community deals with the javax vs. jakarta package-name issue?
- In this case, can we exclude jakarta.servlet-api from spark-core_2.12 and add it manually? jakarta.servlet-api 4.0.x still uses the javax namespace but changed to jakarta from version 5.0.0, so we can upgrade to 5.0.0; see:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.2.2</version>
    <exclusions>
        <exclusion>
            <groupId>jakarta.servlet</groupId>
            <artifactId>jakarta.servlet-api</artifactId>
        </exclusion>
        ......
    </exclusions>
</dependency>
<dependency>
    <groupId>jakarta.servlet</groupId>
    <artifactId>jakarta.servlet-api</artifactId>
    <version>5.0.0</version>
</dependency>
Hi @z7658329, I have tried to manually specify the version of jakarta.servlet-api as 5.0.0 or 6.0.0 instead of the default 4.0.3, and got the following result:
Exception in thread "main" java.lang.NoClassDefFoundError: javax/servlet/Servlet
at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:480)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at top.liuxiaocs.Main.main(Main.java:16)
Caused by: java.lang.ClassNotFoundException: javax.servlet.Servlet
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 4 more
pom:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>jakarta.servlet</groupId>
                <artifactId>jakarta.servlet-api</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>org.antlr</groupId>
                <artifactId>antlr4-runtime</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hugegraph</groupId>
        <artifactId>hugegraph-client</artifactId>
        <version>1.0.0</version>
        <exclusions>
            <!-- Note: jackson version should < 2.13 with scala 2.12 -->
            <exclusion>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.jaxrs</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>jakarta.servlet</groupId>
        <artifactId>jakarta.servlet-api</artifactId>
        <version>5.0.0</version>
    </dependency>
</dependencies>
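A plausible explanation for this result (my inference, not confirmed in the thread): the jakarta.servlet-api 4.0.x line still ships its classes under the javax.servlet package, while 5.0.0+ moved them to jakarta.servlet; Spark 3.2's UI loads javax.servlet classes, so upgrading removes javax.servlet.Servlet from the classpath entirely. Explicitly pinning the default 4.0.x line back would avoid this particular error, though it does not resolve the jersey namespace conflict itself:

<!-- Sketch: keep the 4.0.x line, which still provides javax.servlet classes
     for Spark's UI; this does NOT fix the jersey jakarta/javax conflict -->
<dependency>
    <groupId>jakarta.servlet</groupId>
    <artifactId>jakarta.servlet-api</artifactId>
    <version>4.0.3</version>
</dependency>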
> @liuxiaocs7 Hello, the conflict I'm trying to resolve is exactly the one you described! After the shaded jar is packaged successfully, you can run bin/hugegraph-spark-loader.sh to test it. As for how to run in the IDE: I understand that since we set the Spark dependencies to provided, we can't run in the IDE. Before shading, we could run temporary tests in the IDE by changing the scope of the Spark dependencies, but now I think that is not possible. If we change the scope, we need to adjust the shade strategy to match whether the Spark dependencies are external or exist in the project.
Thank you for your detailed explanation. I will try it based on your PR. As for dependencies with provided scope, you can tick the IDEA run-configuration option that adds provided-scope dependencies to the classpath.
> but now I think that is not possible. If we change the scope of the Spark dependencies, we need to adjust the shade strategy to match whether the Spark dependencies are external or exist in the project
Yep, they cannot coexist in IDEA.
A minimal runnable Spark 3.2.2 + HugeGraph-Client 1.0.0 example: https://github.com/liuxiaocs7/HugeGraphSpark
@liuxiaocs7 Thank you very much. I didn't know it was possible to do this without modifying the pom file.