
Spark loader has dependency conflicts

haohao0103 opened this issue 2 years ago · 12 comments

Bug Type (问题类型)

None

Before submit

  • [X] I have searched the issues and found no similar issues.

Environment (环境信息)

  • Server Version: v1.0.0
  • Toolchain Version: v1.0.0

Expected & Actual behavior (期望与实际表现)

When running spark-loader, there were many dependency conflicts. I eventually traced them to the jersey, jakarta, and hk2 packages: the versions in Spark's jars directory are not the same as the jersey, jakarta, and hk2 versions in the loader's lib directory after packaging. For example, lib ships mainly jersey 3.x while Spark ships 2.x jars. My workaround was to remove the conflicting jars from the Spark jars directory and use the loader as-is, after which the loader could run. Screenshots of the error message: (omitted)

Simply removing the jars worked for me but isn't general enough. I'm trying spark-submit --exclude-jars and would like to see if the community has a better solution, thanks.

Vertex/Edge example (问题点 / 边数据举例)

No response

Schema [VertexLabel, EdgeLabel, IndexLabel] (元数据结构)

No response

haohao0103 avatar May 17 '23 10:05 haohao0103

It seems our Spark loader hits some exceptions now (#404); consider fixing/verifying them as a tiny task? @simon824

imbajin avatar May 17 '23 15:05 imbajin

Could we solve this issue with shade? What do you think? @haohao0103 @JackyYangPassion @imbajin @liuxiaocs7

simon824 avatar May 23 '23 01:05 simon824

Could we solve this issue with shade? What do you think? @haohao0103 @JackyYangPassion @imbajin @liuxiaocs7

Thanks, I tried specifying that the user classpath has higher priority, but I was blocked by many other issues. You mean using shade-plugin relocation to solve this problem, right? There are quite a few conflicting jars, but it's a good idea I can try.
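
Roughly what I have in mind (just a sketch, assuming we shade the loader jar with maven-shade-plugin; the relocation prefix org.apache.hugegraph.shaded and the package list are illustrative, not what a final PR would necessarily use):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.4.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- move the loader's jersey 3.x / hk2 copies out of Spark's way -->
                    <relocation>
                        <pattern>org.glassfish.jersey</pattern>
                        <shadedPattern>org.apache.hugegraph.shaded.org.glassfish.jersey</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>org.glassfish.hk2</pattern>
                        <shadedPattern>org.apache.hugegraph.shaded.org.glassfish.hk2</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>jakarta.ws.rs</pattern>
                        <shadedPattern>org.apache.hugegraph.shaded.jakarta.ws.rs</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>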

haohao0103 avatar May 23 '23 03:05 haohao0103

Could we solve this issue with shade? What do you think? @haohao0103 @JackyYangPassion @imbajin @liuxiaocs7

Thanks, I tried specifying that the user classpath has higher priority, but I was blocked by many other issues. You mean using shade-plugin relocation to solve this problem, right? There are quite a few conflicting jars, but it's a good idea I can try.

Regarding this issue, any update on it?

imbajin avatar Jun 09 '23 04:06 imbajin

@imbajin hello, I am following up on this. The local tests have basically passed, and I will submit a PR as soon as possible. Thanks.

haohao0103 avatar Jun 12 '23 06:06 haohao0103

Hi @haohao0103, may I ask whether you have solved this problem? There are many dependency conflicts between hugegraph-common and Spark, mainly jakarta vs. javax version conflicts: the two cannot be imported at the same time. Also, how do you run this in IDEA?

When using the following dependencies in the loader:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <artifactId>jersey-client</artifactId>
                <groupId>org.glassfish.jersey.core</groupId>
            </exclusion>
            <exclusion>
                <groupId>org.glassfish.jersey.media</groupId>
                <artifactId>jersey-media-json-jackson</artifactId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-common</artifactId>
                <groupId>org.glassfish.jersey.core</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-container-servlet</artifactId>
                <groupId>org.glassfish.jersey.containers</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-container-servlet-core</artifactId>
                <groupId>org.glassfish.jersey.containers</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-hk2</artifactId>
                <groupId>org.glassfish.jersey.inject</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jersey-server</artifactId>
                <groupId>org.glassfish.jersey.core</groupId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <artifactId>antlr4-runtime</artifactId>
                <groupId>org.antlr</groupId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>org.apache.hugegraph</groupId>
        <artifactId>hugegraph-client</artifactId>
        <version>1.0.0</version>
        <exclusions>
            <!-- Note: jackson version should < 2.13 with scala 2.12 -->
            <exclusion>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.jaxrs</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>


logs:

23/07/26 12:05:49 INFO SparkEnv: Registering OutputCommitCoordinator
Exception in thread "main" java.lang.NoClassDefFoundError: jakarta/servlet/Filter
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at org.apache.spark.status.api.v1.ApiRootResource$.getServletHandler(ApiRootResource.scala:63)
	at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:68)
	at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:81)
	at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:480)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at org.apache.hugegraph.spark.HelloWorld1.main(HelloWorld1.java:17)
Caused by: java.lang.ClassNotFoundException: jakarta.servlet.Filter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 19 more
23/07/26 12:05:49 INFO DiskBlockManager: Shutdown hook called
23/07/26 12:05:49 INFO ShutdownHookManager: Shutdown hook called

Jersey 3.0.3, used in hugegraph-common, uses the jakarta namespace, but the Spark UI uses javax (Jersey 2.x), so the namespaces conflict.

liuxiaocs7 avatar Jul 25 '23 13:07 liuxiaocs7

@liuxiaocs7 hello, the conflict I'm trying to resolve is exactly what you described! After the shaded jar is successfully packaged, you can run bin/hugegraph-spark-loader.sh to test it. As for how to run it in the IDE: I understand that since we set the Spark dependency to provided, we can't run it in the IDE directly. Before shading, we could run temporary tests in the IDE by changing the scope of the Spark dependency, but now I think that is not possible. If we change the scope of the Spark dependency, we need to adjust the shade strategy to match whether the Spark dependency is external or bundled in the project.
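
For IDE runs, one workaround I can imagine (just a sketch, not something I have tested: keep the Spark scope in a Maven property and flip it with a profile, so the normal build still treats Spark as provided; the profile id and property name below are made up):

<properties>
    <spark.scope>provided</spark.scope>
</properties>

<!-- in <dependencies>: reference the property instead of a literal scope -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.2.2</version>
    <scope>${spark.scope}</scope>
</dependency>

<!-- activate with: mvn -Pide-run ... -->
<profiles>
    <profile>
        <id>ide-run</id>
        <properties>
            <spark.scope>compile</spark.scope>
        </properties>
    </profile>
</profiles>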

haohao0103 avatar Jul 27 '23 07:07 haohao0103

@haohao0103 @liuxiaocs7

  1. Can we refer to how the Spark community deals with the javax vs. jakarta package-name issue?
  2. In this case, can we exclude jakarta.servlet-api from spark-core_2.12 and add it manually? jakarta.servlet-api 4.0.x still uses the javax namespace but switched to jakarta from version 5.0.0, so we can upgrade to 5.0.0, see:
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.2.2</version>
        <exclusions>
            <exclusion>
                <groupId>jakarta.servlet</groupId>
                <artifactId>jakarta.servlet-api</artifactId>
            </exclusion>

            ......

        </exclusions>
    </dependency>

    <dependency>
        <groupId>jakarta.servlet</groupId>
        <artifactId>jakarta.servlet-api</artifactId>
        <version>5.0.0</version>
    </dependency>

z7658329 avatar Jul 28 '23 05:07 z7658329

Hi @z7658329, I have tried manually specifying the version of jakarta.servlet-api as 5.0.0 or 6.0.0 instead of the default 4.0.3, and got the following result:

Exception in thread "main" java.lang.NoClassDefFoundError: javax/servlet/Servlet
	at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:183)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:480)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at top.liuxiaocs.Main.main(Main.java:16)
Caused by: java.lang.ClassNotFoundException: javax.servlet.Servlet
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 4 more

pom:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>jakarta.servlet</groupId>
                <artifactId>jakarta.servlet-api</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.2.2</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <artifactId>antlr4-runtime</artifactId>
                <groupId>org.antlr</groupId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>org.apache.hugegraph</groupId>
        <artifactId>hugegraph-client</artifactId>
        <version>1.0.0</version>
        <exclusions>
            <!-- Note: jackson version should < 2.13 with scala 2.12 -->
            <exclusion>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>*</artifactId>
            </exclusion>
            <exclusion>
                <groupId>com.fasterxml.jackson.jaxrs</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>jakarta.servlet</groupId>
        <artifactId>jakarta.servlet-api</artifactId>
        <version>5.0.0</version>
    </dependency>
</dependencies>
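
This result actually matches what @z7658329 noted above: jakarta.servlet-api 4.0.x still ships the javax.servlet.* classes that Spark's UI loads, while 5.0.0 ships only jakarta.servlet.*, so swapping 4.0.3 for 5.0.0 removes javax.servlet.Servlet entirely. A sketch of keeping both namespaces on the classpath (the javax.servlet-api coordinates below are my assumption, untested here):

<!-- jakarta.* namespace for Jersey 3.x / hugegraph-client -->
<dependency>
    <groupId>jakarta.servlet</groupId>
    <artifactId>jakarta.servlet-api</artifactId>
    <version>5.0.0</version>
</dependency>
<!-- javax.* namespace for Spark's Jetty-based UI (assumed artifact) -->
<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>4.0.1</version>
</dependency>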

liuxiaocs7 avatar Jul 28 '23 07:07 liuxiaocs7

@liuxiaocs7 hello, the conflict I'm trying to resolve is exactly what you described! After the shaded jar is successfully packaged, you can run bin/hugegraph-spark-loader.sh to test it. As for how to run it in the IDE: I understand that since we set the Spark dependency to provided, we can't run it in the IDE directly. Before shading, we could run temporary tests in the IDE by changing the scope of the Spark dependency, but now I think that is not possible. If we change the scope of the Spark dependency, we need to adjust the shade strategy to match whether the Spark dependency is external or bundled in the project.

Thank you for your detailed explanation. I will try it based on your PR. As for dependencies whose scope is provided, you can check this option in IDEA:

(screenshot of the IDEA option omitted)

But now I think that is not possible. If we change the scope of the Spark dependency, we need to adjust the shade strategy to match whether the Spark dependency is external or bundled in the project.

Yep, they cannot coexist in IDEA.

liuxiaocs7 avatar Jul 28 '23 07:07 liuxiaocs7

A minimal runnable Spark 3.2.2 + HugeGraph-Client 1.0.0 example: https://github.com/liuxiaocs7/HugeGraphSpark

liuxiaocs7 avatar Jul 29 '23 05:07 liuxiaocs7

@liuxiaocs7 Thank you very much. I didn't know it was possible to do this without modifying the pom file.

haohao0103 avatar Jul 31 '23 01:07 haohao0103