
java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager

radhikabajaj123 opened this issue 1 year ago • 7 comments

Hello,

I am getting the following exception when running spark-submit:

```
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1780)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:67)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:429)
	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:83)
	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:467)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:232)
	at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2770)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:433)
	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:320)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:478)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
	... 4 more
```

These are the configurations I am using for spark-submit:

```shell
--deploy-mode cluster \
--driver-memory 32g \
--executor-memory 128g \
--executor-cores 18 \
--driver-cores 8 \
--num-executors 3 \
--conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
--conf spark.yarn.populateHadoopClasspath=false \
--conf spark.yarn.archive=$BENCH_HOME/$BENCH_DISTR.tgz \
--jars /root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-test-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar \
--conf spark.driver.extraClassPath=/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-test-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar \
--conf spark.executor.extraClassPath=/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-test-sources.jar,./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar \
--conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
--conf spark.comet.enabled=true \
--conf spark.comet.exec.enabled=true \
--conf spark.comet.exec.all.enabled=true \
--conf spark.comet.explainFallback.enabled=true \
--conf spark.comet.cast.allowIncompatible=true \
--conf spark.comet.exec.shuffle.enabled=true \
--conf spark.comet.exec.shuffle.mode=auto \
--conf spark.comet.shuffle.enforceMode.enabled=true \
--conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
```

Does anyone have any insights as to what might be causing the error?

radhikabajaj123 avatar Aug 22 '24 18:08 radhikabajaj123

I see that you are submitting multiple jars. One uses an absolute path under /root while the others use relative paths, which may not be intended?

Also, there is no need to submit the source jars or test source jars.

Could you try submitting just comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar using an absolute path?

```
/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,
./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-test-sources.jar,
./datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar,
./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT-sources.jar,
./datafusion-comet/spark/target/original-comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar
```
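To make the intent concrete, here is a sketch of the trimmed-down flags. It only echoes the flags rather than invoking spark-submit, since the rest of the command is cluster-specific; the jar path is the one already used in this thread:

```shell
# Only the main Comet jar is needed at runtime; the -sources and
# -test-sources jars can be dropped entirely.
COMET_JAR=/root/datafusion-comet/spark/target/comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar

# Point --jars and both extraClassPath settings at this one absolute path.
echo "--jars $COMET_JAR"
echo "--conf spark.driver.extraClassPath=$COMET_JAR"
echo "--conf spark.executor.extraClassPath=$COMET_JAR"
```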

andygrove avatar Aug 23 '24 04:08 andygrove

I had tried submitting just comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar using an absolute path, and that also gave the same error.

radhikabajaj123 avatar Aug 23 '24 04:08 radhikabajaj123

spark.[driver|executor].extraClassPath should be a colon-separated list of jars at absolute local paths (the JVM classpath separator on Linux). spark-submit silently ignores errors in this config, which is why Spark cannot find the mentioned class on its classpath. This example works for me:

```shell
export JARS_LOCAL="/opt/spark-3.5.1/jars_ext/comet-spark-spark3.5_2.12-0.2.0-SNAPSHOT-210824.jar:/opt/spark-3.5.1/jars_ext/spark-metrics-3.5-1.0.0.jar"
spark-shell \
  ... \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.driver.extraClassPath=$JARS_LOCAL \
  --conf spark.executor.extraClassPath=$JARS_LOCAL \
  ...
```

nblagodarnyi avatar Aug 28 '24 15:08 nblagodarnyi

Hi Nikita, thanks for the reply!

I am receiving the same error when I try submitting a single jar comet-spark-spark3.4_2.13-0.2.0-SNAPSHOT.jar using an absolute local path.

radhikabajaj123 avatar Aug 28 '24 16:08 radhikabajaj123

It doesn't make sense. I also don't think this is related to Comet. Based on what you described, it seems you cannot include any third-party classes through the --jars config at all.

Are you able to put any jar other than Comet in --jars and import a class from it?

viirya avatar Aug 28 '24 17:08 viirya

@radhikabajaj123 note that this local jar (with local path) should be present on all worker nodes of your cluster.

nblagodarnyi avatar Aug 29 '24 10:08 nblagodarnyi

@radhikabajaj123

spark.[driver|executor].extraClassPath is appended directly to the JVM classpath, so it has two restrictions:

  1. the paths must point to jars already present on the machines
  2. it must use the OS-specific classpath delimiter, which on Linux is :

Every time you run spark-submit, all libraries from --jars are copied to the local working directory, so you don't need to provide relative paths. Possible options:

  1. distribute the files across the cluster yourself and reference them with spark.[driver|executor].extraClassPath plus absolute local paths
  2. remove spark.[driver|executor].extraClassPath from your spark-submit entirely and include all necessary jars in the --jars parameter

For testing I recommend option 2, but for production option 1 is better, because then you can rely on the yarn-site.xml config and include these jars in the classpath by default.
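A small sketch of the two delimiter formats involved here (directory and jar names are placeholders, not real paths):

```shell
JAR_DIR=/opt/spark/jars_ext   # hypothetical location

# --jars takes a comma-separated list:
JARS_SUBMIT="$JAR_DIR/comet.jar,$JAR_DIR/extra.jar"

# extraClassPath takes the JVM classpath separator, ':' on Linux:
JARS_CLASSPATH="$JAR_DIR/comet.jar:$JAR_DIR/extra.jar"

echo "$JARS_SUBMIT"
echo "$JARS_CLASSPATH"
```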

xhumanoid avatar Aug 29 '24 11:08 xhumanoid

I get similar error when invoking with spark-shell. Tried below:

a) Uploaded the jar to HDFS (hdfs:///home/hadoop/libraries/comet-spark-spark3.5_2.12-0.4.0.jar) and then ran:

```shell
spark-shell --jars hdfs:///home/hadoop/libraries/comet-spark-spark3.5_2.12-0.4.0.jar
```

b) Ran the above command and verified the class is loaded in the driver, using the code below:

```scala
val clazz = Class.forName("org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager")
println(clazz)
```

c) Verified the class is loaded in the executors, using the code below:

```scala
val rdd = sc.parallelize(Seq(1), 1)
rdd.map { _ =>
  val clazz = Class.forName("org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager")
  s"Executor loaded class: $clazz"
}.collect().foreach(println)
```

d) When invoked with --conf spark.shuffle.manager, I keep getting the ClassNotFoundException for CometShuffleManager:

```shell
spark-shell --jars hdfs:///home/hadoop/libraries/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.driver.extraClassPath=hdfs:///home/hadoop/libraries/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.executor.extraClassPath=hdfs:///home/hadoop/libraries/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
```

ramyadass avatar Dec 31 '24 05:12 ramyadass

Any thoughts on above failure ?

ramyadass avatar Jan 03 '25 18:01 ramyadass

@ramyadass please carefully read all the comments above. extraClassPath should be a local path, not hdfs.

nblagodarnyi avatar Jan 04 '25 11:01 nblagodarnyi

@nblagodarnyi ,

I've tried to use Comet with Spark using two different commands, but I'm encountering the same error in both cases.

  1. Using a local jar:

```shell
spark-shell --jars /home/hadoop/.ivy2/jars/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.driver.extraClassPath=/home/hadoop/.ivy2/jars/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/.ivy2/jars/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
```

  2. Using an HDFS jar:

```shell
spark-shell --jars hdfs:///home/hadoop/libraries/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.driver.extraClassPath=/home/hadoop/.ivy2/jars/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/.ivy2/jars/comet-spark-spark3.5_2.12-0.4.0.jar \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
```

Error received in both cases: Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager

I've verified that the jar file contains the specified class.

  1. Are there any additional steps or pre-configurations I might be missing?
  2. Do you have any suggestions for resolving this ClassNotFoundException?

Any insights would be greatly appreciated. Thank you!

ramyadass avatar Jan 09 '25 01:01 ramyadass

Does this local path/file /home/hadoop/.ivy2/jars/comet-spark-spark3.5_2.12-0.4.0.jar exist on all affected workers (cluster machines where spark drivers/executors can be run on)?
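One way to script that check on each node (the path below is deliberately a non-existent placeholder, so this sample prints MISSING):

```shell
# Prints "ok" if the jar exists at the given local path, "MISSING" otherwise.
check_jar() {
  if [ -f "$1" ]; then echo ok; else echo MISSING; fi
}

check_jar /tmp/no-such-comet.jar
```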

nblagodarnyi avatar Jan 09 '25 15:01 nblagodarnyi

For anyone wondering (this seems to be an HDFS-related issue?): set the full path to the jar (with hdfs://) for the spark.jars param, and just the name of the jar for both the spark.driver.extraClassPath and spark.executor.extraClassPath params. It will look like:

    .set("spark.jars", "hdfs://my-full-path/comet-spark-spark3.5_2.12-0.6.0.jar")
    .set("spark.driver.extraClassPath", "comet-spark-spark3.5_2.12-0.6.0.jar")
    .set("spark.executor.extraClassPath", "comet-spark-spark3.5_2.12-0.6.0.jar")

At least that worked for me. Though maybe this depends on your cluster configuration...

Iskander14yo avatar Jun 28 '25 20:06 Iskander14yo