[Bug] Even though "spark.jars=xxx.jar" is specified in the Spark conf, "import XXXX" statements from the jar fail when executing Spark Scala code, regardless of client or cluster deploy mode
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Search before asking
- [X] I have searched in the issues and found no similar issues.
Describe the bug
Regardless of client or cluster deploy mode, even though "spark.jars=xxx.jar" has been specified in the Spark conf, you cannot run an "import XXXX" statement from the jar when executing Spark Scala code.
1. My Kyuubi JDBC URL

```java
private static String kyuubiJdbcUrl =
    "jdbc:hive2://xxxxx:9999/xxxxx;"
        + "?"
        + "kyuubi.engine.pool.name=USER-GROUP;"
        + "kyuubi.engine.pool.size=1;"
        + "kyuubi.session.engine.idle.timeout=PT15M;"
        + "#"
        + "spark.jars=s3://xxxx/*.jar;"
        + "spark.hadoop.mapreduce.input.pathFilter.class=org.apache.hudi.hadoop.HoodieROTablePathFilter;"
        + "spark.cassandra.connection.host=xxx;"
        + "spark.cassandra.connection.port=xxx;"
        + "spark.cassandra.auth.username=xxx;"
        + "spark.cassandra.auth.password=xxx;";
```
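For reference, a minimal sketch of how such a URL is used from client code (host, port, and credentials are placeholders; the session confs after `?` and the Spark confs after `#` are carried in the URL itself, as in the snippet above):

```scala
import java.sql.DriverManager

// Placeholder URL: "?" introduces Kyuubi session confs, "#" introduces Spark confs.
val url = "jdbc:hive2://xxxxx:9999/xxxxx;" +
  "?kyuubi.engine.pool.size=1;" +
  "#spark.jars=s3://xxxx/*.jar"

// Placeholder credentials; requires a reachable Kyuubi server.
val conn = DriverManager.getConnection(url, "user", "")
val stmt = conn.createStatement()
stmt.execute("select 1")
conn.close()
```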
2. My Spark API code

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import com.datastax.spark.connector.cql.CassandraConnectorConf
import org.apache.spark.sql.cassandra._

spark.setCassandraConf(CassandraConnectorConf.KeepAliveMillisParam.option(10000))

val writeDf = spark.read.parquet("xxxx")
writeDf.printSchema()

val cassandraMap = Map("table" -> "xxxx", "keyspace" -> "xxxx")
writeDf.write.format("org.apache.spark.sql.cassandra").options(cassandraMap).mode(SaveMode.Append).save()
```
3. Error
```
Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error operating EXECUTE_STATEMENT: org.apache.kyuubi.KyuubiSQLException: Interpret error:
import org.apache.spark.sql.{SaveMode, SparkSession}
import com.datastax.spark.connector.cql.CassandraConnectorConf
import org.apache.spark.sql.cassandra._
spark.setCassandraConf(CassandraConnectorConf.KeepAliveMillisParam.option(10000))
val writeDf = spark.read.parquet("s3:xxxx")
writeDf.printSchema()
val cassandraMap = Map("table" -> "xxxx", "keyspace" -> "xxxx")
writeDf.write.format("org.apache.spark.sql.cassandra").options(cassandraMap).mode(SaveMode.Append).save()

<console>:24: error: object datastax is not a member of package com
       import com.datastax.spark.connector.cql.CassandraConnectorConf
                  ^
<console>:25: error: object cassandra is not a member of package org.apache.spark.sql
       import org.apache.spark.sql.cassandra._
                                   ^
```
4. Problem clue
We can see that `import org.apache.spark.sql.{SaveMode, SparkSession}` executes successfully, but `import com.datastax.spark.connector.cql.CassandraConnectorConf` fails.
In the Spark application UI, we found that xxx.jar had been added to `spark.yarn.dist.jars` and `spark.yarn.secondary.jars`, but the "import XXXX" statement still cannot run.
When we execute `spark-shell --jars xxx.jar`, the same code runs normally:

```
spark-shell --name my1 --conf "spark.cassandra.connection.host=xxxx" --conf "spark.cassandra.connection.port=9042" --conf "spark.cassandra.auth.username=xxxx" --conf "spark.cassandra.auth.password=xxx" --conf "spark.jars=s3://xxxxx/*.jar"
```
Affects Version(s)
1.5.1
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
No response
Kyuubi Server Configurations
No response
Kyuubi Engine Configurations
spark.submit.deployMode=cluster/client
Additional context
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
This is not a bug in Kyuubi. `spark.jars=xxx.jar` does not add the jar path to `spark.sharedState.jarClassLoader`, which the Scala interpreter uses; you need to use the `extraClassPath` confs to add the corresponding classpath.
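As a sketch of that suggestion (the jar path is a placeholder; the jar must be present at this path on both the driver and executor hosts), the engine's Spark configuration would contain something like:

```
# Hypothetical local path -- must exist on driver and executor machines.
spark.driver.extraClassPath=/path/to/test.jar
spark.executor.extraClassPath=/path/to/test.jar
```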
Another way to load jars dynamically is to use `spark.sql("add jar /path/to/test.jar")`; see issue #2471.
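A minimal sketch of that approach, run as a statement inside the Kyuubi Spark Scala session (the jar path is a placeholder):

```scala
// Register the jar with the running session so that subsequent statements
// can resolve classes from it. Path is a placeholder.
spark.sql("add jar /path/to/test.jar")
```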