[Bug] Even though "spark.jars=xxx.jar" is specified in the Spark conf, "import XXXX" statements from the jar fail when executing Spark Scala code, regardless of client or cluster deploy mode
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Search before asking
- [X] I have searched in the issues and found no similar issues.
Describe the bug
Regardless of client or cluster deploy mode, even though "spark.jars=xxx.jar" has been specified in the Spark conf, you cannot run an "import XXXX" statement from the jar when executing Spark Scala code.
1. My Kyuubi JDBC URL

```java
private static String kyuubiJdbcUrl =
    "jdbc:hive2://xxxxx:9999/xxxxx;"
        + "?"
        + "kyuubi.engine.pool.name=USER-GROUP;"
        + "kyuubi.engine.pool.size=1;"
        + "kyuubi.session.engine.idle.timeout=PT15M;"
        + "#"
        + "spark.jars=s3://xxxx/*.jar;"
        + "spark.hadoop.mapreduce.input.pathFilter.class=org.apache.hudi.hadoop.HoodieROTablePathFilter;"
        + "spark.cassandra.connection.host=xxx;"
        + "spark.cassandra.connection.port=xxx;"
        + "spark.cassandra.auth.username=xxx;"
        + "spark.cassandra.auth.password=xxx;";
```
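For reference, a minimal sketch of how such a URL is used from client code (host, port, and credentials are placeholders; the session confs after `?` and the Spark confs after `#` are carried in the URL itself, as in the snippet above):

```scala
import java.sql.DriverManager

// Placeholder URL: "?" introduces Kyuubi session confs, "#" introduces Spark confs.
val url = "jdbc:hive2://xxxxx:9999/xxxxx;" +
  "?kyuubi.engine.pool.size=1;" +
  "#spark.jars=s3://xxxx/*.jar"

// Placeholder credentials; requires a reachable Kyuubi server.
val conn = DriverManager.getConnection(url, "user", "")
val stmt = conn.createStatement()
stmt.execute("select 1")
conn.close()
```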
2. My Spark API code

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import com.datastax.spark.connector.cql.CassandraConnectorConf
import org.apache.spark.sql.cassandra._

spark.setCassandraConf(CassandraConnectorConf.KeepAliveMillisParam.option(10000))

val writeDf = spark.read.parquet("xxxx")
writeDf.printSchema()

val cassandraMap = Map("table" -> "xxxx", "keyspace" -> "xxxx")
writeDf.write.format("org.apache.spark.sql.cassandra").options(cassandraMap).mode(SaveMode.Append).save()
```
3. Error
```
Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error operating EXECUTE_STATEMENT: org.apache.kyuubi.KyuubiSQLException: Interpret error:
import org.apache.spark.sql.{SaveMode, SparkSession}
import com.datastax.spark.connector.cql.CassandraConnectorConf
import org.apache.spark.sql.cassandra._
spark.setCassandraConf(CassandraConnectorConf.KeepAliveMillisParam.option(10000))
val writeDf = spark.read.parquet("s3:xxxx")
writeDf.printSchema()
val cassandraMap = Map("table" -> "xxxx", "keyspace" -> "xxxx")
writeDf.write.format("org.apache.spark.sql.cassandra").options(cassandraMap).mode(SaveMode.Append).save()

<console>:24: error: object datastax is not a member of package com
       import com.datastax.spark.connector.cql.CassandraConnectorConf
                  ^
<console>:25: error: object cassandra is not a member of package org.apache.spark.sql
       import org.apache.spark.sql.cassandra._
                                   ^
```
4. Problem clue
We can see that `import org.apache.spark.sql.{SaveMode, SparkSession}` executes successfully, but `import com.datastax.spark.connector.cql.CassandraConnectorConf` fails.
In the Spark application UI, we found that xxx.jar had been added to `spark.yarn.dist.jars` and `spark.yarn.secondary.jars`, but the "import XXXX" statement still cannot run.
When we execute `spark-shell --jars xxx.jar`, the same code runs normally:

```
spark-shell --name my1 --conf "spark.cassandra.connection.host=xxxx" --conf "spark.cassandra.connection.port=9042" --conf "spark.cassandra.auth.username=xxxx" --conf "spark.cassandra.auth.password=xxx" --conf "spark.jars=s3://xxxxx/*.jar"
```
Affects Version(s)
1.5.1
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
No response
Kyuubi Server Configurations
No response
Kyuubi Engine Configurations
spark.submit.deployMode=cluster/client
Additional context
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
This is not a bug in Kyuubi. `spark.jars=xxx.jar` does not add the jar path to `spark.sharedState.jarClassLoader`, which the Scala interpreter uses; you need to use the `extraClassPath` confs to add the corresponding classpath.
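As a sketch of that suggestion (the jar path is a placeholder; the jar must be present at this path on both the driver and executor hosts), the engine's Spark configuration would contain something like:

```
# Hypothetical local path -- must exist on driver and executor machines.
spark.driver.extraClassPath=/path/to/test.jar
spark.executor.extraClassPath=/path/to/test.jar
```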
Another way to load jars dynamically is to use `spark.sql("add jar /path/to/test.jar")`; see issue #2471.
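A minimal sketch of that approach, run as a statement inside the Kyuubi Spark Scala session (the jar path is a placeholder):

```scala
// Register the jar with the running session so that subsequent statements
// can resolve classes from it. Path is a placeholder.
spark.sql("add jar /path/to/test.jar")
```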