[Bug] The inclusion of high-version Spark classes in paimon-spark-common may lead to certain exceptions.

Open thomasg19930417 opened this issue 11 months ago • 6 comments

Search before asking

  • [x] I searched in the issues and found nothing similar.

Paimon version

1.0.0

Compute Engine

Spark 3.3

Minimal reproduce step

Execute `DESC tableName` or `SHOW CREATE TABLE tableName` against a Paimon table in a session that has the Kyuubi AuthZ plugin enabled (a sketch of such a session is below).
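The failure only shows up when the Kyuubi AuthZ plugin is on the session. A rough sketch of a Spark 3.3 session that reproduces it follows; the catalog name, warehouse path, and table name are placeholders, and a working Ranger configuration for the AuthZ plugin is assumed to be available.

```scala
// Sketch only: placeholder warehouse path, catalog name, and table name.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("paimon-authz-repro")
  // Kyuubi AuthZ extension performing the permission checks.
  .config("spark.sql.extensions",
    "org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension")
  // Paimon catalog backed by paimon-spark-common.
  .config("spark.sql.catalog.paimon", "org.apache.paimon.spark.SparkCatalog")
  .config("spark.sql.catalog.paimon.warehouse", "/path/to/warehouse")
  .getOrCreate()

// With paimon-spark-common compiled against a newer Spark on the classpath,
// this triggers the NoClassDefFoundError shown below when AuthZ reflects over
// Paimon's SparkTable.
spark.sql("SHOW CREATE TABLE paimon.default.tableName").show(false)
```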

What doesn't meet your expectations?

```
scala> spark.sql("show create table tableName").show(false)
java.lang.NoClassDefFoundError: [Lorg/apache/spark/sql/connector/catalog/Column;
  at java.lang.Class.getDeclaredMethods0(Native Method)
  at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
  at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
  at java.lang.Class.getMethod0(Class.java:3018)
  at java.lang.Class.getMethod(Class.java:1784)
  at org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils$.invoke(AuthZUtils.scala:63)
  at org.apache.kyuubi.plugin.spark.authz.util.AuthZUtils$.invokeAs(AuthZUtils.scala:77)
  at org.apache.kyuubi.plugin.spark.authz.serde.TableExtractor$.getOwner(tableExtractors.scala:50)
  at org.apache.kyuubi.plugin.spark.authz.serde.ResolvedTableTableExtractor.apply(tableExtractors.scala:103)
  at org.apache.kyuubi.plugin.spark.authz.serde.ResolvedTableTableExtractor.apply(tableExtractors.scala:97)
  at org.apache.kyuubi.plugin.spark.authz.serde.TableDesc.extract(Descriptor.scala:244)
  at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.getTablePriv$1(PrivilegesBuilder.scala:128)
  at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.$anonfun$buildCommand$7(PrivilegesBuilder.scala:174)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.buildCommand(PrivilegesBuilder.scala:172)
  at org.apache.kyuubi.plugin.spark.authz.PrivilegesBuilder$.build(PrivilegesBuilder.scala:224)
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.checkPrivileges(RuleAuthorization.scala:50)
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:36)
  at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:91)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
  at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
  at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
  at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:204)
  at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:249)
  at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:218)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:103)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
  at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
  ... 47 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.connector.catalog.Column
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 118 more
```

Anything else?

I'm using Kyuubi's Spark authorization (AuthZ) plugin here. I found that the exception occurs during permission checks because Paimon's SparkTable class pulls in classes from higher Spark versions. Comparing the implementations across versions: in Paimon 0.8 and earlier, SparkTable was a Java class; since 0.9 it has been rewritten in Scala, and after compilation the class now imports classes that only exist in higher-version Spark.
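For context, the error follows from a general JVM rule: reflecting over a class forces the JVM to resolve every type mentioned in its declared method signatures, even for methods that are never called. A hypothetical, minimal illustration (not Paimon code; NewOnlyType stands in for the Spark 3.4+ Column class, HasNewApi for the compiled SparkTable):

```scala
// Compile all of this, then delete NewOnlyType.class from the runtime
// classpath and rerun: the reflective lookup below throws NoClassDefFoundError
// even though columns() is never invoked -- the same mechanism as the stack
// trace above, where Kyuubi AuthZ calls getMethod on Paimon's SparkTable.
class NewOnlyType // stands in for org.apache.spark.sql.connector.catalog.Column

class HasNewApi { // stands in for SparkTable compiled against a newer Spark
  def name(): String = "ok"
  def columns(): Array[NewOnlyType] = Array.empty // signature references NewOnlyType
}

object ReflectionFailureDemo {
  def main(args: Array[String]): Unit = {
    try {
      // getMethod forces the JVM to resolve every type appearing in the
      // signatures of HasNewApi's declared methods, not just "name".
      val m = classOf[HasNewApi].getMethod("name")
      println(s"resolved: $m")
    } catch {
      case e: NoClassDefFoundError => println(s"reflection failed: $e")
    }
  }
}
```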

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

thomasg19930417 avatar Feb 08 '25 01:02 thomasg19930417

We should keep the implementation in Java here; that should avoid the import issues that the Scala compilation introduces when it depends on higher-version Spark. Or is it possible to build separate common JARs against different Spark versions?

thomasg19930417 avatar Feb 10 '25 03:02 thomasg19930417

I tried changing the POM to the corresponding Spark version, but compilation failed because some features from higher-version Spark have been introduced separately.

thomasg19930417 avatar Feb 10 '25 08:02 thomasg19930417

@JingsongLi Could you spare some time to take a look at this issue? It is currently blocking our upgrade to Paimon 1.0.

thomasg19930417 avatar Feb 10 '25 09:02 thomasg19930417

This problem was solved by adding an empty implementation of Column in paimon-spark3.3.

thomasg19930417 avatar Feb 11 '25 06:02 thomasg19930417
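For anyone applying the same workaround, a minimal sketch of what such an empty placeholder might look like; this is hypothetical, the actual stub added to paimon-spark3.3 may declare more of the Spark 3.4 Column API, and it only helps as long as nothing actually invokes the placeholder at runtime.

```scala
// Hypothetical sketch: a placeholder type with the same fully-qualified name
// as Spark 3.4's org.apache.spark.sql.connector.catalog.Column. Shipping it
// in the Spark 3.3 module lets the JVM resolve compiled method signatures
// that mention Column, provided the type itself is never used at runtime.
package org.apache.spark.sql.connector.catalog

trait Column
```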

> This problem was solved by adding an empty implementation of Column in paimon-spark3.3.

@thomasg19930417 Does the Kyuubi authentication plugin work correctly with an empty implementation of Column?

dyp12 avatar Jul 14 '25 06:07 dyp12

@thomasg19930417 bro, did you figure this out? I’m facing the same issue and would really appreciate your solution.

ElancerBlack avatar Sep 25 '25 03:09 ElancerBlack