spark-excel
[python/pyspark in CDSW] Throws "java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/TableProvider" when reading an Excel file
Expected Behavior
Load a local or remote Excel file (e.g. path 'alluxio:///abc/edf.xlsx') properly in Python/PySpark.
Current Behavior
After adding the spark-excel JAR files to the Spark session, the following error is thrown; it also breaks normal Spark functions such as spark.read.csv()...
Py4JJavaError: An error occurred while calling o110.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/TableProvider
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
	at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:624)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:196)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:180)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.connector.catalog.TableProvider
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 38 more
Context - My code is written in Python/PySpark (NOT Scala), running in CDSW JupyterLab.
I am not sure whether the above issue is caused by a version incompatibility between spark-excel and my running PySpark, or whether spark-excel simply does not support this use case.
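For what it's worth, the class named in the error, org.apache.spark.sql.connector.catalog.TableProvider, belongs to the DataSourceV2 API introduced in Spark 3.0, so a NoClassDefFoundError for it usually means a spark-excel build compiled against Spark 3 is running on a Spark 2.x classpath. A minimal sketch of that compatibility reasoning (the 0.14.0 cutoff is my assumption drawn from this thread, not an official mapping):

```python
def needs_spark3(spark_excel_version: str) -> bool:
    """Heuristic: spark-excel 0.14.0+ is compiled against the Spark 3
    DataSourceV2 API (org.apache.spark.sql.connector.catalog.TableProvider),
    so it cannot load on a Spark 2.x classpath.  The 0.14.0 cutoff is an
    assumption taken from this thread, not an authoritative mapping."""
    parts = [int(p) for p in spark_excel_version.split(".")]
    return (parts[0], parts[1]) >= (0, 14)

def compatible(spark_version: str, spark_excel_version: str) -> bool:
    """True if the Spark major version matches what spark-excel expects."""
    spark_major = int(spark_version.split(".")[0])
    if needs_spark3(spark_excel_version):
        return spark_major >= 3
    return spark_major < 3

print(compatible("2.4.8", "0.14.0"))  # the combination reported below -> False
```

Under this assumption, either upgrading the cluster to Spark 3.x or pinning an older spark-excel release would resolve the mismatch.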
Thanks in advance for your advice!
Hi @mryangch, thanks for bringing up this issue. Please share your PySpark version and, if you point your PySpark at a specific SPARK_HOME, please share that as well. I am not sure yet, but this looks like a version mismatch between PySpark and Spark, or between spark-excel and Spark. Sincerely,
Hi @mryangch, please share the PySpark and Spark versions (and SPARK_HOME, if you specified one).
I got the same issue.
Dependencies :
sparkVersion = '2.4.8'
esVersion = '7.13.4'
implementation 'com.crealytics:spark-excel_2.12:0.14.0'
implementation "org.elasticsearch:elasticsearch-spark-20_2.12:${esVersion}"
implementation "org.elasticsearch:elasticsearch:${esVersion}"
implementation "org.elasticsearch.client:elasticsearch-rest-client:${esVersion}"
implementation "org.elasticsearch.client:elasticsearch-rest-high-level-client:${esVersion}"
compileOnly("org.apache.spark:spark-sql_2.12:${sparkVersion}")
compileOnly("org.apache.spark:spark-core_2.12:${sparkVersion}")
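Since these dependencies pair Spark 2.4.8 with spark-excel 0.14.0 (which targets the Spark 3 connector API), staying on Spark 2.4.8 would likely require pinning a spark-excel release that predates that API. A sketch of the dependency change, where 0.13.7 is an assumed 2.4-compatible version that should be verified against the spark-excel releases page before use:

```
// build.gradle sketch (assumption: 0.13.7 is a Spark 2.4-compatible
// spark-excel release; verify against the project's releases page)
dependencies {
    implementation 'com.crealytics:spark-excel_2.12:0.13.7'
    compileOnly "org.apache.spark:spark-sql_2.12:${sparkVersion}"   // sparkVersion = '2.4.8'
    compileOnly "org.apache.spark:spark-core_2.12:${sparkVersion}"
}
```

The alternative is the reverse alignment: keep spark-excel 0.14.0 and move the Spark dependencies (and the runtime cluster) to a 3.x version.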