spark-excel

[python/pyspark in CDSW] Error "java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/TableProvider" when reading an Excel file

Open · mryangch opened this issue 3 years ago · 3 comments

Expected Behavior

Load a local or remote Excel file (e.g. the path 'alluxio:///abc/edf.xlsx') properly in Python/PySpark.
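
For reference, the read being attempted looks roughly like this (a minimal sketch reconstructed from the description; "com.crealytics.spark.excel" is spark-excel's documented data source name, and the header option shown is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the workbook through spark-excel; the alluxio path is the one from
# the report, and header=true tells spark-excel to use the first row as
# column names.
df = (spark.read
      .format("com.crealytics.spark.excel")
      .option("header", "true")
      .load("alluxio:///abc/edf.xlsx"))
df.show()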

Current Behavior

After adding the spark-excel jar files to the Spark session, the following error is thrown, and normal Spark functionality such as spark.read.csv() is broken as well:

Py4JJavaError: An error occurred while calling o110.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/catalog/TableProvider
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:624)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:196)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:180)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.connector.catalog.TableProvider
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    ... 38 more

Context: my code is written in Python/PySpark (NOT Scala), running in CDSW JupyterLab.


I am not sure whether the above issue is caused by a version incompatibility between spark-excel and my running PySpark, or whether spark-excel simply does not support this use case.
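
For what it's worth, org.apache.spark.sql.connector.catalog.TableProvider belongs to the DataSource V2 API introduced in Spark 3.0, so the stack trace is consistent with a spark-excel build compiled against Spark 3 running on a Spark 2.x runtime. Comparing the Python-side and JVM-side versions is a quick first check (a diagnostic sketch, not part of the original report):

import pyspark
from pyspark.sql import SparkSession

# Version of the pyspark Python package.
print(pyspark.__version__)

# Version of the JVM-side Spark runtime actually executing the queries;
# in CDSW these can differ if the session points at a cluster install.
spark = SparkSession.builder.getOrCreate()
print(spark.version)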

Thanks in advance for your advice!

mryangch commented Jan 06 '22

Hi @mryangch, thanks for bringing up this issue. Please share your PySpark version, and if you point your PySpark to a specific SPARK_HOME, please share that as well. I am not sure yet, but this looks like a version mismatch between PySpark and Spark, or between spark-excel and Spark. Sincerely,

quanghgx commented Jan 10 '22

Hi @mryangch, please share the PySpark and Spark versions (and SPARK_HOME, if you specified one). For example, something like the sketch below prints the requested details from a notebook session.
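
A small illustrative sketch (SPARK_HOME is unset if the lookup returns None):

import os
from pyspark.sql import SparkSession

# Explicit SPARK_HOME, if one was configured for the session.
print(os.environ.get("SPARK_HOME"))

# Version of the Spark runtime the session is actually using.
print(SparkSession.builder.getOrCreate().version)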

quanghgx commented Jan 15 '22

I got the same issue. Dependencies:

sparkVersion = '2.4.8'
esVersion = '7.13.4'

implementation 'com.crealytics:spark-excel_2.12:0.14.0'
implementation "org.elasticsearch:elasticsearch-spark-20_2.12:${esVersion}"
implementation "org.elasticsearch:elasticsearch:${esVersion}"
implementation "org.elasticsearch.client:elasticsearch-rest-client:${esVersion}"
implementation "org.elasticsearch.client:elasticsearch-rest-high-level-client:${esVersion}"
compileOnly("org.apache.spark:spark-sql_2.12:${sparkVersion}")
compileOnly("org.apache.spark:spark-core_2.12:${sparkVersion}")

panksapur-clgx commented Mar 02 '22