
Reading excel file in Azure Databricks

grajee-everest opened this issue 2 years ago · 59 comments

I tried to use spark-excel in Azure Databricks, but I seem to be running into an error. I earlier tried the same thing on a SQL Server Big Data Cluster but was unable to get it working there either.

Current Behavior

I'm getting the error java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B

[screenshot]

I first loaded the Maven coordinates and got the error. I then followed the link and loaded the JAR files manually, yet got the same error as shown in the screenshot.

[screenshot]

Steps to Reproduce (for bugs)

from pyspark.sql.functions import input_file_name

df = spark.read.format("excel") \
    .option("header", True) \
    .option("inferSchema", True) \
    .load("dbfs:/FileStore/tables/users.xls") \
    .withColumn("file_name", input_file_name())

Your Environment

Azure Databricks [screenshot]

grajee-everest · Nov 28 '21

I would really try not to download and add the JARs manually, but to use Maven's package resolution instead. What was the error you got there?
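In case it helps, here is a minimal sketch of what Maven package resolution looks like for plain Spark (the coordinate is the one mentioned later in this thread; on Databricks the equivalent is installing the same coordinate as a cluster library with the Maven source, rather than setting it in code):

from pyspark.sql import SparkSession

# Let Spark resolve spark-excel and all of its transitive dependencies
# from Maven Central instead of uploading individual JARs by hand.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "com.crealytics:spark-excel_2.12:3.1.2_0.15.1")
    .getOrCreate()
)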

nightscape · Nov 28 '21

I got the same error when I first tried it with the Maven coordinates, as in the screenshot below. Seeing this error, I went through the dependencies at the link and manually loaded the JAR files, hoping that would help. But it did not.

[screenshot]

[screenshot]

Here is the error that was generated:

Py4JJavaError                             Traceback (most recent call last)
<command> in <module>
      1 #sampleDataFilePath = "dbfs:/FileStore/tables/users.xls"
      2
----> 3 df = spark.read.format("excel") \
      4     .option("header", True) \
      5     .option("inferSchema", True) \

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    202         self.options(**options)
    203         if isinstance(path, str):
--> 204             return self._df(self._jreader.load(path))
    205         elif path is not None:
    206             if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o307.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
    at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
    at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.<init>(UnsynchronizedByteArrayOutputStream.java:51)
    at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
    at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
    at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
    at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
    at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:107)
    at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:122)
    at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:69)
    at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:42)
    at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
    at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
    at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
    at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
    at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:388)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

grajee-everest · Nov 29 '21

If it helps, these are the JAR files I see loaded for the session:

%scala
spark.sparkContext.listJars.foreach(println)

spark://xx.xxx.xx.xx:40525/jars/addedFile2184961893124998763poi_shared_strings_2_2_3-55036.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile8325949175049880530poi_5_1_0-6eaa4.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3805277380370442712spoiwo_2_12_2_0_0-5d426.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile4821020784640732815commons_text_1_9-9ec33.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6096385456097086834commons_collections4_4_4-86bd5.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5503460718089690954poi_ooxml_5_1_0-dcd47.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile1801717094295843813commons_compress_1_21-ae1b7.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3469926387869248457h2_1_4_200-17cf6.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile7124418099051517404curvesapi_1_6-ef037.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3524630059114379065slf4j_api_1_7_32-db310.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile621063403924903495SparseBitSet_1_2-c8237.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5513775878198382075commons_io_2_11_0-998c5.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6021795642522535665spark_excel_2_12_3_1_2_0_15_1-54852.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2561448775843921624poi_ooxml_lite_5_1_0-a9fef.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile1605810761903966851commons_lang3_3_11-82b59.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2616706435049414994commons_codec_1_15-3e3d3.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3670030969644712160log4j_api_2_14_1-5a13d.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6859359805595503404xmlbeans_5_0_2-db545.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5420236778608197626scala_xml_2_12_2_0_0-e8c94.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2437294818883127996commons_math3_3_6_1-876c0.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3167668463888463121excel_streaming_reader_3_2_3-b4c68.jar

grajee-everest · Nov 29 '21

FYI - I was able to get the elastacloud module working

grajee-everest · Nov 29 '21

Wow, I didn't even know that project. Thanks for pointing it out!

nightscape · Nov 29 '21

I would still like to get this working on Databricks and SQL Server BDC. Note that it is not working in Databricks for a few others as well.

grajee-everest · Nov 29 '21

I am also facing a similar issue. Has it been resolved? java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B

KanakVantaku · Dec 02 '21

FWIW I'm getting the exact same error: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B

Azure Databricks: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.15.1

Any help would be greatly appreciated @nightscape

soleyjh · Dec 02 '21

Hello guys, I am also facing the same issue. Is there any solution, or did someone find an alternative method?

MounikaKolisetty · Dec 02 '21

@grajee-everest @KanakVantaku @soleyjh @MounikaKolisetty could one of you check whether changing https://github.com/crealytics/spark-excel/blob/main/build.sbt#L32-L37 to

shadeRenames ++= Seq(
  "org.apache.poi.**" -> "shadeio.poi.@1",
  "spoiwo.**" -> "shadeio.spoiwo.@1",
  "com.github.pjfanning.**" -> "shadeio.pjfanning.@1",
  "org.apache.commons.io.**" -> "shadeio.commons.io.@1",
  "org.apache.commons.compress.**" -> "shadeio.commons.compress.@1"
)

resolves this problem?

You would then need to sbt publishLocal a version of the JAR and upload that to Databricks.
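Once the locally built JAR is attached to the cluster, a quick sanity check (just a sketch, not part of the build itself): if the renames were applied, the relocated class should load from a notebook cell; if this throws ClassNotFoundException, the shading didn't take effect.

%scala
// Resolves only if the re-shaded spark-excel JAR on the cluster
// actually contains the relocated commons-io classes.
Class.forName("shadeio.commons.io.output.UnsynchronizedByteArrayOutputStream")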

nightscape · Dec 02 '21

Hello @nightscape, what does sbt publishLocal mean? Can you please explain it in more detail?

MounikaKolisetty · Dec 02 '21

Hi @MounikaKolisetty,

SBT is the Scala (or Simple) Build Tool. You can find installation instructions here: https://www.scala-sbt.org/. Once you have it installed, you should be able to run:

cd /path/to/spark-excel
# Make the changes from above
sbt publishLocal

This should start building the project and copy the generated JAR files to a path like ~/.ivy2/.../spark-excel...jar. You can take the JAR from there and try to upload and use it in Databricks.

nightscape · Dec 03 '21

Same error here: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B

Azure Databricks: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.15.1

Any solution yet?

ciaeric · Dec 10 '21

Same error.

Azure Databricks: 9.0 (includes Apache Spark 3.1.2, Scala 2.12). Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.16.0

Dunehub · Dec 14 '21

@ciaeric @Dunehub if possible, please try my proposal in https://github.com/crealytics/spark-excel/issues/467#issuecomment-984506825

nightscape · Dec 14 '21

I am also facing the same issue on Azure Databricks and am looking for possible solutions. I am adding the dependencies as per the attachment, but it is not working.

[screenshot]

spaw6065 · Dec 21 '21

Once the build here finishes successfully, you can try version 0.16.1-pre1: https://github.com/crealytics/spark-excel/actions/runs/1607777770

nightscape · Dec 21 '21

@ciaeric @Dunehub @spaw6065 @MounikaKolisetty @grajee-everest Please provide feedback here if the shading worked. The change is still on a branch, so if you don't provide feedback, it won't get merged and won't be part of the next release.

nightscape · Dec 22 '21

For me it still didn't work.

Steps followed:

  1. Added the JAR from DBFS [screenshot]

  2. Executed the sample code [screenshot]

spaw6065 · Dec 23 '21

Error trace:

NoClassDefFoundError: shadeio/commons/io/output/UnsynchronizedByteArrayOutputStream
Caused by: ClassNotFoundException: shadeio.commons.io.output.UnsynchronizedByteArrayOutputStream
    at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
    at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
    at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
    at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55)
    at scala.Option.fold(Option.scala:251)
    at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55)
    at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16)
    at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15)
    at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50)
    at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
    at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
    at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
    at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
    at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
    at scala.Option.getOrElse(Option.scala:189)
    at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
    at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:390)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:346)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:346)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:3)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:47)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:49)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw.<init>(command-2541019815824441:51)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw.<init>(command-2541019815824441:53)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw.<init>(command-2541019815824441:55)
    at $line03e3e0503061413eab90de3bf6be643427.$read.<init>(command-2541019815824441:57)
    at $line03e3e0503061413eab90de3bf6be643427.$read$.<init>(command-2541019815824441:61)
    at $line03e3e0503061413eab90de3bf6be643427.$read$.<clinit>(command-2541019815824441)
    at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print$lzycompute(<notebook>:7)
    at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print(<notebook>:6)
    at $line03e3e0503061413eab90de3bf6be643427.$eval.$print(<notebook>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
    at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
    at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
    at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
    at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:219)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:235)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:902)
    at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:855)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:235)
    at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$13(DriverLocal.scala:541)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:50)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:50)
    at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:518)
    at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:689)
    at scala.util.Try$.apply(Try.scala:213)
    at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:681)
    at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:522)
    at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:634)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:427)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:370)
    at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:221)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: shadeio.commons.io.output.UnsynchronizedByteArrayOutputStream
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
    at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
    at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
    at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55)
    at scala.Option.fold(Option.scala:251)
    at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55)
    at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16)
    at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15)
    at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50)
    at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
    at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
    at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
    at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
    at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
    at scala.Option.getOrElse(Option.scala:189)
    at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
    at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:390)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:346)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:346)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:3)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:47)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:49)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw.<init>(command-2541019815824441:51)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw.<init>(command-2541019815824441:53)
    at $line03e3e0503061413eab90de3bf6be643427.$read$$iw.<init>(command-2541019815824441:55)
    at $line03e3e0503061413eab90de3bf6be643427.$read.<init>(command-2541019815824441:57)
    at $line03e3e0503061413eab90de3bf6be643427.$read$.<init>(command-2541019815824441:61)
    at $line03e3e0503061413eab90de3bf6be643427.$read$.<clinit>(command-2541019815824441)
    at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print$lzycompute(<notebook>:7)
    at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print(<notebook>:6)
    at $line03e3e0503061413eab90de3bf6be643427.$eval.$print(<notebook>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
    at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
    at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
    at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
    at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:219)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:235)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:902)
    at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:855)
    at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:235)
    at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$13(DriverLocal.scala:541)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:50)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
    at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:50)
    at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:518)
    at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:689)
    at scala.util.Try$.apply(Try.scala:213)
    at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:681)
    at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:522)
    at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:634)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:427)
    at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:370)
    at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:221)
    at java.lang.Thread.run(Thread.java:748)

spaw6065 · Dec 23 '21

Any luck? I am facing the same issue with the latest JAR, com.crealytics:spark-excel_2.13:3.2.0_0.16.1-pre1.

Py4JJavaError                             Traceback (most recent call last)
<command> in <module>
     20 # Do not load the kpi entries into the entry dataframe it's automated according to the csv file stored in GroupFunctions/Daily_Management/RAW/BULKUPLOADPA/DF CSV Files/Automated KPIs.csv
     21 # Reading the Entries data from the Excel file
---> 22 DF_entry = spark.read.format("com.crealytics.spark.excel") \
     23     .option("header", "true") \
     24     .option("dataAddress", "'" + sheetname + "'" + "!A1") \

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    202         self.options(**options)
    203         if isinstance(path, str):
--> 204             return self._df(self._jreader.load(path))
    205         elif path is not None:
    206             if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o2609.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
    at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
    at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.<init>(UnsynchronizedByteArrayOutputStream.java:51)
    at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
    at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
    at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
    at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
    at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55)
    at scala.Option.fold(Option.scala:251)
    at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55)
    at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16)
    at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15)
    at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50)
    at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
    at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
    at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
    at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
    at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
    at scala.Option.getOrElse(Option.scala:189)
    at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
    at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:390)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:444)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:400)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:400)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

Rajendramca · Jan 06 '22

I've also faced this error. Is there any solution? Is there another way to read an xls file on Databricks?

sabrishami · Jan 07 '22

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.
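For reference, a minimal sketch of reading with that older version, using the V1 source name that also appears elsewhere in this thread (the file path and options are illustrative):

# Works once com.crealytics:spark-excel_2.12:0.14.0 is installed as a library.
df = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("dbfs:/FileStore/tables/users.xls")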

mayankgupta15 · Jan 09 '22

NoClassDefFoundError: shadeio/commons/io/output/UnsynchronizedByteArrayOutputStream
Caused by: ClassNotFoundException

Getting the same error in AWS Glue (Glue 3.0, Spark 3.1), using PySpark.

chowdarykish · Jan 12 '22

I'm looking to use the new version, as I want to utilize dynamic partitioning to create multiple Excel files in parallel.
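For context, this is the kind of write being referred to; a hedged sketch assuming the newer V2 writer accepts partitionBy (the column name and output path are made up):

# One Excel output folder per distinct value of "region", written in parallel.
df.write.format("excel") \
    .partitionBy("region") \
    .option("header", True) \
    .mode("overwrite") \
    .save("dbfs:/FileStore/excel-output/")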

chowdarykish · Jan 12 '22

FYI, I'm getting the same error even with Scala: AWS Glue 3.0, Scala 2, Spark 3.1.

chowdarykish · Jan 12 '22

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.

This clue works for me!!

Thanks

sabrishami · Jan 12 '22

but I need the dynamic partitioning feature, which is not available in the older version.

chowdarykish · Jan 12 '22

As I don't have time to look into this, your best option is to try different versions of shading deps here. Once you have found a version that works on AWS / Azure Databricks, I'd be happy to do another pre-release and get it merged if it works for everyone.
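For anyone picking this up, a sketch of the kind of experiment meant here (the version is a guess to try, not a confirmed fix): force the commons-io that gets shaded to one that actually contains IOUtils.byteArray(int), then rebuild with sbt publishLocal and upload as described above.

// build.sbt experiment: pin the commons-io that gets shaded explicitly.
// 2.11.0 matches the commons_io JAR listed earlier in this thread.
libraryDependencies += "commons-io" % "commons-io" % "2.11.0"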

nightscape · Jan 12 '22

This:

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.

I was facing the same issue and this worked. Thank you

Krishnapriya-RK · Jan 13 '22