Showing a Spark DataFrame throws error: java.lang.NoClassDefFoundError: com/zaxxer/sparsebits/SparseBitSet

Open dsaad68 opened this issue 2 years ago • 1 comments

Expected Behavior

I'm trying to read an Excel file into a DataFrame and show it, but the job fails with this error:

Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (Daniels-PC executor driver): java.lang.NoClassDefFoundError: com/zaxxer/sparsebits/SparseBitSet

Steps to Reproduce (for bugs)

from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

# Assemble or download spark-excel and its dependencies
jars = [
    "C:\\Users\\Dsaad\\GitHub\\pyspark\\jars\\commons-collections4-4.4.jar",
    "C:\\Users\\Dsaad\\GitHub\\pyspark\\jars\\poi-ooxml-schemas-4.1.2.jar",
    "C:\\Users\\Dsaad\\GitHub\\pyspark\\jars\\spark-excel_2.12-0.14.0.jar",
    "C:\\Users\\Dsaad\\GitHub\\pyspark\\jars\\xmlbeans-3.1.0.jar",
    "C:\\Users\\Dsaad\\GitHub\\pyspark\\jars\\poi-ooxml-4.1.2.jar",
    "C:\\Users\\Dsaad\\GitHub\\pyspark\\jars\\log4j-1.2.17.jar"
]

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars", ",".join(jars)) \
    .config("spark.executor.memory", "3g") \
    .config("spark.driver.memory", "3g") \
    .getOrCreate()

df2 = spark.read.format("excel") \
           .option("header", False) \
           .option("inferSchema", False) \
           .load("./data/SalesDatenTB01102019.xlsx")

df2.show(10)

Error:

Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (Daniels-PC executor driver): java.lang.NoClassDefFoundError: com/zaxxer/sparsebits/SparseBitSet
	at shadeio.poi.ss.format.CellNumberFormatter.formatValue(CellNumberFormatter.java:478)
	at shadeio.poi.ss.format.CellFormatter.format(CellFormatter.java:93)
	at shadeio.poi.ss.format.CellFormatPart.apply(CellFormatPart.java:436)
	at shadeio.poi.ss.format.CellFormat.apply(CellFormat.java:251)
	at shadeio.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:352)
	at shadeio.poi.ss.usermodel.DataFormatter.getFormat(DataFormatter.java:309)
	at shadeio.poi.ss.usermodel.DataFormatter.getFormattedNumberString(DataFormatter.java:868)
	at shadeio.poi.ss.usermodel.DataFormatter.formatCellValue(DataFormatter.java:1021)
	at shadeio.poi.ss.usermodel.DataFormatter.formatCellValue(DataFormatter.java:971)
	at shadeio.poi.ss.usermodel.DataFormatter.formatCellValue(DataFormatter.java:950)
	at com.crealytics.spark.v2.excel.ExcelHelper.safeCellStringValue(ExcelHelper.scala:90)
	at com.crealytics.spark.v2.excel.ExcelParser.$anonfun$makeConverter$38(ExcelParser.scala:301)
	at com.crealytics.spark.v2.excel.ExcelParser.nullSafeDatum(ExcelParser.scala:318)
	at com.crealytics.spark.v2.excel.ExcelParser.$anonfun$makeConverter$37(ExcelParser.scala:300)
	at com.crealytics.spark.v2.excel.ExcelParser.convert(ExcelParser.scala:376)
	at com.crealytics.spark.v2.excel.ExcelParser.$anonfun$parse$2(ExcelParser.scala:338)
	at com.crealytics.spark.v2.excel.ExcelParser$.$anonfun$parseIterator$1(ExcelParser.scala:420)
	at org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:60)
	at com.crealytics.spark.v2.excel.ExcelParser$.$anonfun$parseIterator$2(ExcelParser.scala:425)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at org.apache.spark.sql.execution.datasources.v2.PartitionReaderFromIterator.next(PartitionReaderFromIterator.scala:26)
	at org.apache.spark.sql.execution.datasources.v2.PartitionReaderWithPartitionValues.next(PartitionReaderWithPartitionValues.scala:48)
	at org.apache.spark.sql.execution.datasources.v2.PartitionedFileReader.next(FilePartitionReaderFactory.scala:55)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:64)
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:93)
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:130)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: com.zaxxer.sparsebits.SparseBitSet
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 46 more

Driver stacktrace:



Your Environment

  • Spark version and language: Python 3.9.7, Spark 3.2.1
  • Spark-Excel version: 0.14.0, Scala 2.12 build (spark-excel_2.12-0.14.0.jar)
  • Operating System: Windows 11

dsaad68, Mar 05 '22 10:03

Can you try newer versions, especially 0.16.5-pre2?

nightscape, Mar 24 '22 09:03

This should be fixed in 0.18.0. Please reopen if that version does not help.

pjfanning, Sep 17 '22 11:09
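
For anyone hitting the same `NoClassDefFoundError` with a hand-assembled jar list, it can help to check whether any jar passed to `spark.jars` actually contains the missing class. The following is only a rough diagnostic sketch, not part of spark-excel: the helper name `jars_containing_class` is made up for illustration, and it relies solely on the fact that a jar is a zip archive.

```python
import zipfile
from pathlib import Path


def jars_containing_class(jar_paths, class_name):
    """Return the jar paths whose archive contains the given JVM class.

    class_name may use dots or slashes,
    e.g. "com.zaxxer.sparsebits.SparseBitSet".
    (Hypothetical helper for illustration only.)
    """
    # A class a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        path = Path(jar)
        if not path.is_file():
            continue  # skip paths that do not exist on this machine
        try:
            with zipfile.ZipFile(path) as archive:
                if entry in archive.namelist():
                    hits.append(str(path))
        except zipfile.BadZipFile:
            continue  # not a readable jar/zip archive
    return hits
```

Calling `jars_containing_class(jars, "com.zaxxer.sparsebits.SparseBitSet")` with the `jars` list from the report would show which of those files, if any, ships the class. An empty result would be consistent with the `ClassNotFoundException` at the bottom of the stack trace, since the class would then not be in any jar given to Spark.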