
[BUG] Spark Excel is Incompatible with AWS EMR v6.13 and higher

Open johnboyer opened this issue 1 year ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Problem

The latest release, spark-excel_2.12:3.4.1_0.20.1, is incompatible with AWS EMR v6.13 and higher because those EMR releases are built on Scala 2.12.15.

We use JDK 17 with Spark 3.4.1 and Scala 2.12.15 so our Java applications can run in the latest EMR versions. However, with the spark-excel_2.12:3.4.1_0.20.1 dependency included we consistently get the following runtime error in the EMR logs:

java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.Map$Map2 to field org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$AddWebUIFilter.filterParams of type scala.collection.immutable.Map in instance of org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$AddWebUIFilter

We've tried every workaround we could think of, including getting support from AWS, to no avail. Can you provide support for Scala 2.12.15 for backward compatibility with Spark 3.4.1 on EMR?
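
A `ClassCastException` between a Scala collection class and its own interface often points to mismatched or duplicated Scala library classes on the classpath. The sketch below is a generic diagnostic and not something taken from this thread: it uses reflection to report which Scala library version the JVM sees and which jar a class was loaded from. The class name `ScalaRuntimeCheck` and its helper methods are hypothetical; `scala.util.Properties.versionNumberString` is the standard Scala library accessor. It does not fix the incompatibility; it may only help confirm which Scala version is actually in play on the driver or an executor.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper for illustration; not part of spark-excel or Spark.
class ScalaRuntimeCheck {

    // Returns the Scala library version visible to this JVM,
    // or empty if scala-library is not on the classpath.
    public static Optional<String> scalaVersion() {
        try {
            Class<?> props = Class.forName("scala.util.Properties");
            Method m = props.getMethod("versionNumberString");
            return Optional.of((String) m.invoke(null));
        } catch (ReflectiveOperationException e) {
            return Optional.empty();
        }
    }

    // Returns the jar or directory a class was loaded from.
    public static String codeSourceOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes have no code source.
            return src == null ? "(bootstrap/JDK)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        System.out.println("Scala version: "
                + scalaVersion().orElse("(not on classpath)"));
        System.out.println("scala.collection.immutable.Map loaded from: "
                + codeSourceOf("scala.collection.immutable.Map"));
    }
}
```

Running this with the same classpath as the Spark application, locally and on the cluster, would show whether the two environments resolve different Scala library jars.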

References:

  • Scala Compatibility Chart
  • https://github.com/crealytics/spark-excel/blob/508116e7252c5672e0a5afe5c134c1d7bf697f45/build.sc#L151

Expected Behavior

We'd prefer backward compatibility with Scala 2.12.15 so we can continue using Spark Excel on AWS EMR v6.13+.

Steps To Reproduce

  1. Create a simple Java application that uses Spark 3.4.1 and pin its Scala dependencies to 2.12.15.
  2. Add the spark-excel_2.12:3.4.1_0.20.1 dependency.
  3. Write some code that reads an Excel file into a Dataset.
  4. Deploy the app to EMR and run it.

Environment

- Spark version: 3.4.1
- Spark-Excel version: spark-excel_2.12:3.4.1_0.20.1
- OS: Amazon Linux release 2.0.20231012.1
- Cluster environment: EMR v6.13.0

Anything else?

No response

johnboyer, Oct 30 '23 22:10

There was a similar issue in Hudi. @johnboyer have you tried building spark-excel with 2.12.15? Does it fix the problem?

nightscape, Nov 14 '23 10:11

Hi @nightscape: We tried excluding the Scala libraries and other dependency workarounds, but nothing solved the binary incompatibility problem. We're a Java shop with no Scala experience, so we could not figure out how to build the library. If you give us step-by-step instructions, we can try it. In the meantime, we migrated our code to fastexcel, a pure Java library that does not depend on Spark. Thank you.

johnboyer, Nov 16 '23 18:11