datafusion-comet
java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.execution.PartitionedFileUtil$.splitFiles(org.apache.spark.sql.SparkSession
Ran into the following error:
java.lang.NoSuchMethodError: 'scala.collection.Seq
org.apache.spark.sql.execution.PartitionedFileUtil$.splitFiles(org.apache.spark.sql.SparkSession,
org.apache.spark.sql.execution.datasources.FileStatusWithMetadata, boolean, long,
org.apache.spark.sql.catalyst.InternalRow)'
Environment
- Spark 3.5.5
- Scala 2.12.18
- Custom-built image on top of the Apache Spark OSS distribution (spark:3.5.5-scala2.12-java11-python3-ubuntu)
- Running on a standalone pod in K8s (local mode)
Note: Oddly, the application runs through its logic and writes one file to our object storage before failing. I also enabled the verbose explain parameter, but the plan details are not logged; the log only says that Comet is initialized.
It looks like this error is happening in Spark code and not in Comet code?
It is difficult to know how to help with this since you have a custom Spark image.
The third parameter to PartitionedFileUtil.splitFiles is a Path, which the call in your error appears to be missing. The full stack trace might show where this is being called from, and that call site might need to be fixed.
See https://github.com/apache/spark/blob/d81a2051900b70e6a4b56cb2af38143af357f76e/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala#L28
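One way to confirm which `splitFiles` signatures the Spark jars in the image actually expose is to list them by reflection. The following is only a minimal sketch: the `SignatureCheck` class and its `overloads` helper are hypothetical names introduced here, and it assumes the program is run with the image's Spark jars on the classpath.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Spark or Comet.
class SignatureCheck {
    // Returns the string form of every public method named `methodName`
    // on `className`, or an empty list if the class cannot be found.
    static List<String> overloads(String className, String methodName) {
        List<String> found = new ArrayList<>();
        try {
            // Load without running static initializers; we only inspect signatures.
            Class<?> cls = Class.forName(
                className, false, SignatureCheck.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    found.add(m.toString());
                }
            }
        } catch (ClassNotFoundException e) {
            // Class is not on the classpath; report nothing found.
        }
        return found;
    }

    public static void main(String[] args) {
        // A Scala object compiles to a JVM class whose name ends in '$',
        // which is why the error message mentions PartitionedFileUtil$.
        String cls = "org.apache.spark.sql.execution.PartitionedFileUtil$";
        List<String> sigs = overloads(cls, "splitFiles");
        if (sigs.isEmpty()) {
            System.out.println("No splitFiles method found (is Spark on the classpath?)");
        }
        sigs.forEach(System.out::println);
    }
}
```

If the printed signature takes a `Path` as its third parameter while the `NoSuchMethodError` names a five-argument variant without it, the caller was compiled against a different Spark version than the one in the image.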
This should have been addressed by https://github.com/apache/datafusion-comet/pull/1565 and https://github.com/apache/datafusion-comet/pull/1573.