java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.execution.PartitionedFileUtil$.splitFiles(org.apache.spark.sql.SparkSession

Open mkgada opened this issue 7 months ago • 3 comments

Ran into the following error:


 java.lang.NoSuchMethodError: 'scala.collection.Seq 
org.apache.spark.sql.execution.PartitionedFileUtil$.splitFiles(org.apache.spark.sql.SparkSession, 
org.apache.spark.sql.execution.datasources.FileStatusWithMetadata, boolean, long, 
org.apache.spark.sql.catalyst.InternalRow)'

Environment

  • Spark 3.5.5
  • Scala 2.12.18
  • Custom-built image on top of the Apache Spark OSS distribution (spark:3.5.5-scala2.12-java11-python3-ubuntu)
  • Running on a standalone pod in K8s (local mode)

Note: Oddly enough, the application runs through its logic and writes one file to our object storage before it fails. I also enabled the verbose explanation parameter, but the plan details are not logged; the log only says that Comet is initialized.

mkgada avatar Apr 11 '25 04:04 mkgada

It looks like this error is happening in Spark code and not in Comet code?

It is difficult to know how to help with this since you have a custom Spark image.

andygrove avatar Apr 11 '25 14:04 andygrove

The third parameter to PartitionedFileUtil.splitFiles is a Path, which the call in your error appears to be missing. The full stack trace might show where this is being called from, and that caller might need to be fixed. See https://github.com/apache/spark/blob/d81a2051900b70e6a4b56cb2af38143af357f76e/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala#L28
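
One way to check which splitFiles overloads the Spark jars on your classpath actually expose is to list them by reflection and compare them with the signature in the NoSuchMethodError. The following is only a minimal sketch: the MethodSignatures class and its describe helper are hypothetical names made up for illustration (they are not part of Spark or Comet), and it assumes the program is run with the same classpath as the failing job.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Spark or Comet.
public class MethodSignatures {

    /**
     * Returns one sorted line per public method named methodName on className,
     * formatted as "returnType name(paramType, ...)".
     */
    public static List<String> describe(String className, String methodName) {
        Class<?> cls;
        try {
            // Load without running static initializers; only the signatures are needed.
            cls = Class.forName(className, false, MethodSignatures.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on classpath: " + className, e);
        }
        List<String> out = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)) {
                String params = Arrays.stream(m.getParameterTypes())
                        .map(Class::getName)
                        .collect(Collectors.joining(", "));
                out.add(m.getReturnType().getName() + " " + methodName + "(" + params + ")");
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        // Defaults are taken from the error message above; the trailing '$' is the
        // JVM class name of the Scala object.
        String cls = args.length > 0 ? args[0]
                : "org.apache.spark.sql.execution.PartitionedFileUtil$";
        String method = args.length > 1 ? args[1] : "splitFiles";
        for (String signature : describe(cls, method)) {
            System.out.println(signature);
        }
    }
}
```

If the printed parameter list differs from the one in the error message, the caller was compiled against a different Spark version than the one in the image.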

parthchandra avatar Apr 11 '25 21:04 parthchandra

This should have been addressed by https://github.com/apache/datafusion-comet/pull/1565 and https://github.com/apache/datafusion-comet/pull/1573.

Kontinuation avatar Apr 12 '25 03:04 Kontinuation