
ClassCastException with spark.sql.datetime.java8API.enabled=true

Open PaulLiang1 opened this issue 2 years ago • 1 comments

  • Spark Version: 3.2.0
  • Iceberg Version: 0.13.1

When spark.sql.datetime.java8API.enabled=true is set, running rewrite manifests on a date-partitioned table throws the following exception:

Job aborted due to stage failure: Task 0 in stage 36333.0 failed 5 times, most recent failure: Lost task 0.4 in stage 36333.0 (TID 140410) (ip-123.us-west-2.compute.internal executor 77): java.lang.ClassCastException: java.time.LocalDate cannot be cast to java.sql.Date
	at org.apache.iceberg.spark.SparkValueConverter.convert(SparkValueConverter.java:77)
	at org.apache.iceberg.spark.SparkStructLike.get(SparkStructLike.java:48)
	at org.apache.iceberg.PartitionSummary.updateFields(PartitionSummary.java:59)
	at org.apache.iceberg.PartitionSummary.update(PartitionSummary.java:51)
	at org.apache.iceberg.ManifestWriter.addEntry(ManifestWriter.java:87)
	at org.apache.iceberg.ManifestWriter.existing(ManifestWriter.java:135)
	at org.apache.iceberg.spark.actions.BaseRewriteManifestsSparkAction.writeManifest(BaseRewriteManifestsSparkAction.java:332)
	at org.apache.iceberg.spark.actions.BaseRewriteManifestsSparkAction.lambda$toManifests$afb7bc39$1(BaseRewriteManifestsSparkAction.java:354)
	at org.apache.spark.sql.Dataset.$anonfun$mapPartitions$1(Dataset.scala:2867)
	at org.apache.spark.sql.execution.MapPartitionsExec.$anonfun$doExecute$3(objects.scala:201)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

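This is caused by the unconditional cast to java.sql.Date at https://github.com/apache/iceberg/blob/0.13.x/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkValueConverter.java#L77. With the Java 8 API enabled, Spark returns java.time.LocalDate for date values instead of java.sql.Date, so the cast fails.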
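For illustration, here is a minimal sketch of a conversion that accepts either date class. It is not the code from SparkValueConverter or from any fix; the class name `DateToEpochDays` and the method `toEpochDays` are hypothetical, and it assumes the target representation is days since 1970-01-01, which is how Iceberg stores date values.

```java
import java.sql.Date;
import java.time.LocalDate;

public class DateToEpochDays {

    // Hypothetical helper (not Iceberg's API): converts a Spark date value to
    // days since 1970-01-01 regardless of which Java class Spark handed back.
    static int toEpochDays(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("Date value must not be null");
        }
        if (value instanceof LocalDate) {
            // Returned by Spark when spark.sql.datetime.java8API.enabled=true
            return (int) ((LocalDate) value).toEpochDay();
        }
        if (value instanceof Date) {
            // Returned by Spark when the Java 8 API is disabled (the default)
            return (int) ((Date) value).toLocalDate().toEpochDay();
        }
        throw new IllegalArgumentException(
            "Unsupported date value: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        // Both calls print the same day count for the same calendar date.
        System.out.println(toEpochDays(LocalDate.of(2022, 6, 21)));
        System.out.println(toEpochDays(Date.valueOf("2022-06-21")));
    }
}
```

Checking the runtime type before casting avoids the ClassCastException in both configurations, at the cost of one extra instanceof test per value.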

PaulLiang1 avatar Jun 21 '22 11:06 PaulLiang1

+1, I ran into the same thing.

yqf1991 avatar Aug 08 '22 05:08 yqf1991

+1, facing a similar issue when trying to rewrite manifests. Caused by: java.lang.ClassCastException: class java.time.LocalDate cannot be cast to class java.sql.Date (java.time.LocalDate is in module java.base of loader 'bootstrap'; java.sql.Date is in module java.sql of loader 'platform')

mrendi29 avatar Sep 24 '22 00:09 mrendi29

Added a PR for the fix: https://github.com/apache/iceberg/pull/5860

singhpk234 avatar Sep 26 '22 15:09 singhpk234

This issue has been resolved by #5860.

aokolnychyi avatar Nov 14 '22 23:11 aokolnychyi