
java.lang.ClassCastException: class org.apache.avro.generic.GenericData$Record cannot be cast to class org.apache.hudi.avro.model.HoodieDeleteRecordList

Open abhiNB-star opened this issue 5 months ago • 11 comments

25/06/09 07:35:44 INFO BlockManagerInfo: Added broadcast_39_piece0 in memory on 10.51.0.162:34665 (size: 57.5 KiB, free: 992.6 MiB)
25/06/09 07:35:44 INFO BlockManagerInfo: Added broadcast_39_piece0 in memory on 10.51.0.194:35583 (size: 57.5 KiB, free: 998.1 MiB)
25/06/09 07:35:45 INFO TaskSetManager: Starting task 3.0 in stage 72.0 (TID 543) (10.51.0.194, executor 2, partition 3, PROCESS_LOCAL, 10100 bytes)
25/06/09 07:35:45 WARN TaskSetManager: Lost task 1.0 in stage 72.0 (TID 541) (10.51.0.194 executor 2): org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Error occurs when executing map
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.reportException(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
    at org.apache.hudi.common.data.HoodieBaseListData.<init>(HoodieBaseListData.java:46)
    at org.apache.hudi.common.data.HoodieListData.<init>(HoodieListData.java:69)
    at org.apache.hudi.common.data.HoodieListData.flatMap(HoodieListData.java:136)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeyPrefixes(HoodieBackedTableMetadata.java:214)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$convertMetadataToPartitionStatRecords$4665d616$1(HoodieTableMetadataUtil.java:2700)
    at org.apache.hudi.data.HoodieJavaRDD.lambda$mapToPair$aa72055d$1(HoodieJavaRDD.java:173)
    at org.apache.spark.api.java.JavaPairRDD$.$anonfun$pairFunToScalaFun$1(JavaPairRDD.scala:1073)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.hudi.exception.HoodieException: Error occurs when executing map
    at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:40)
    at org.apache.hudi.common.data.HoodieListData.lambda$flatMap$0(HoodieListData.java:135)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(Unknown Source)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
    at java.base/java.util.stream.AbstractTask.compute(Unknown Source)
    at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: org.apache.hudi.exception.HoodieException: Exception when reading log file
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scanInternalV1(AbstractHoodieLogRecordScanner.java:390)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scanInternal(AbstractHoodieLogRecordScanner.java:252)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.scanByKeyPrefixes(HoodieMergedLogRecordScanner.java:199)
    at org.apache.hudi.metadata.HoodieMetadataLogRecordReader.getRecordsByKeyPrefixes(HoodieMetadataLogRecordReader.java:87)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.readLogRecords(HoodieBackedTableMetadata.java:344)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeyPrefixes$7539c171$1(HoodieBackedTableMetadata.java:234)
    at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
    ... 14 more
Caused by: java.lang.ClassCastException: class org.apache.avro.generic.GenericData$Record cannot be cast to class org.apache.hudi.avro.model.HoodieDeleteRecordList (org.apache.avro.generic.GenericData$Record is in unnamed module of loader 'app'; org.apache.hudi.avro.model.HoodieDeleteRecordList is in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @7066107d)
    at org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:168)
    at org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:123)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:680)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scanInternalV1(AbstractHoodieLogRecordScanner.java:380)
    ... 20 more

25/06/09 07:35:46 INFO TaskSetManager: Starting task 1.1 in stage 72.0 (TID 544) (10.51.0.162, executor 3, partition 1, PROCESS_LOCAL, 10100 bytes)
25/06/09 07:35:46 INFO TaskSetManager: Lost task 2.0 in stage 72.0 (TID 542) on 10.51.0.162, executor 3: org.apache.hudi.exception.HoodieException (org.apache.hudi.exception.HoodieException: Error occurs when executing map) [duplicate 1]
25/06/09 07:35:46 INFO TaskSetManager: Starting task 2.1 in stage 72.0 (TID 545) (10.51.0.2, executor 1, partition 2, PROCESS_LOCAL, 10100 bytes)
25/06/09 07:35:46 WARN TaskSetManager: Lost task 0.0 in stage 72.0 (TID 540) (10.51.0.2 executor 1): org.apache.hudi.exception.HoodieException: Error occurs when executing map
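In a long trace like the one above, the last `Caused by:` entry is usually the most informative part. On JDK 9+ a `ClassCastException` message also names the class loader of each class. The following is a minimal Python sketch for pulling both out of a pasted trace; `root_cause`, `LOADER_RE` and `SAMPLE` are hypothetical names introduced here, and `SAMPLE` is an abbreviated copy of the trace in this issue.

```python
import re

# Matches "<class> is in unnamed module of loader <name>" fragments, as printed
# in JDK 9+ ClassCastException messages. The loader is either quoted ('app')
# or a fully qualified class name.
LOADER_RE = re.compile(
    r"([\w.$]+) is in unnamed module of loader ('[^']+'|[\w.$]+)"
)

# Abbreviated from the trace reported in this issue.
SAMPLE = (
    "Caused by: org.apache.hudi.exception.HoodieException: Exception when reading log file "
    "at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scanInternalV1"
    "(AbstractHoodieLogRecordScanner.java:390) ... 14 more "
    "Caused by: java.lang.ClassCastException: class org.apache.avro.generic.GenericData$Record "
    "cannot be cast to class org.apache.hudi.avro.model.HoodieDeleteRecordList "
    "(org.apache.avro.generic.GenericData$Record is in unnamed module of loader 'app'; "
    "org.apache.hudi.avro.model.HoodieDeleteRecordList is in unnamed module of loader "
    "org.apache.spark.util.MutableURLClassLoader @7066107d) "
    "at org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize"
    "(HoodieDeleteBlock.java:168) ... 20 more"
)


def root_cause(trace):
    """Return the innermost 'Caused by' of a Java stack trace, or None.

    Works on both multi-line traces and traces flattened onto one line.
    """
    parts = trace.split("Caused by: ")
    if len(parts) < 2:
        return None
    # The message ends where the first stack frame ("at pkg.Class.method(") begins.
    head = re.split(r"\s+at\s+[\w.$/<>]+\(", parts[-1], maxsplit=1)[0].strip()
    exception, _, message = head.partition(": ")
    loaders = {cls: loader.strip("'") for cls, loader in LOADER_RE.findall(message)}
    return {"exception": exception, "message": message, "loaders": loaders}


if __name__ == "__main__":
    info = root_cause(SAMPLE)
    print(info["exception"])
    for cls, loader in info["loaders"].items():
        print(f"{cls} -> {loader}")
```

If the two classes in the message are reported under different loaders, as here, that points towards a class loading problem rather than a data problem. Messages that describe both classes in a single clause (for example "A and B are in unnamed module of loader 'app'") are not matched by this pattern and yield an empty `loaders` mapping.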

abhiNB-star avatar Jun 09 '25 08:06 abhiNB-star

@ad1happy2go Can you help me with this?

abhiNB-star avatar Jun 09 '25 08:06 abhiNB-star

hoodie.archive.merge.enable: true
hoodie.auto.adjust.lock.configs: true
hoodie.bulkinsert.shuffle.parallelism: 2
hoodie.clean.automatic: true
hoodie.cleaner.fileversions.retained: 5
hoodie.cleaner.parallelism: 200
hoodie.cleaner.policy: KEEP_LATEST_FILE_VERSIONS
hoodie.cleaner.policy.failed.writes: LAZY
hoodie.datasource.hive_sync.assume_date_partitioning: false
hoodie.datasource.hive_sync.database: test
hoodie.datasource.hive_sync.enable: true
hoodie.datasource.hive_sync.jdbcurl: jdbc:hive2://192.168.3.56:10000
hoodie.datasource.hive_sync.metastore.uris: thrift://192.168.3.30:9083
hoodie.datasource.hive_sync.mode: hms
hoodie.datasource.hive_sync.omit_metadata_fields: true
hoodie.datasource.hive_sync.partition_extractor_class: org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.hive_sync.partition_fields: created_on_partitioned
hoodie.datasource.hive_sync.password: hive
hoodie.datasource.hive_sync.table: actual_party_new
hoodie.datasource.hive_sync.username: hive
hoodie.datasource.write.hive_style_partitioning: true
hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.CustomKeyGenerator
hoodie.datasource.write.partitionpath.field: created_on_partitioned:TIMESTAMP
hoodie.datasource.write.payload.class: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
hoodie.datasource.write.precombine.field: starship_offset
hoodie.datasource.write.reconcile.schema: false
hoodie.datasource.write.recordkey.field: id
hoodie.datasource.write.table.type: COPY_ON_WRITE
hoodie.deltastreamer.continuous.mode: true
hoodie.deltastreamer.ingestion.tablesToBeIngested: test.actual_party_new
hoodie.deltastreamer.ingestion.targetBasePath: gs://{my}/nbdata/data/v4/test/actual-party/new
hoodie.deltastreamer.ingestion.test.actual_party_new.configFile: gs://{my}/nbdata/config/v4/test/actual-party/new/actual_party_new.properties
hoodie.deltastreamer.keygen.timebased.output.dateformat: yyyyMM
hoodie.deltastreamer.keygen.timebased.timestamp.type: EPOCHMILLISECONDS
hoodie.deltastreamer.source.kafka.enable.auto.commit: false
hoodie.deltastreamer.source.kafka.enable.failOnDataLoss: true
hoodie.deltastreamer.source.kafka.topic: gate-v4-20231130124511-20231130124534.gate.party
hoodie.index.type: BLOOM
hoodie.insert.shuffle.parallelism: 2
hoodie.parquet.compression.codec: snappy
hoodie.parquet.max.file.size: 134217728
hoodie.parquet.small.file.limit: 102400000
hoodie.table.name: test.actual_party_new
hoodie.upsert.shuffle.parallelism: 15
hoodie.write.concurrency.mode: optimistic_concurrency_control
hoodie.write.lock.provider: org.apache.hudi.client.transaction.lock.InProcessLockProvider

abhiNB-star avatar Jun 09 '25 08:06 abhiNB-star

I am trying to ingest data into a Hudi table using Spark + Kafka streaming with the RLI index, but ingesting records fails with the error reported in this issue.

Steps to reproduce the behavior:

1. Build the dependencies for Hudi 1.0.2 and Spark 3.5.5.
2. Add the Hudi RLI index.

Expected behavior

It should work end to end with the RLI index enabled.

Environment Description

Hudi version : 1.0.2

Spark version : 3.5.5

Hive version : NA

Hadoop version : NA

Storage (HDFS/S3/GCS..) : GCS

Running on Docker? (yes/no) : Yes

@codope @ad1happy2go @bhasudha

abhiNB-star avatar Jun 09 '25 08:06 abhiNB-star

@codope @ad1happy2go @bhasudha ??

abhiNB-star avatar Jun 12 '25 05:06 abhiNB-star

@abhiNB-star Is it possible to zip the .hoodie dir under your table base path and share with us? To unblock, you can disable column stats and partition stats:

hoodie.metadata.index.column.stats.enable = false
hoodie.metadata.index.partition.stats.enable = false
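As a rough illustration of applying this workaround to an existing properties file, the sketch below overrides or appends the two keys while leaving every other line untouched. `apply_overrides` and `STATS_OFF` are hypothetical names introduced here, not part of Hudi; only the two config keys come from the comment above.

```python
import re

# The two settings suggested above as a temporary workaround.
STATS_OFF = {
    "hoodie.metadata.index.column.stats.enable": "false",
    "hoodie.metadata.index.partition.stats.enable": "false",
}


def apply_overrides(properties_text, overrides):
    """Return properties text with `overrides` applied.

    Existing keys are rewritten in place as key=value; keys that are not
    present yet are appended at the end. Comments and other lines are kept.
    """
    remaining = dict(overrides)  # copy so the caller's dict is not mutated
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")):
            # Java properties accept both "key=value" and "key: value".
            key = re.split(r"\s*[=:]\s*", stripped, maxsplit=1)[0]
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    existing = (
        "hoodie.table.name: test.actual_party_new\n"
        "hoodie.metadata.index.column.stats.enable=true\n"
    )
    print(apply_overrides(existing, STATS_OFF), end="")
```

This only edits text; whether the job picks up the change still depends on how the properties file is passed to the streamer.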

codope avatar Jun 12 '25 07:06 codope

@codope Yes, I'll send it to you.

abhiNB-star avatar Jun 13 '25 09:06 abhiNB-star

@codope

hoodie_bundle.tar.gz

abhiNB-star avatar Jun 13 '25 09:06 abhiNB-star

Hi @codope

@abhiNB-star and I tried different approaches to specify the JAR in spark-submit, but the issue remains unresolved.

First, we added the Hudi Spark bundle JAR using extraClassPath, which did not solve the issue:

spark.driver.extraClassPath: hudi-spark3.5-bundle_2.12-1.0.2.jar  
spark.executor.extraClassPath: hudi-spark3.5-bundle_2.12-1.0.2.jar

Second, we attempted to remove spark.driver.userClassPathFirst and spark.executor.userClassPathFirst and then ran the application again, but that did not resolve the issue.

rangareddy avatar Jun 13 '25 15:06 rangareddy

@codope I am also facing this issue when trying with Spark 3.1.1 and Hudi 0.14. I used the latest branch to build the required jars.

Image

abhiNB-star avatar Jun 16 '25 05:06 abhiNB-star

To resolve this issue, you need to create the folder and property file manually. This issue is solved in Hudi 1.x versions.

rangareddy avatar Jun 17 '25 03:06 rangareddy

Hi @codope, any luck with the Hudi 1.0.2 error?

abhiNB-star avatar Jun 19 '25 08:06 abhiNB-star

Hi @codope

Could you please help here in troubleshooting this issue further?

@abhiNB-star Meanwhile, could you please share the logs of the failed container application?

rangareddy avatar Jul 01 '25 12:07 rangareddy

@rangareddy I will share the full logs of a failed job. The issue happens while reading from metadata: some commits go through, but after several commits it fails, seemingly at random.

Image

abhiNB-star avatar Jul 02 '25 06:07 abhiNB-star

here is the fix: https://github.com/apache/hudi/pull/13515/files

danny0405 avatar Jul 05 '25 00:07 danny0405

Hi @danny0405, thanks, that is fixed.

I am now getting another, similar issue during rollback:

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:989)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2414)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2433)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2458)
    at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1049)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:410)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:1048)
    at org.apache.spark.rdd.PairRDDFunctions.$anonfun$countByKey$1(PairRDDFunctions.scala:367)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:410)
    at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:367)
    at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:314)
    at org.apache.hudi.data.HoodieJavaPairRDD.countByKey(HoodieJavaPairRDD.java:109)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.buildProfile(BaseSparkCommitActionExecutor.java:204)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:171)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:83)
    at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:61)
    ... 12 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Error occurs when executing map
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.reportException(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
    at org.apache.hudi.common.engine.HoodieLocalEngineContext.map(HoodieLocalEngineContext.java:90)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:275)
    at org.apache.hudi.metadata.BaseTableMetadata.readRecordIndex(BaseTableMetadata.java:299)
    at org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:169)
    at org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:156)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsToPair$1(JavaRDDLike.scala:186)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:858)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:858)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    ... 3 more
Caused by: org.apache.hudi.exception.HoodieException: Error occurs when executing map
    at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:40)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
    at java.base/java.util.stream.AbstractTask.compute(Unknown Source)
    at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: org.apache.hudi.exception.HoodieMetadataException: Error retrieving rollback commits for instant [20250707101624890__20250707101645835__rollback__COMPLETED]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:2052)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$60(HoodieTableMetadataUtil.java:1970)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1970)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:492)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:446)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:431)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:308)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:280)
    at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
    ... 13 more
Caused by: java.io.IOException: unable to read commit metadata for instant [==>20250707101624890__rollback__REQUESTED]
    at org.apache.hudi.common.table.timeline.versioning.v2.CommitMetadataSerDeV2.deserialize(CommitMetadataSerDeV2.java:88)
    at org.apache.hudi.common.table.timeline.HoodieTimeline.readNonEmptyInstantContent(HoodieTimeline.java:153)
    at org.apache.hudi.common.table.timeline.HoodieTimeline.readRollbackPlan(HoodieTimeline.java:244)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:2034)
    ... 30 more
Caused by: java.lang.ClassCastException: class org.apache.avro.generic.GenericData$Record cannot be cast to class org.apache.avro.specific.SpecificRecordBase (org.apache.avro.generic.GenericData$Record and org.apache.avro.specific.SpecificRecordBase are in unnamed module of loader 'app')
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:143)
    at org.apache.hudi.common.table.timeline.versioning.v2.CommitMetadataSerDeV2.deserialize(CommitMetadataSerDeV2.java:78)
    ... 33 more

abhiNB-star avatar Jul 07 '25 11:07 abhiNB-star

Thanks. We had an internal discussion, and this appears to be an issue related to JDK 17: JDK 17 does not allow class casting across different class loaders, even though the cast is valid. We are thinking through a solution for it.

danny0405 avatar Jul 08 '25 01:07 danny0405

Hi @abhiNB-star are you using Java 17 to run Spark 3.5.5?

yihua avatar Jul 08 '25 01:07 yihua

@yihua @danny0405

abhinav_raj1_nobroker_in@bastion-host-new:~/testing/new_test/party$ kubectl exec -n spark-hood -it testing-party-job-old-hudi-driver -- sh

java --version

openjdk 11.0.27 2025-04-15
OpenJDK Runtime Environment Temurin-11.0.27+6 (build 11.0.27+6)
OpenJDK 64-Bit Server VM Temurin-11.0.27+6 (build 11.0.27+6, mixed mode, sharing)

It looks like the pod has Java 11.
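Since the question here is which Java major version the driver runs, a small helper can normalise the two version string styles: the modern form such as `11.0.27` reported above, and the legacy `1.x` form that Java 8 and earlier report. This is a minimal sketch; `java_major` is a hypothetical name introduced for illustration.

```python
import re


def java_major(version):
    """Return the Java major version for strings like '11.0.27' or '1.8.0_452'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Java version string: {version!r}")
    first = int(match.group(1))
    # Java 8 and earlier use the legacy "1.<major>" scheme.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


if __name__ == "__main__":
    for raw in ("11.0.27", "1.8.0_452", "17"):
        print(raw, "->", java_major(raw))
```

The value to feed in is whatever `java -version` or Spark's startup log reports inside the driver and executor pods.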

abhiNB-star avatar Jul 08 '25 06:07 abhiNB-star

Got it. The same class loading problem exists in Java 11. If possible, could you try Java 8 and see if that solves the problem?

yihua avatar Jul 09 '25 00:07 yihua

@abhiNB-star ^

yihua avatar Jul 09 '25 00:07 yihua

hi @yihua sure thing

abhiNB-star avatar Jul 09 '25 05:07 abhiNB-star

@yihua I am facing some compatibility issues with this combination; I am trying to resolve them.

abhiNB-star avatar Jul 10 '25 06:07 abhiNB-star

abhinav_raj1_nb_in@bastion-host-new:~/testing/new_test/party$ kubectl exec -n spark-hood -it testing-party-job-old-hudi-driver -- sh
$ java -version
openjdk version "1.8.0_452"
OpenJDK Runtime Environment (build 1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09)
OpenJDK 64-Bit Server VM (build 25.452-b09, mixed mode)

25/07/14 06:06:09 WARN HoodieMultiTableStreamer: --enable-hive-sync will be deprecated in a future release; please use --enable-sync instead for Hive syncing
25/07/14 06:06:09 WARN HoodieMultiTableStreamer: --target-table is deprecated and will be removed in a future release due to it's useless; please use hoodie.streamer.ingestion.tablesToBeIngested to configure multiple target tables
25/07/14 06:06:09 INFO SparkContext: Running Spark version 3.5.5
25/07/14 06:06:09 INFO SparkContext: OS info Linux, 6.1.112+, amd64
25/07/14 06:06:09 INFO SparkContext: Java version 1.8.0_452

Hi @yihua @danny0405, with Java 8 this issue is resolved.

Is there anything we can do to fix this with Java 11 or 17?

abhiNB-star avatar Jul 14 '25 08:07 abhiNB-star

I have the same issue and I also confirm it's not present with Java 8.

Let me know if I can assist with testing anything.

ZeWaren avatar Jul 25 '25 12:07 ZeWaren

@yihua @danny0405 After I migrated an existing table from Hudi 0.13 to 0.14.1 it ran fine, but after migrating it from 0.14.1 to 1.0.2 I am getting logs like these, which have been running for the past 10-11 hours:

25/07/30 08:45:25 INFO SparkContext: Created broadcast 1897 from broadcast at DAGScheduler.scala:1585
25/07/30 08:45:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1901 (MapPartitionsRDD[3821] at mapToPair at HoodieSparkEngineContext.java:175) (first 15 tasks are for partitions Vector(0))
25/07/30 08:45:25 INFO TaskSchedulerImpl: Adding task set 1901.0 with 1 tasks resource profile 0
25/07/30 08:45:25 INFO TaskSetManager: Starting task 0.0 in stage 1901.0 (TID 4113) (10.51.2.243, executor 3, partition 0, PROCESS_LOCAL, 10151 bytes)
25/07/30 08:45:25 INFO BlockManagerInfo: Added broadcast_1897_piece0 in memory on 10.51.2.243:33361 (size: 90.8 KiB, free: 970.1 MiB)
25/07/30 08:45:25 INFO TaskSetManager: Finished task 0.0 in stage 1901.0 (TID 4113) in 68 ms on 10.51.2.243 (executor 3) (1/1)
25/07/30 08:45:25 INFO TaskSchedulerImpl: Removed TaskSet 1901.0, whose tasks have all completed, from pool
25/07/30 08:45:25 INFO DAGScheduler: ResultStage 1901 (collectAsMap at HoodieSparkEngineContext.java:178) finished in 0.093 s
25/07/30 08:45:25 INFO DAGScheduler: Job 1896 is finished. Cancelling potential speculative or zombie tasks for this job
25/07/30 08:45:25 INFO TaskSchedulerImpl: Killing all running tasks in stage 1901: Stage finished
25/07/30 08:45:25 INFO DAGScheduler: Job 1896 finished: collectAsMap at HoodieSparkEngineContext.java:178, took 0.097804 s
25/07/30 08:45:25 INFO HoodieLogFileReader: Closing Log file reader .commits_.archive.1119_1-0-1
25/07/30 08:45:25 INFO CodecPool: Got brand-new compressor [.gz]
25/07/30 08:45:25 INFO LSMTimelineWriter: Writing schema {"type":"record","name":"HoodieLSMTimelineInstant","namespace":"org.apache.hudi.avro.model","fields":[{"name":"instantTime","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"completionTime","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"action","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"metadata","type":["null","bytes"],"default":null},{"name":"plan","type":["null","bytes"],"default":null},{"name":"version","type":"int","default":1}]}
25/07/30 08:45:26 INFO BlockManagerInfo: Removed broadcast_1897_piece0 on spark-fc6a30985a5a4d6d-driver-svc.spark-hood.svc:7079 in memory (size: 90.8 KiB, free: 886.7 MiB)
25/07/30 08:45:26 INFO BlockManagerInfo: Removed broadcast_1897_piece0 on 10.51.2.243:33361 in memory (size: 90.8 KiB, free: 970.2 MiB)
25/07/30 08:45:26 INFO BlockManagerInfo: Removed broadcast_1896_piece0 on spark-fc6a30985a5a4d6d-driver-svc.spark-hood.svc:7079 in memory (size: 90.8 KiB, free: 886.8 MiB)
25/07/30 08:45:26 INFO BlockManagerInfo: Removed broadcast_1896_piece0 on 10.51.0.231:40967 in memory (size: 90.8 KiB, free: 970.2 MiB)
25/07/30 08:45:26 INFO SparkContext: Starting job: collectAsMap at HoodieSparkEngineContext.java:178
25/07/30 08:45:26 INFO DAGScheduler: Got job 1897 (collectAsMap at HoodieSparkEngineContext.java:178) with 1 output partitions
25/07/30 08:45:26 INFO DAGScheduler: Final stage: ResultStage 1902 (collectAsMap at HoodieSparkEngineContext.java:178)
25/07/30 08:45:26 INFO DAGScheduler: Parents of final stage: List()
25/07/30 08:45:26 INFO DAGScheduler: Missing parents: List()
25/07/30 08:45:26 INFO DAGScheduler: Submitting ResultStage 1902 (MapPartitionsRDD[3823] at mapToPair at HoodieSparkEngineContext.java:175), which has no missing parents
25/07/30 08:45:26 INFO MemoryStore: Block broadcast_1898 stored as values in memory (estimated size 241.3 KiB, free 886.6 MiB)
25/07/30 08:45:26 INFO MemoryStore: Block broadcast_1898_piece0 stored as bytes in memory (estimated size 90.8 KiB, free 886.5 MiB)
25/07/30 08:45:26 INFO BlockManagerInfo: Added broadcast_1898_piece0 in memory on spark-fc6a30985a5a4d6d-driver-svc.spark-hood.svc:7079 (size: 90.8 KiB, free: 886.7 MiB)
25/07/30 08:45:26 INFO SparkContext: Created broadcast 1898 from broadcast at DAGScheduler.scala:1585
25/07/30 08:45:26 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1902 (MapPartitionsRDD[3823] at mapToPair at HoodieSparkEngineContext.java:175) (first 15 tasks are for partitions Vector(0))
25/07/30 08:45:26 INFO TaskSchedulerImpl: Adding task set 1902.0 with 1 tasks resource profile 0
25/07/30 08:45:26 INFO TaskSetManager: Starting task 0.0 in stage 1902.0 (TID 4114) (10.51.3.66, executor 4, partition 0, PROCESS_LOCAL, 10151 bytes)
25/07/30 08:45:26 INFO BlockManagerInfo: Added broadcast_1898_piece0 in memory on 10.51.3.66:43689 (size: 90.8 KiB, free: 970.1 MiB)
25/07/30 08:45:26 INFO TaskSetManager: Finished task 0.0 in stage 1902.0 (TID 4114) in 89 ms on 10.51.3.66 (executor 4) (1/1)
25/07/30 08:45:26 INFO TaskSchedulerImpl: Removed TaskSet 1902.0, whose tasks have all completed, from pool
25/07/30 08:45:26 INFO DAGScheduler: ResultStage 1902 (collectAsMap at HoodieSparkEngineContext.java:178) finished in 0.117 s
25/07/30 08:45:26 INFO DAGScheduler: Job 1897 is finished. Cancelling potential speculative or zombie tasks for this job
25/07/30 08:45:26 INFO TaskSchedulerImpl: Killing all running tasks in stage 1902: Stage finished
25/07/30 08:45:26 INFO DAGScheduler: Job 1897 finished: collectAsMap at HoodieSparkEngineContext.java:178, took 0.118602 s
25/07/30 08:45:27 INFO CodecPool: Got brand-new compressor [.gz]
25/07/30 08:45:27 INFO LSMTimelineWriter: Writing schema {"type":"record","name":"HoodieLSMTimelineInstant","namespace":"org.apache.hudi.avro.model","fields":[{"name":"instantTime","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"completionTime","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"action","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"metadata","type":["null","bytes"],"default":null},{"name":"plan","type":["null","bytes"],"default":null},{"name":"version","type":"int","default":1}]}

abhiNB-star avatar Jul 30 '25 08:07 abhiNB-star

@yihua @danny0405 Can you guide me on why this is happening?

abhiNB-star avatar Jul 30 '25 08:07 abhiNB-star

@abhiNB-star I don't see any error in the log. Speculative execution may be turned on for this job, so it killed some zombie tasks.

ad1happy2go avatar Jul 30 '25 10:07 ad1happy2go

Hi @ad1happy2go, let me give you the full logs on Slack.

abhiNB-star avatar Jul 30 '25 11:07 abhiNB-star

We are marking this issue as closed because the solution has been successfully delivered and integrated with the merge of pull request #13990.

rangareddy avatar Oct 30 '25 09:10 rangareddy