hudi
hudi copied to clipboard
java.lang.ClassCastException: class org.apache.avro.generic.GenericData$Record cannot be cast to class org.apache.hudi.avro.model.HoodieDeleteRecordList
25/06/09 07:35:44 INFO BlockManagerInfo: Added broadcast_39_piece0 in memory on 10.51.0.162:34665 (size: 57.5 KiB, free: 992.6 MiB)
25/06/09 07:35:44 INFO BlockManagerInfo: Added broadcast_39_piece0 in memory on 10.51.0.194:35583 (size: 57.5 KiB, free: 998.1 MiB)
25/06/09 07:35:45 INFO TaskSetManager: Starting task 3.0 in stage 72.0 (TID 543) (10.51.0.194, executor 2, partition 3, PROCESS_LOCAL, 10100 bytes)
25/06/09 07:35:45 WARN TaskSetManager: Lost task 1.0 in stage 72.0 (TID 541) (10.51.0.194 executor 2): org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Error occurs when executing map
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.reportException(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
at org.apache.hudi.common.data.HoodieBaseListData.
25/06/09 07:35:46 INFO TaskSetManager: Starting task 1.1 in stage 72.0 (TID 544) (10.51.0.162, executor 3, partition 1, PROCESS_LOCAL, 10100 bytes) 25/06/09 07:35:46 INFO TaskSetManager: Lost task 2.0 in stage 72.0 (TID 542) on 10.51.0.162, executor 3: org.apache.hudi.exception.HoodieException (org.apache.hudi.exception.HoodieException: Error occurs when executing map) [duplicate 1] 25/06/09 07:35:46 INFO TaskSetManager: Starting task 2.1 in stage 72.0 (TID 545) (10.51.0.2, executor 1, partition 2, PROCESS_LOCAL, 10100 bytes) 25/06/09 07:35:46 WARN TaskSetManager: Lost task 0.0 in stage 72.0 (TID 540) (10.51.0.2 executor 1): org.apache.hudi.exception.HoodieException: Error occurs when executing map
@ad1happy2go Can You help me with this
hoodie.archive.merge.enable: true hoodie.auto.adjust.lock.configs: true hoodie.bulkinsert.shuffle.parallelism: 2 hoodie.clean.automatic: true hoodie.cleaner.fileversions.retained: 5 hoodie.cleaner.parallelism: 200 hoodie.cleaner.policy: KEEP_LATEST_FILE_VERSIONS hoodie.cleaner.policy.failed.writes: LAZY hoodie.datasource.hive_sync.assume_date_partitioning: false hoodie.datasource.hive_sync.database: test hoodie.datasource.hive_sync.enable: true hoodie.datasource.hive_sync.jdbcurl: jdbc:hive2://192.168.3.56:10000 hoodie.datasource.hive_sync.metastore.uris: thrift://192.168.3.30:9083 hoodie.datasource.hive_sync.mode: hms hoodie.datasource.hive_sync.omit_metadata_fields: true hoodie.datasource.hive_sync.partition_extractor_class: org.apache.hudi.hive.MultiPartKeysValueExtractor hoodie.datasource.hive_sync.partition_fields: created_on_partitioned hoodie.datasource.hive_sync.password: hive hoodie.datasource.hive_sync.table: actual_party_new hoodie.datasource.hive_sync.username: hive hoodie.datasource.write.hive_style_partitioning: true hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.CustomKeyGenerator hoodie.datasource.write.partitionpath.field: created_on_partitioned:TIMESTAMP hoodie.datasource.write.payload.class: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload hoodie.datasource.write.precombine.field: starship_offset hoodie.datasource.write.reconcile.schema: false hoodie.datasource.write.recordkey.field: id hoodie.datasource.write.table.type: COPY_ON_WRITE hoodie.deltastreamer.continuous.mode: true hoodie.deltastreamer.ingestion.tablesToBeIngested: test.actual_party_new hoodie.deltastreamer.ingestion.targetBasePath: gs://{my}/nbdata/data/v4/test/actual-party/new hoodie.deltastreamer.ingestion.test.actual_party_new.configFile: gs://{my}/nbdata/config/v4/test/actual-party/new/actual_party_new.properties hoodie.deltastreamer.keygen.timebased.output.dateformat: yyyyMM hoodie.deltastreamer.keygen.timebased.timestamp.type: EPOCHMILLISECONDS hoodie.deltastreamer.source.kafka.enable.auto.commit: false hoodie.deltastreamer.source.kafka.enable.failOnDataLoss: true hoodie.deltastreamer.source.kafka.topic: gate-v4-20231130124511-20231130124534.gate.party hoodie.index.type: BLOOM hoodie.insert.shuffle.parallelism: 2 hoodie.parquet.compression.codec: snappy hoodie.parquet.max.file.size: 134217728 hoodie.parquet.small.file.limit: 102400000 hoodie.table.name: test.actual_party_new hoodie.upsert.shuffle.parallelism: 15 hoodie.write.concurrency.mode: optimistic_concurrency_control hoodie.write.lock.provider: org.apache.hudi.client.transaction.lock.InProcessLockProvider
I am trying to ingest the data using spark+kafka streaming to hudi table with the RLI index. but unfortunately ingesting records is throwing the below issue.
Steps to reproduce the behavior:
first build dependency for hudi 1.0.2 and spark 3.5.5 add hudi RLI index Expected behavior
it should work end to end with RLI index enable
Environment Description
Hudi version : 1.0.2
Spark version : 3.5.5
Hive version : NA
Hadoop version : NA
Storage (HDFS/S3/GCS..) : GCS
Running on Docker? (yes/no) : Yes @codope @ad1happy2go @bhasudha
@codope @ad1happy2go @bhasudha ??
@abhiNB-star Is it possible to zip the .hoodie dir under your table base path and share with us? To unblock, you can disable column stats and partition stats:
hoodie.metadata.index.column.stats.enable = false
hoodie.metadata.index.partition.stats.enable = false
@codope yes ill send you
Hi @codope
@abhiNB-star and me tried different approaches to specify the JAR in the spark-submit, but the issue remains unresolved.
First, we added the Hudi Spark bundle JAR using extraClassPath, which did not solve the issue:
spark.driver.extraClassPath: hudi-spark3.5-bundle_2.12-1.0.2.jar
spark.executor.extraClassPath: hudi-spark3.5-bundle_2.12-1.0.2.jar
Second, we attempted to remove spark.driver.userClassPathFirst and spark.executor.userClassPathFirst and then ran the application again, but that did not resolve the issue.
@codope also facing this issue when i am trying with spark 3.1.1 and hudi 0.14 i used latest branch to pack the required jars
@codope also facing this issue when i am trying with spark 3.1.1 and hudi 0.14 i used latest branch to pack the required jars
To resolve this issue, you need to create the folder and property file manually. This issue is solved in Hudi 1.x versions.
hi @codope any luck with that hudi 1.0.2 Error ?
Hi @codope
@abhiNB-star and me tried different approaches to specify the JAR in the spark-submit, but the issue remains unresolved.
First, we added the Hudi Spark bundle JAR using extraClassPath, which did not solve the issue:
spark.driver.extraClassPath: hudi-spark3.5-bundle_2.12-1.0.2.jar spark.executor.extraClassPath: hudi-spark3.5-bundle_2.12-1.0.2.jarSecond, we attempted to remove spark.driver.userClassPathFirst and spark.executor.userClassPathFirst and then ran the application again, but that did not resolve the issue.
Hi @codope
Could you please help here in troubleshooting this issue further?
@abhiNB-star Meanwhile, could you please share the logs of the failed container application?
@rangareddy i will share the full logs of a failed job issue : This happens while reading from metadata some commits happens but after several commits it randomly happens
here is the fix: https://github.com/apache/hudi/pull/13515/files
hi @danny0405 Thanks That is Fixed
just Getting another simillar issue : while rollback
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:989) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2414) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2433) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2458) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1049) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:410) at org.apache.spark.rdd.RDD.collect(RDD.scala:1048) at org.apache.spark.rdd.PairRDDFunctions.$anonfun$countByKey$1(PairRDDFunctions.scala:367) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:410) at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:367) at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:314) at org.apache.hudi.data.HoodieJavaPairRDD.countByKey(HoodieJavaPairRDD.java:109) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.buildProfile(BaseSparkCommitActionExecutor.java:204) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:171) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:83) at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:61) ... 12 more Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Error occurs when executing map at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source) at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(Unknown Source) at java.base/java.util.concurrent.ForkJoinTask.reportException(Unknown Source) at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source) at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source) at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source) at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source) at org.apache.hudi.common.engine.HoodieLocalEngineContext.map(HoodieLocalEngineContext.java:90) at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:275) at org.apache.hudi.metadata.BaseTableMetadata.readRecordIndex(BaseTableMetadata.java:299) at org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:169) at org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:156) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsToPair$1(JavaRDDLike.scala:186) at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:858) at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:858) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166) at org.apache.spark.scheduler.Task.run(Task.scala:141) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) ... 3 more Caused by: org.apache.hudi.exception.HoodieException: Error occurs when executing map at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:40) at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source) at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source) at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source) at java.base/java.util.stream.AbstractTask.compute(Unknown Source) at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source) at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) Caused by: org.apache.hudi.exception.HoodieMetadataException: Error retrieving rollback commits for instant [20250707101624890__20250707101645835__rollback__COMPLETED] at org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:2052) at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$60(HoodieTableMetadataUtil.java:1970) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown Source) at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source) at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source) at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source) at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source) at org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1970) at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:492) at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:446) at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:431) at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:308) at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:280) at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38) ... 13 more Caused by: java.io.IOException: unable to read commit metadata for instant [==>20250707101624890__rollback__REQUESTED] at org.apache.hudi.common.table.timeline.versioning.v2.CommitMetadataSerDeV2.deserialize(CommitMetadataSerDeV2.java:88) at org.apache.hudi.common.table.timeline.HoodieTimeline.readNonEmptyInstantContent(HoodieTimeline.java:153) at org.apache.hudi.common.table.timeline.HoodieTimeline.readRollbackPlan(HoodieTimeline.java:244) at org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:2034) ... 30 more Caused by: java.lang.ClassCastException: class org.apache.avro.generic.GenericData$Record cannot be cast to class org.apache.avro.specific.SpecificRecordBase (org.apache.avro.generic.GenericData$Record and org.apache.avro.specific.SpecificRecordBase are in unnamed module of loader 'app') at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:143) at org.apache.hudi.common.table.timeline.versioning.v2.CommitMetadataSerDeV2.deserialize(CommitMetadataSerDeV2.java:78) ... 33 more
Thanks, we had an internal discussion and it seems an issue related with JDK17, the JDK17 does not allow class casting from different classloaders even though the cast is valid. We are thinking through a solution for it.
Hi @abhiNB-star are you using Java 17 to run Spark 3.5.5?
@yihua @danny0405 abhinav_raj1_nobroker_in@bastion-host-new:~/testing/new_test/party$ kubectl exec -n spark-hood -it testing-party-job-old-hudi-driver -- sh
java --version
openjdk 11.0.27 2025-04-15 OpenJDK Runtime Environment Temurin-11.0.27+6 (build 11.0.27+6) OpenJDK 64-Bit Server VM Temurin-11.0.27+6 (build 11.0.27+6, mixed mode, sharing)
Looks like the pod has java 11
Got it. The same class loading problem exists in Java 11. If possible, could you try Java 8 and see if that solves the problem?
@abhiNB-star ^
hi @yihua sure thing
@yihua I am Facing some Compatibility issues with this combination i am trying to resolve it
abhinav_raj1_nb_in@bastion-host-new:~/testing/new_test/party$ kubectl exec -n spark-hood -it testing-party-job-old-hudi-driver -- sh $ java -version openjdk version "1.8.0_452" OpenJDK Runtime Environment (build 1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09) OpenJDK 64-Bit Server VM (build 25.452-b09, mixed mode)
25/07/14 06:06:09 WARN HoodieMultiTableStreamer: --enable-hive-sync will be deprecated in a future release; please use --enable-sync instead for Hive syncing 25/07/14 06:06:09 WARN HoodieMultiTableStreamer: --target-table is deprecated and will be removed in a future release due to it's useless; please use hoodie.streamer.ingestion.tablesToBeIngested to configure multiple target tables 25/07/14 06:06:09 INFO SparkContext: Running Spark version 3.5.5 25/07/14 06:06:09 INFO SparkContext: OS info Linux, 6.1.112+, amd64 25/07/14 06:06:09 INFO SparkContext: Java version 1.8.0_452
hi guys @yihua @danny0405 So with java 8 this issue is resolved
but is there anything we can do to fix this with java 11 or 17
I have the same issue and I also confirm it's not present with Java 8.
Let me know if I can assist with testing anything.
@yihua @danny0405 after i migrated an existing table from hudi 0.13 to 0.14.1 it was running good but when i migrated it from 0.14.1 to 1.0.2 i am getting these kind of logs which are running from past 10-11 hours
25/07/30 08:45:25 INFO SparkContext: Created broadcast 1897 from broadcast at DAGScheduler.scala:1585 25/07/30 08:45:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1901 (MapPartitionsRDD[3821] at mapToPair at HoodieSparkEngineContext.java:175) (first 15 tasks are for partitions Vector(0)) 25/07/30 08:45:25 INFO TaskSchedulerImpl: Adding task set 1901.0 with 1 tasks resource profile 0 25/07/30 08:45:25 INFO TaskSetManager: Starting task 0.0 in stage 1901.0 (TID 4113) (10.51.2.243, executor 3, partition 0, PROCESS_LOCAL, 10151 bytes) 25/07/30 08:45:25 INFO BlockManagerInfo: Added broadcast_1897_piece0 in memory on 10.51.2.243:33361 (size: 90.8 KiB, free: 970.1 MiB) 25/07/30 08:45:25 INFO TaskSetManager: Finished task 0.0 in stage 1901.0 (TID 4113) in 68 ms on 10.51.2.243 (executor 3) (1/1) 25/07/30 08:45:25 INFO TaskSchedulerImpl: Removed TaskSet 1901.0, whose tasks have all completed, from pool 25/07/30 08:45:25 INFO DAGScheduler: ResultStage 1901 (collectAsMap at HoodieSparkEngineContext.java:178) finished in 0.093 s 25/07/30 08:45:25 INFO DAGScheduler: Job 1896 is finished. Cancelling potential speculative or zombie tasks for this job 25/07/30 08:45:25 INFO TaskSchedulerImpl: Killing all running tasks in stage 1901: Stage finished 25/07/30 08:45:25 INFO DAGScheduler: Job 1896 finished: collectAsMap at HoodieSparkEngineContext.java:178, took 0.097804 s 25/07/30 08:45:25 INFO HoodieLogFileReader: Closing Log file reader .commits_.archive.1119_1-0-1 25/07/30 08:45:25 INFO CodecPool: Got brand-new compressor [.gz] 25/07/30 08:45:25 INFO LSMTimelineWriter: Writing schema {"type":"record","name":"HoodieLSMTimelineInstant","namespace":"org.apache.hudi.avro.model","fields":[{"name":"instantTime","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"completionTime","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"action","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"metadata","type":["null","bytes"],"default":null},{"name":"plan","type":["null","bytes"],"default":null},{"name":"version","type":"int","default":1}]} 25/07/30 08:45:26 INFO BlockManagerInfo: Removed broadcast_1897_piece0 on spark-fc6a30985a5a4d6d-driver-svc.spark-hood.svc:7079 in memory (size: 90.8 KiB, free: 886.7 MiB) 25/07/30 08:45:26 INFO BlockManagerInfo: Removed broadcast_1897_piece0 on 10.51.2.243:33361 in memory (size: 90.8 KiB, free: 970.2 MiB) 25/07/30 08:45:26 INFO BlockManagerInfo: Removed broadcast_1896_piece0 on spark-fc6a30985a5a4d6d-driver-svc.spark-hood.svc:7079 in memory (size: 90.8 KiB, free: 886.8 MiB) 25/07/30 08:45:26 INFO BlockManagerInfo: Removed broadcast_1896_piece0 on 10.51.0.231:40967 in memory (size: 90.8 KiB, free: 970.2 MiB) 25/07/30 08:45:26 INFO SparkContext: Starting job: collectAsMap at HoodieSparkEngineContext.java:178 25/07/30 08:45:26 INFO DAGScheduler: Got job 1897 (collectAsMap at HoodieSparkEngineContext.java:178) with 1 output partitions 25/07/30 08:45:26 INFO DAGScheduler: Final stage: ResultStage 1902 (collectAsMap at HoodieSparkEngineContext.java:178) 25/07/30 08:45:26 INFO DAGScheduler: Parents of final stage: List() 25/07/30 08:45:26 INFO DAGScheduler: Missing parents: List() 25/07/30 08:45:26 INFO DAGScheduler: Submitting ResultStage 1902 (MapPartitionsRDD[3823] at mapToPair at HoodieSparkEngineContext.java:175), which has no missing parents 25/07/30 08:45:26 INFO MemoryStore: Block broadcast_1898 stored as values in memory (estimated size 241.3 KiB, free 886.6 MiB) 25/07/30 08:45:26 INFO MemoryStore: Block broadcast_1898_piece0 stored as bytes in memory (estimated size 90.8 KiB, free 886.5 MiB) 25/07/30 08:45:26 INFO BlockManagerInfo: Added broadcast_1898_piece0 in memory on spark-fc6a30985a5a4d6d-driver-svc.spark-hood.svc:7079 (size: 90.8 KiB, free: 886.7 MiB) 25/07/30 08:45:26 INFO SparkContext: Created broadcast 1898 from broadcast at DAGScheduler.scala:1585 25/07/30 08:45:26 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1902 (MapPartitionsRDD[3823] at mapToPair at HoodieSparkEngineContext.java:175) (first 15 tasks are for partitions Vector(0)) 25/07/30 08:45:26 INFO TaskSchedulerImpl: Adding task set 1902.0 with 1 tasks resource profile 0 25/07/30 08:45:26 INFO TaskSetManager: Starting task 0.0 in stage 1902.0 (TID 4114) (10.51.3.66, executor 4, partition 0, PROCESS_LOCAL, 10151 bytes) 25/07/30 08:45:26 INFO BlockManagerInfo: Added broadcast_1898_piece0 in memory on 10.51.3.66:43689 (size: 90.8 KiB, free: 970.1 MiB) 25/07/30 08:45:26 INFO TaskSetManager: Finished task 0.0 in stage 1902.0 (TID 4114) in 89 ms on 10.51.3.66 (executor 4) (1/1) 25/07/30 08:45:26 INFO TaskSchedulerImpl: Removed TaskSet 1902.0, whose tasks have all completed, from pool 25/07/30 08:45:26 INFO DAGScheduler: ResultStage 1902 (collectAsMap at HoodieSparkEngineContext.java:178) finished in 0.117 s 25/07/30 08:45:26 INFO DAGScheduler: Job 1897 is finished. Cancelling potential speculative or zombie tasks for this job 25/07/30 08:45:26 INFO TaskSchedulerImpl: Killing all running tasks in stage 1902: Stage finished 25/07/30 08:45:26 INFO DAGScheduler: Job 1897 finished: collectAsMap at HoodieSparkEngineContext.java:178, took 0.118602 s 25/07/30 08:45:27 INFO CodecPool: Got brand-new compressor [.gz] 25/07/30 08:45:27 INFO LSMTimelineWriter: Writing schema {"type":"record","name":"HoodieLSMTimelineInstant","namespace":"org.apache.hudi.avro.model","fields":[{"name":"instantTime","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"completionTime","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"action","type":["null",{"type":"string","avro.java.string":"String"}],"default":null},{"name":"metadata","type":["null","bytes"],"default":null},{"name":"plan","type":["null","bytes"],"default":null},{"name":"version","type":"int","default":1}]}
@yihua @danny0405 can you guide me why this is happening
@abhiNB-star I dont see any error is the log, Speculative execution may be turned on for this job, so it killed some zombie tasks.
hi @ad1happy2go let me give you full logs on slack
We are marking this issue as closed because the solution has been successfully delivered and integrated with the merge of pull request #13990.