[SUPPORT] NoClassDefFoundError for org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
We occasionally hit the following exception when running a Flink writer job. The job won't self-heal, but it can be recovered by manually restarting the TaskManager. The metadata table (MDT) was enabled. (A short sketch after the stack trace illustrates why the error persists until a restart.)
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:56)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.<init>(HoodieAvroHFileReader.java:101)
at org.apache.hudi.io.storage.HoodieAvroFileReaderFactory.newHFileFileReader(HoodieAvroFileReaderFactory.java:35)
at org.apache.hudi.io.storage.HoodieFileReaderFactory.getFileReader(HoodieFileReaderFactory.java:63)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getBaseFileReader(HoodieBackedTableMetadata.java:460)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:433)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:425)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$3(HoodieBackedTableMetadata.java:239)
at java.base/java.util.HashMap.forEach(Unknown Source)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:237)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:152)
at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:339)
at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:150)
at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:69)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$16(AbstractTableFileSystemView.java:428)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:419)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestMergedFileSlicesBeforeOrOn(AbstractTableFileSystemView.java:854)
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104)
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestMergedFileSlicesBeforeOrOn(PriorityBasedFileSystemView.java:195)
at org.apache.hudi.sink.partitioner.profile.DeltaWriteProfile.smallFilesProfile(DeltaWriteProfile.java:62)
at org.apache.hudi.sink.partitioner.profile.WriteProfile.getSmallFiles(WriteProfile.java:191)
at org.apache.hudi.sink.partitioner.BucketAssigner.getSmallFileAssign(BucketAssigner.java:179)
at org.apache.hudi.sink.partitioner.BucketAssigner.addInsert(BucketAssigner.java:137)
at org.apache.hudi.sink.partitioner.BucketAssignFunction.getNewRecordLocation(BucketAssignFunction.java:215)
at org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:194)
at org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:162)
at org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:542)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:831)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:780)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:914)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
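For context on the failure mode: in the JVM, "Could not initialize class X" means the class was found earlier but its static initializer already failed once. That first failure surfaces as an ExceptionInInitializerError; every later reference to the class in the same classloader then fails with this NoClassDefFoundError, which is why the job cannot self-heal and only a fresh JVM/classloader (the TaskManager restart) recovers. A minimal sketch with hypothetical classes, not Hudi code:

// A static initializer that throws "poisons" the class for the lifetime of
// the classloader, matching the "recovers only after restarting" behavior.
public class StaticInitPoisoning {

    static class Flaky {
        static {
            // Stand-in for a transient failure during class initialization,
            // e.g. a missing transitive dependency or an IO error.
            if (true) {
                throw new RuntimeException("transient failure during <clinit>");
            }
        }
    }

    public static void main(String[] args) {
        try {
            new Flaky();                     // first use: <clinit> runs and throws
        } catch (Throwable t) {
            System.out.println("first use:  " + t);  // java.lang.ExceptionInInitializerError
        }
        try {
            new Flaky();                     // any later use in the same classloader
        } catch (Throwable t) {
            System.out.println("second use: " + t);  // NoClassDefFoundError: Could not initialize class ...
        }
    }
}

The practical implication: the trace above hides the real root cause. Look for the first occurrence of the failure in the logs, before the class was poisoned.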
Environment Description

- Hudi version : bdb50ddccc9631317dfb06a06abc38cbd3714ce8
- Flink version : 1.16.1
- Hadoop version : 3.3.4
- Storage (HDFS/S3/GCS..) : S3
You have the MDT enabled then?
Ah, yes. I forgot MDT was enabled by default in a recent change...
I also noticed this issue with Hudi 0.11.1, Hadoop 3.3.4 and 3.3.5, and Spark 3.2.1.
It does not happen with Hadoop 3.3.1 or 3.3.3, so it looks like the problem starts with Hadoop 3.3.4.
@jfrylings-twilio Did you try the later versions of Hudi, i.e. 0.13.1 or 0.12.3? I tried Hadoop 3.3.4 with Hudi 0.13.1 and 0.12.3 and it worked well. Let us know if you still face the issue.
We will try that once Presto supports those later versions of Hudi. Thanks 👍
I used Hudi 0.14.1 on Dataproc 2.1 (Spark 3.3.2, Hadoop 3.3.6) to upsert a Bloom-indexed COW table with PartialUpdateAvroPayload, and got the same error when reading the HFiles of the MDT bloom_filters partition. Am I missing some jars, and how should this be handled?
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecordIterator(HoodieDataBlock.java:154)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.getRecordsIterator(AbstractHoodieLogRecordReader.java:956)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:780)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:825)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:403)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.scanByFullKeys(HoodieMergedLogRecordScanner.java:160)
at org.apache.hudi.metadata.HoodieMetadataLogRecordReader.getRecordsByKeys(HoodieMetadataLogRecordReader.java:108)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readLogRecords(HoodieBackedTableMetadata.java:327)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:304)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:275)
at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:952)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:926)
at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
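Every trace in this thread goes through the metadata table's HFile reader, so one mitigation while the root cause is chased is to disable the MDT on the write path via Hudi's documented hoodie.metadata.enable key. A hedged sketch of the Spark upsert described above (paths, table name, and key/ordering fields are made up; this sidesteps the error rather than fixing the shading problem, and gives up MDT-based file listing):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class DisableMdtUpsert {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hudi-mdt-workaround")
                .getOrCreate();

        Dataset<Row> df = spark.read().parquet("gs://my-bucket/input/");  // hypothetical input

        df.write()
          .format("hudi")
          .option("hoodie.table.name", "my_cow_table")                    // hypothetical table name
          .option("hoodie.datasource.write.operation", "upsert")
          .option("hoodie.datasource.write.recordkey.field", "id")        // hypothetical key field
          .option("hoodie.datasource.write.precombine.field", "ts")       // hypothetical ordering field
          .option("hoodie.datasource.write.payload.class",
                  "org.apache.hudi.common.model.PartialUpdateAvroPayload")
          // Workaround: bypass the MDT (and its HFile-backed partitions such
          // as bloom_filters) so the shaded HBase HFile reader never runs.
          .option("hoodie.metadata.enable", "false")
          .mode(SaveMode.Append)
          .save("gs://my-bucket/hudi/my_cow_table");                      // hypothetical base path
    }
}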
I also noticed this issue with Hudi 0.14.1, Hadoop 3.2.2, Spark 3.4.2, and HBase 2.4.5. Does anyone have a solution?
@danny0405 @ad1happy2go Any update on this issue?
Since the 1.1 release, the HBase dependencies have been completely removed from the repo, so this issue should be addressed.
@abhiNB-star what's your env for this issue?
@danny0405
- Hudi version : 0.14.1
- Spark version : 3.1.1
- Hadoop version : 3.3.4
- Storage (HDFS/S3/GCS..) : GCS

sparkConf:
  spark.executor.memoryOverhead: 1200M
  spark.kubernetes.executor.podNamePrefix: testing-party-job-old-hudi
  spark.serializer: org.apache.spark.serializer.KryoSerializer
  spark.kryo.registrator: org.apache.spark.HoodieSparkKryoRegistrar
  spark.executor.extraJavaOptions: "-Dlog4j.configuration=log4j.properties -verbose:class"
  spark.jars: "gs://dummy/nbdata/resources/jars/test/hbase-client-2.4.17.jar,gs://dummy/nbdata/resources/jars/test/hbase-common-2.4.17.jar,gs://dummy/nbdata/resources/jars/test/hbase-metrics-api-2.4.17.jar,gs://dummy/nbdata/resources/jars/test/hbase-server-2.4.17.jar"
@danny0405 @ad1happy2go
abhi@bastion-host-new:~/testing/new_test/party$ kubectl logs -f -n spark-hood testing-party-job-old-hudi-exec-1 | grep 'HFile'
[149.207s][info][class,load] org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[150.231s][info][class,load] org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexWriter source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[150.232s][info][class,load] org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[150.232s][info][class,load] org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFilePathForReader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[150.241s][info][class,load] org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile$CachingBlockReader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[150.241s][info][class,load] org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile$Reader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[151.777s][info][class,load] org.apache.hudi.common.table.log.block.HoodieHFileDataBlock source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[151.976s][info][class,load] org.apache.hudi.io.storage.HoodieAvroHFileReader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[151.985s][info][class,load] org.apache.hudi.io.storage.HoodieAvroHFileReader$$Lambda$1572/0x0000000840c1cc40 source: org.apache.hudi.io.storage.HoodieAvroHFileReader
[151.986s][info][class,load] org.apache.hudi.io.storage.HoodieAvroHFileReader$$Lambda$1573/0x0000000840c1c040 source: org.apache.hudi.io.storage.HoodieAvroHFileReader
[151.986s][info][class,load] org.apache.hudi.io.storage.HoodieHFileUtils source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[151.988s][info][class,load] org.apache.hudi.io.storage.HoodieAvroHFileReader$SeekableByteArrayInputStream source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[151.989s][info][class,load] org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
[151.991s][info][class,load] org.apache.hudi.org.apache.hadoop.hbase.io.hfile.CorruptHFileException source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:176)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
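Note that the class,load lines above show the shaded org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile is present in the bundle and does get loaded, which is why adding the unshaded hbase-*-2.4.17 jars via spark.jars cannot help: Hudi references the relocated class name, not the original one. What the repeated "Could not initialize class" hides is the first failure inside HFile's static initializer (visible at HFile.<clinit> in the trace). A small diagnostic sketch, a hypothetical harness rather than anything in Hudi, that forces initialization once and prints the underlying cause:

public class ProbeShadedHFile {
    public static void main(String[] args) {
        // Run with the same bundle jar on the classpath the executors use, e.g.
        //   java -cp hudi-utilities-bundle_2.12-0.14.1.jar:. ProbeShadedHFile
        String cls = "org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile";
        try {
            // initialize=true runs the static initializer immediately
            Class.forName(cls, true, Thread.currentThread().getContextClassLoader());
            System.out.println("initialized OK: " + cls);
        } catch (Throwable t) {
            // On a fresh JVM this is an ExceptionInInitializerError whose cause
            // is the actual problem; later attempts in the same JVM only show
            // "Could not initialize class ...".
            t.printStackTrace();
        }
    }
}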
Hey, any updates on this? I am facing the same NoClassDefFoundError for HFile.
I am using Flink 1.20.1 and Hudi 1.0.2; any leads would be great.