
[SUPPORT] NoClassDefFoundError for org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile

Open xccui opened this issue 2 years ago • 6 comments

We occasionally hit the following exception when running a Flink writer job. The job won't self-heal, but it can be recovered by manually restarting the TaskManager.

The metadata table (MDT) was enabled.

java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
    at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:56)
    at org.apache.hudi.io.storage.HoodieAvroHFileReader.<init>(HoodieAvroHFileReader.java:101)
    at org.apache.hudi.io.storage.HoodieAvroFileReaderFactory.newHFileFileReader(HoodieAvroFileReaderFactory.java:35)
    at org.apache.hudi.io.storage.HoodieFileReaderFactory.getFileReader(HoodieFileReaderFactory.java:63)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getBaseFileReader(HoodieBackedTableMetadata.java:460)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:433)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:425)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$3(HoodieBackedTableMetadata.java:239)
    at java.base/java.util.HashMap.forEach(Unknown Source)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:237)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:152)
    at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:339)
    at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:150)
    at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:69)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$16(AbstractTableFileSystemView.java:428)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:419)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestMergedFileSlicesBeforeOrOn(AbstractTableFileSystemView.java:854)
    at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104)
    at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestMergedFileSlicesBeforeOrOn(PriorityBasedFileSystemView.java:195)
    at org.apache.hudi.sink.partitioner.profile.DeltaWriteProfile.smallFilesProfile(DeltaWriteProfile.java:62)
    at org.apache.hudi.sink.partitioner.profile.WriteProfile.getSmallFiles(WriteProfile.java:191)
    at org.apache.hudi.sink.partitioner.BucketAssigner.getSmallFileAssign(BucketAssigner.java:179)
    at org.apache.hudi.sink.partitioner.BucketAssigner.addInsert(BucketAssigner.java:137)
    at org.apache.hudi.sink.partitioner.BucketAssignFunction.getNewRecordLocation(BucketAssignFunction.java:215)
    at org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:194)
    at org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:162)
    at org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:542)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:831)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:780)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:914)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)

Environment Description

  • Hudi version : bdb50ddccc9631317dfb06a06abc38cbd3714ce8

  • Flink version : 1.16.1

  • Hadoop version : 3.3.4

  • Storage (HDFS/S3/GCS..) : S3
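One detail worth calling out: "Could not initialize class X" is the JVM's cached form of an earlier static-initializer failure, so the trace above never shows the real root cause; only the first failure (an ExceptionInInitializerError) does. A minimal probe, run in a fresh JVM on the same TaskManager classpath, can surface it (HFileInitProbe is a hypothetical helper name, not part of Hudi):

    public class HFileInitProbe {
        public static void main(String[] args) {
            try {
                // Force the static initializer of the shaded HFile class to run.
                Class.forName(
                    "org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile",
                    true,                                    // run static initializers
                    HFileInitProbe.class.getClassLoader());
                System.out.println("HFile initialized OK");
            } catch (Throwable t) {
                // On first failure this prints the underlying cause that the
                // cached "Could not initialize class" message hides.
                t.printStackTrace();
            }
        }
    }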

xccui avatar Apr 20 '23 04:04 xccui

Have you enabled the MDT, then?

danny0405 avatar Apr 20 '23 09:04 danny0405

Ah, yes. I forgot MDT was enabled by default in a recent change...

xccui avatar Apr 20 '23 12:04 xccui
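Since the MDT lookup is what triggers the failing HFile reader, a common interim workaround is to turn the metadata table off on the writer. A minimal Flink sketch, assuming the Hudi Flink bundle is on the classpath; the schema and S3 path are placeholders, and 'metadata.enabled' is the Flink-side option corresponding to hoodie.metadata.enable (verify the key against your Hudi version's FlinkOptions):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class DisableMdtFlinkWriter {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
            // Placeholder schema/path; the key line is 'metadata.enabled' = 'false',
            // which keeps the writer from opening the MDT's HFile base files.
            tEnv.executeSql(
                "CREATE TABLE hudi_sink (" +
                "  uuid STRING PRIMARY KEY NOT ENFORCED," +
                "  ts BIGINT" +
                ") WITH (" +
                "  'connector' = 'hudi'," +
                "  'path' = 's3a://bucket/path/to/table'," +
                "  'table.type' = 'MERGE_ON_READ'," +
                "  'metadata.enabled' = 'false'" +
                ")");
        }
    }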

I also noticed this issue with Hudi 0.11.1, Hadoop 3.3.4 and 3.3.5, and Spark 3.2.1.

It does not happen with Hadoop 3.3.1 or 3.3.3, so it looks like the problem starts with Hadoop 3.3.4.

jfrylings-twilio avatar May 18 '23 17:05 jfrylings-twilio

@jfrylings-twilio Did you try the later versions of Hudi, i.e., 0.13.1 or 0.12.3? I tried Hadoop 3.3.4 with both Hudi 0.13.1 and 0.12.3, and it worked well. Let us know if you still face the issue.

ad1happy2go avatar Jul 05 '23 06:07 ad1happy2go

> @jfrylings-twilio Did you try the later versions of Hudi, i.e., 0.13.1 or 0.12.3? I tried Hadoop 3.3.4 with both Hudi 0.13.1 and 0.12.3, and it worked well. Let us know if you still face the issue.

We will try that once Presto supports those later versions of Hudi. Thanks 👍

jfrylings-twilio avatar Jul 05 '23 16:07 jfrylings-twilio

I used Hudi 0.14.1 on Dataproc 2.1 (Spark 3.3.2, Hadoop 3.3.6) to upsert a Bloom-indexed COW table with PartialUpdateAvroPayload and got the same error when reading the HFiles of the MDT bloom_filters partition. Am I missing some jars? How should I handle this?

Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
        at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
        at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
        at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
        at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
        at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecordIterator(HoodieDataBlock.java:154)
        at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.getRecordsIterator(AbstractHoodieLogRecordReader.java:956)
        at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:780)
        at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:825)
        at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:403)
        at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220)
        at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.scanByFullKeys(HoodieMergedLogRecordScanner.java:160)
        at org.apache.hudi.metadata.HoodieMetadataLogRecordReader.getRecordsByKeys(HoodieMetadataLogRecordReader.java:108)
        at org.apache.hudi.metadata.HoodieBackedTableMetadata.readLogRecords(HoodieBackedTableMetadata.java:327)
        at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:304)
        at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:275)
        at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:952)
        at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:926)
        at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
        at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
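For this upsert path, a write-side sketch of the usual workaround: stop the bloom index from consulting the MDT (or disable the MDT outright) so the bloom_filters HFiles are never opened. hoodie.metadata.enable and hoodie.bloom.index.use.metadata are standard Hudi config keys; the table name, fields, and GCS paths below are placeholders:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class UpsertWithoutMdtBloom {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("upsert-without-mdt-bloom").getOrCreate();
            Dataset<Row> df = spark.read().parquet("gs://bucket/incoming/"); // placeholder input
            df.write().format("hudi")
                .option("hoodie.table.name", "my_table")                   // placeholder
                .option("hoodie.datasource.write.recordkey.field", "uuid") // placeholder
                .option("hoodie.datasource.write.precombine.field", "ts")  // placeholder
                .option("hoodie.datasource.write.operation", "upsert")
                // Narrow workaround: keep the MDT but stop the bloom index
                // from reading its bloom_filters partition.
                .option("hoodie.bloom.index.use.metadata", "false")
                // Coarser alternative: disable the metadata table entirely.
                // .option("hoodie.metadata.enable", "false")
                .mode(SaveMode.Append)
                .save("gs://bucket/path/to/table");                        // placeholder
        }
    }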

michael1991 avatar Jun 08 '24 14:06 michael1991

I also noticed this issue with Hudi 0.14.1, Hadoop 3.2.2, Spark 3.4.2, and HBase 2.4.5. Does anyone have a solution?

wardlican avatar Jul 23 '24 02:07 wardlican

@danny0405 @ad1happy2go Any update on this issue?

abhiNB-star avatar Jun 23 '25 10:06 abhiNB-star

> @danny0405 @ad1happy2go Any update on this issue?

Since the 1.1 release, the HBase dependencies have been completely removed from the repo, so this issue should be addressed.

@abhiNB-star What's your environment for this issue?

danny0405 avatar Jun 24 '25 01:06 danny0405

@danny0405

  • Hudi version : 0.14.1

  • Spark version : 3.1.1

  • Hadoop version : 3.3.4

  • Storage (HDFS/S3/GCS..) : GCS

sparkConf:

    spark.executor.memoryOverhead: 1200M
    spark.kubernetes.executor.podNamePrefix: testing-party-job-old-hudi
    spark.serializer: org.apache.spark.serializer.KryoSerializer
    spark.kryo.registrator: org.apache.spark.HoodieSparkKryoRegistrar
    spark.executor.extraJavaOptions: "-Dlog4j.configuration=log4j.properties -verbose:class"
    spark.jars: "gs://dummy/nbdata/resources/jars/test/hbase-client-2.4.17.jar,gs://dummy/nbdata/resources/jars/test/hbase-common-2.4.17.jar,gs://dummy/nbdata/resources/jars/test/hbase-metrics-api-2.4.17.jar,gs://dummy/nbdata/resources/jars/test/hbase-server-2.4.17.jar"

abhiNB-star avatar Jun 24 '25 05:06 abhiNB-star

@danny0405 @ad1happy2go

    abhi@bastion-host-new:~/testing/new_test/party$ kubectl logs -f -n spark-hood testing-party-job-old-hudi-exec-1 | grep 'HFile'
    [149.207s][info][class,load] org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [150.231s][info][class,load] org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexWriter source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [150.232s][info][class,load] org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [150.232s][info][class,load] org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFilePathForReader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [150.241s][info][class,load] org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile$CachingBlockReader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [150.241s][info][class,load] org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile$Reader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [151.777s][info][class,load] org.apache.hudi.common.table.log.block.HoodieHFileDataBlock source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [151.976s][info][class,load] org.apache.hudi.io.storage.HoodieAvroHFileReader source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [151.985s][info][class,load] org.apache.hudi.io.storage.HoodieAvroHFileReader$$Lambda$1572/0x0000000840c1cc40 source: org.apache.hudi.io.storage.HoodieAvroHFileReader
    [151.986s][info][class,load] org.apache.hudi.io.storage.HoodieAvroHFileReader$$Lambda$1573/0x0000000840c1c040 source: org.apache.hudi.io.storage.HoodieAvroHFileReader
    [151.986s][info][class,load] org.apache.hudi.io.storage.HoodieHFileUtils source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [151.988s][info][class,load] org.apache.hudi.io.storage.HoodieAvroHFileReader$SeekableByteArrayInputStream source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [151.989s][info][class,load] org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
    [151.991s][info][class,load] org.apache.hudi.org.apache.hadoop.hbase.io.hfile.CorruptHFileException source: file:/opt/spark/work-dir/./hudi-utilities-bundle_2.12-0.14.1.jar
        at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
        at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
        at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
        at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
        at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:176)
    Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
        at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
        at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
        at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
        at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
    Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
        at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
        at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
        at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
        at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
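The class-load log above already shows the relocated HFile class coming from hudi-utilities-bundle_2.12-0.14.1.jar, and the failure happens in HFile.<clinit>: the class is found, but its static initializer fails. Since the class is relocated under org.apache.hudi.*, the extra unshaded hbase-* jars in spark.jars cannot supply it. A quick hypothetical check of where the class resolves from on the executor classpath:

    import java.net.URL;

    public class HFileSourceCheck {
        public static void main(String[] args) {
            // Look up the relocated class as a resource; only the Hudi bundle
            // can provide it, so extra unshaded hbase-* jars are irrelevant here.
            URL src = HFileSourceCheck.class.getClassLoader().getResource(
                "org/apache/hudi/org/apache/hadoop/hbase/io/hfile/HFile.class");
            System.out.println(src); // expect the hudi-utilities-bundle jar
        }
    }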

abhiNB-star avatar Jun 24 '25 06:06 abhiNB-star

Hey, any updates on this? I am facing the same class-not-found error for HFile.

I am using Flink 1.20.1 and Hudi 1.0.2. Any leads would be great.

MaitreyaManohar avatar Sep 03 '25 17:09 MaitreyaManohar