
[Bug] java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetFileReader.<init> when running BranchSqlITCase in IDEA

Open liming30 opened this issue 1 year ago • 8 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Paimon version

paimon-1.0-SNAPSHOT

Compute Engine

flink

Minimal reproduce step

Run BranchSqlITCase in IDEA.

What doesn't meet your expectations?

java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetFileReader.<init>(Lorg/apache/parquet/io/InputFile;Lorg/apache/parquet/ParquetReadOptions;Lorg/apache/paimon/fileindex/FileIndexResult;)V
	at org.apache.paimon.format.parquet.ParquetUtil.getParquetReader(ParquetUtil.java:85)
	at org.apache.paimon.format.parquet.ParquetUtil.extractColumnStats(ParquetUtil.java:52)
	at org.apache.paimon.format.parquet.ParquetSimpleStatsExtractor.extractWithFileInfo(ParquetSimpleStatsExtractor.java:78)
	at org.apache.paimon.format.parquet.ParquetSimpleStatsExtractor.extract(ParquetSimpleStatsExtractor.java:71)
	at org.apache.paimon.io.StatsCollectingSingleFileWriter.fieldStats(StatsCollectingSingleFileWriter.java:88)
	at org.apache.paimon.io.RowDataFileWriter.result(RowDataFileWriter.java:109)
	at org.apache.paimon.io.RowDataFileWriter.result(RowDataFileWriter.java:48)
	at org.apache.paimon.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:136)
	at org.apache.paimon.io.RollingFileWriter.close(RollingFileWriter.java:168)
	at org.apache.paimon.append.AppendOnlyWriter$DirectSinkWriter.flush(AppendOnlyWriter.java:418)
	at org.apache.paimon.append.AppendOnlyWriter.flush(AppendOnlyWriter.java:219)
	at org.apache.paimon.append.AppendOnlyWriter.prepareCommit(AppendOnlyWriter.java:207)
	at org.apache.paimon.operation.AbstractFileStoreWrite.prepareCommit(AbstractFileStoreWrite.java:210)
	at org.apache.paimon.operation.MemoryFileStoreWrite.prepareCommit(MemoryFileStoreWrite.java:154)
	at org.apache.paimon.table.sink.TableWriteImpl.prepareCommit(TableWriteImpl.java:253)
	at org.apache.paimon.flink.sink.StoreSinkWriteImpl.prepareCommit(StoreSinkWriteImpl.java:229)
	at org.apache.paimon.flink.sink.TableWriteOperator.prepareCommit(TableWriteOperator.java:123)
	at org.apache.paimon.flink.sink.RowDataStoreWriteOperator.prepareCommit(RowDataStoreWriteOperator.java:189)
	at org.apache.paimon.flink.sink.PrepareCommitOperator.emitCommittables(PrepareCommitOperator.java:100)
	at org.apache.paimon.flink.sink.PrepareCommitOperator.endInput(PrepareCommitOperator.java:88)

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

liming30 avatar Nov 13 '24 07:11 liming30

We have run into errors like NoSuchMethodError and ClassNotFoundException many times when executing tests in IDEA. This is mainly because we override some format classes, shade packages, etc.

Should we create a separate paimon-shaded project for these shaded packages and have paimon depend on it? @JingsongLi @Zouxxyy WDYT?
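For illustration, such a paimon-shaded module would typically use the maven-shade-plugin to relocate the bundled Parquet classes so they can never collide with a vanilla parquet-hadoop jar on the test classpath. This is only a hedged sketch of the idea; the coordinates and relocation pattern below are hypothetical, not actual Paimon build configuration:

```xml
<!-- Hypothetical paimon-shaded module (illustrative, not real coordinates).
     Relocating org.apache.parquet means the patched ParquetFileReader lives
     under a private package, so classpath ordering no longer matters. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.parquet</pattern>
            <shadedPattern>org.apache.paimon.shade.org.apache.parquet</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```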

liming30 avatar Nov 13 '24 07:11 liming30

Couldn't agree more; we have been tortured by this for a long time.

Zouxxyy avatar Nov 13 '24 08:11 Zouxxyy

I pushed a fix for this in the first commit of #4520; maybe that solves it.

Aitozi avatar Nov 13 '24 10:11 Aitozi

This is mainly because we override Parquet's ParquetFileReader class with our own version. If we create a paimon-shade project, would we have to put Paimon's ParquetFileReader in it?

Aitozi avatar Nov 13 '24 10:11 Aitozi

This is mainly because we override Parquet's ParquetFileReader class with our own version. If we create a paimon-shade project, would we have to put Paimon's ParquetFileReader in it?

Yes, we should put all the needed classes in it.

liming30 avatar Nov 13 '24 10:11 liming30

I also encountered it. Until paimon-shade exists, we can solve it this way for the time being: (screenshot not preserved)
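The screenshot itself is not preserved, but the workaround described in this thread is a dependency-ordering one. A hedged sketch of what such a fix might look like in a test module's pom.xml (artifact ids and version properties are illustrative): declare the module that bundles the patched Parquet classes before vanilla parquet-hadoop, since the compile/test classpath follows declaration order.

```xml
<!-- Illustrative only: the module carrying the patched Parquet classes is
     declared first, so its classes win on the order-sensitive classpath. -->
<dependencies>
  <dependency>
    <groupId>org.apache.paimon</groupId>
    <artifactId>paimon-format</artifactId>
    <version>${project.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>${parquet.version}</version>
  </dependency>
</dependencies>
```

Note that per the comment below, reordering did not resolve the Spark-side error, which is why a dedicated shaded module was preferred.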

Tan-JiaLiang avatar Nov 14 '24 03:11 Tan-JiaLiang

I also encountered it. Until paimon-shade exists, we can solve it this way for the time being: (screenshot not preserved)

Running org.apache.paimon.spark.sql.DDLWithHiveCatalogTestBase also fails with a similar error. I moved paimon-format above parquet in the dependency order, but the error persists:

java.lang.BootstrapMethodError: java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetWriter$Builder.withBloomFilterFPP(Ljava/lang/String;D)Lorg/apache/parquet/hadoop/ParquetWriter$Builder;
	at org.apache.paimon.format.parquet.writer.RowDataParquetBuilder.createWriter(RowDataParquetBuilder.java:95)
	at org.apache.paimon.format.parquet.ParquetWriterFactory.create(ParquetWriterFactory.java:52)
	at org.apache.paimon.io.SingleFileWriter.<init>(SingleFileWriter.java:74)
	at org.apache.paimon.io.StatsCollectingSingleFileWriter.<init>(StatsCollectingSingleFileWriter.java:58)
	at org.apache.paimon.io.RowDataFileWriter.<init>(RowDataFileWriter.java:70)
	at org.apache.paimon.io.RowDataRollingFileWriter.lambda$new$0(RowDataRollingFileWriter.java:59)
	at org.apache.paimon.io.RollingFileWriter.openCurrentWriter(RollingFileWriter.java:123)
	at org.apache.paimon.io.RollingFileWriter.write(RollingFileWriter.java:78)
	at org.apache.paimon.append.AppendOnlyWriter$DirectSinkWriter.write(AppendOnlyWriter.java:403)
	at org.apache.paimon.append.AppendOnlyWriter.write(AppendOnlyWriter.java:161)
	at org.apache.paimon.append.AppendOnlyWriter.write(AppendOnlyWriter.java:66)
	at org.apache.paimon.operation.AbstractFileStoreWrite.write(AbstractFileStoreWrite.java:150)
	at org.apache.paimon.table.sink.TableWriteImpl.writeAndReturn(TableWriteImpl.java:175)
	at org.apache.paimon.table.sink.TableWriteImpl.write(TableWriteImpl.java:147)
	at org.apache.paimon.spark.SparkTableWrite.write(SparkTableWrite.scala:40)
	at org.apache.paimon.spark.commands.PaimonSparkWriter.$anonfun$write$2(PaimonSparkWriter.scala:94)
	at org.apache.paimon.spark.commands.PaimonSparkWriter.$anonfun$write$2$adapted(PaimonSparkWriter.scala:94)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at org.apache.paimon.spark.commands.PaimonSparkWriter.$anonfun$write$1(PaimonSparkWriter.scala:94)
	at org.apache.spark.sql.execution.MapPartitionsExec.$anonfun$doExecute$3(objects.scala:201)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
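When debugging this kind of NoSuchMethodError, it helps to check which jar actually supplies the conflicting class on the test classpath. A small self-contained diagnostic sketch (the class name `WhichJar` and the idea of probing `org.apache.parquet.hadoop.ParquetFileReader` are illustrative, not part of Paimon):

```java
// Diagnostic sketch: resolve a class name to the classpath location that
// wins, to spot ordering conflicts like the NoSuchMethodError above.
public class WhichJar {
    public static String locate(String className) {
        // Ask the classloader for the .class resource; the returned URL
        // reveals which jar (or classes directory) is first on the classpath.
        String resource = className.replace('.', '/') + ".class";
        java.net.URL url = WhichJar.class.getClassLoader().getResource(resource);
        return url == null ? "not found" : url.toString();
    }

    public static void main(String[] args) {
        // A JDK class resolves to a jrt:/ URL; the Parquet class prints
        // "not found" unless parquet-hadoop (or a shaded copy) is present.
        System.out.println(locate("java.lang.String"));
        System.out.println(locate("org.apache.parquet.hadoop.ParquetFileReader"));
    }
}
```

If the printed URL points at vanilla parquet-hadoop instead of the Paimon module carrying the patched classes, the classpath ordering is the problem.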

herefree avatar Nov 15 '24 13:11 herefree

We have run into errors like NoSuchMethodError and ClassNotFoundException many times when executing tests in IDEA. This is mainly because we override some format classes, shade packages, etc.

Should we create a separate paimon-shaded project for these shaded packages and have paimon depend on it? @JingsongLi @Zouxxyy WDYT?

+1 for a dedicated repo to hold the shaded format classes; we have hit this several times when running tests in the IDE.

Aitozi avatar Feb 28 '25 06:02 Aitozi