
[Bug] s3://paas-flink-prod/.../bucket-0/data-5da975ee-318e-4ba4-b3f7-ad112dae5247-0.parquet is not a Parquet file. Expected magic number at tail, but found [21, 0, 21, -32]

Open · logicbaby opened this issue 11 months ago · 0 comments

Search before asking

  • [x] I searched in the issues and found nothing similar.

Paimon version

paimon-flink-1.20-1.0.1.jar paimon-s3-1.0.1.jar paimon-flink-action-1.0.1.jar

Compute Engine

flink-1.20.0

Minimal reproduce step

Use the MySQL CDC mysql_sync_table action to sync a table into a Paimon table stored on S3. The job cannot complete a checkpoint, and the TaskManager reports:

Caused by: java.lang.RuntimeException: s3://paas-flink-prod/flink-paimon/wh/chen.db/department/bucket-0/data-65dbb220-7017-468d-affb-1de9dd6e4105-0.parquet is not a Parquet file. Expected magic number at tail, but found [21, 0, 21, -32]
	at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:162) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:243) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.format.parquet.ParquetUtil.getParquetReader(ParquetUtil.java:85) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.format.parquet.ParquetUtil.extractColumnStats(ParquetUtil.java:52) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.format.parquet.ParquetSimpleStatsExtractor.extractWithFileInfo(ParquetSimpleStatsExtractor.java:78) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.format.parquet.ParquetSimpleStatsExtractor.extract(ParquetSimpleStatsExtractor.java:71) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.StatsCollectingSingleFileWriter.fieldStats(StatsCollectingSingleFileWriter.java:105) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.KeyValueDataFileWriter.result(KeyValueDataFileWriter.java:169) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.KeyValueDataFileWriter.result(KeyValueDataFileWriter.java:58) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:135) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.RollingFileWriter.close(RollingFileWriter.java:167) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.mergetree.MergeTreeWriter.flushWriteBuffer(MergeTreeWriter.java:235) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.mergetree.MergeTreeWriter.prepareCommit(MergeTreeWriter.java:264) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]

I have downloaded this Parquet file and verified that it is valid.
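For reference, such a check amounts to reading the last four bytes of the downloaded file, which should be the ASCII magic "PAR1" ([80, 65, 82, 49]) that closes every Parquet file. A minimal standalone sketch of that check (not Paimon code; the local file path is passed as an argument):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Checks whether a locally downloaded file ends with the 4-byte Parquet
// footer magic "PAR1". A valid Parquet file ends with
// <4-byte footer length><"PAR1">; anything else means the footer is
// missing or the file is truncated.
public class CheckParquetTail {
    public static void main(String[] args) throws IOException {
        byte[] expected = "PAR1".getBytes(StandardCharsets.US_ASCII);
        try (RandomAccessFile f = new RandomAccessFile(args[0], "r")) {
            byte[] tail = new byte[4];
            f.seek(f.length() - 4);   // position at the last 4 bytes
            f.readFully(tail);
            System.out.println("tail bytes: " + Arrays.toString(tail));
            System.out.println("valid footer magic: " + Arrays.equals(tail, expected));
        }
    }
}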

CDC action parameters:

local:///opt/flink/usrlib/paimon-flink-action-1.0.1.jar
mysql_sync_table
--warehouse s3://paas-flink-prod/flink-paimon/wh
--database chen
--table department
--mysql_conf hostname=rm-xxx.mysql.rds.aliyuncs.com
--mysql_conf username=**
--mysql_conf password='**'
--mysql_conf database-name='xxx'
--mysql_conf table-name='department'
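For context, the first two values above are the action jar URI and the action name; assuming a plain command-line submission instead of the local:// application-mode deployment, the same arguments would be passed roughly as (credentials masked as above):

flink run /opt/flink/usrlib/paimon-flink-action-1.0.1.jar \
    mysql_sync_table \
    --warehouse s3://paas-flink-prod/flink-paimon/wh \
    --database chen \
    --table department \
    --mysql_conf hostname=rm-xxx.mysql.rds.aliyuncs.com \
    --mysql_conf username=** \
    --mysql_conf password='**' \
    --mysql_conf database-name='xxx' \
    --mysql_conf table-name='department'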

What doesn't meet your expectations?

S3 cannot be used as the Paimon warehouse backend storage; the same job works fine with HDFS.

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

logicbaby · Feb 13 '25 14:02