[Bug] Reading ORC files compressed with zstd fails with ZstdException: Data corruption detected

Open jerry-024 opened this issue 11 months ago • 0 comments

Search before asking

  • [x] I searched in the issues and found nothing similar.

Paimon version

detail:

Caused by: com.github.luben.zstd.ZstdException: Data corruption detected
	at com.github.luben.zstd.ZstdDecompressCtx.decompressByteArray(ZstdDecompressCtx.java:205)
	at com.github.luben.zstd.Zstd.decompressByteArray(Zstd.java:439)
	at org.apache.paimon.shade.org.apache.orc.impl.ZstdCodec.decompress(ZstdCodec.java:259)
	at org.apache.paimon.shade.org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:521)
	at org.apache.paimon.shade.org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:548)
	at org.apache.paimon.shade.org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:535)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:2060)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2079)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2177)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:2009)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2633)
	at org.apache.paimon.shade.org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
	at org.apache.paimon.shade.org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
	at org.apache.paimon.shade.org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
	at org.apache.paimon.shade.org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1579)
	at org.apache.paimon.format.orc.OrcReaderFactory.nextBatch(OrcReaderFactory.java:322)
	at org.apache.paimon.format.orc.OrcReaderFactory.access$100(OrcReaderFactory.java:66)
	at org.apache.paimon.format.orc.OrcReaderFactory$OrcVectorizedReader.readBatch(OrcReaderFactory.java:235)
	at org.apache.paimon.format.orc.OrcReaderFactory$OrcVectorizedReader.readBatch(OrcReaderFactory.java:217)
	at org.apache.paimon.reader.RecordReaderIterator.<init>(RecordReaderIterator.java:37)

Compute Engine

Flink

Minimal reproduce step

Currently we don't know how to reproduce this problem. Reading the affected file shows that the problem is in the header.
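
To help anyone inspecting a suspect file, here is a minimal sketch of how a chunk header can be decoded by hand. It assumes that "header" refers to the 3-byte compression chunk header that `InStream$CompressedStream.readHeader` parses in the trace above. Per the ORC specification, that header stores `compressedLength * 2 + isOriginal` as a little-endian value. The class `OrcChunkHeader` and its methods are hypothetical names for illustration and are not part of Paimon or ORC.

```java
/**
 * Hypothetical diagnostic helper (not part of Paimon or ORC).
 * Decodes one 3-byte ORC compression chunk header.
 */
final class OrcChunkHeader {
    /** Number of bytes in the chunk body that follows the header. */
    final int length;
    /** True if the chunk body was stored as-is, without compression. */
    final boolean original;

    private OrcChunkHeader(int length, boolean original) {
        this.length = length;
        this.original = original;
    }

    /** Decodes the 3 header bytes starting at {@code offset} in {@code buf}. */
    static OrcChunkHeader decode(byte[] buf, int offset) {
        if (offset < 0 || buf.length - offset < 3) {
            throw new IllegalArgumentException("need 3 header bytes at offset " + offset);
        }
        // Little-endian 24-bit value: (length << 1) | isOriginal
        int raw = (buf[offset] & 0xff)
                | ((buf[offset + 1] & 0xff) << 8)
                | ((buf[offset + 2] & 0xff) << 16);
        return new OrcChunkHeader(raw >>> 1, (raw & 1) == 1);
    }

    /**
     * Rough sanity check: the chunk should fit both in the file's compression
     * buffer size and in the bytes remaining in the stream.
     */
    boolean isPlausible(int bufferSize, long remainingBytes) {
        return length <= bufferSize && length <= remainingBytes;
    }
}
```

For example, the bytes `0x40 0x0d 0x03` decode to a compressed chunk of 100,000 bytes, and `0x0b 0x00 0x00` decode to an uncompressed chunk of 5 bytes. A decoded length that exceeds the compression buffer size or the remaining stream length would suggest the header bytes themselves are damaged or the read is misaligned, rather than only the zstd payload being corrupt.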

What doesn't meet your expectations?

If anyone else has run into this problem, please share more context.

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

jerry-024, Feb 10 '25 05:02