[Bug] The length of DeletionFile is incorrect

Open suxiaogang223 opened this issue 9 months ago • 3 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Paimon version

0.8-SNAPSHOT

Compute Engine

JavaAPI

Minimal reproduce step

Nothing to do

What doesn't meet your expectations?

I'm trying to support deletion vectors in Doris's PaimonNativeReader. When I use the offset and length in DeletionFile to read that range of the HDFS file to local storage, deserializing the content into a RoaringBitmap fails. I found that the correct way is to read length + 4 bytes starting at offset. I guess the 4 extra bytes hold the serialized length of the DeletionVector, which is written when the DeletionVector is stored in the index file.

    static DeletionVector read(FileIO fileIO, DeletionFile deletionFile) throws IOException {
        Path path = new Path(deletionFile.path());
        try (SeekableInputStream input = fileIO.newInputStream(path)) {
            input.seek(deletionFile.offset());
            DataInputStream dis = new DataInputStream(input);
            int actualLength = dis.readInt();
            if (actualLength != deletionFile.length()) {
                throw new RuntimeException(
                        "Size not match, actual size: "
                                + actualLength
                                + ", expected size: "
                                + deletionFile.length()
                                + ", file path: "
                                + path);
            }
            int magicNum = dis.readInt();
            if (magicNum == BitmapDeletionVector.MAGIC_NUMBER) {
                return BitmapDeletionVector.deserializeFromDataInput(dis);
            } else {
                throw new RuntimeException("Invalid magic number: " + magicNum);
            }
        }
    }

Maybe we should add 4 to length or offset in DeletionFile, because the current behavior is very confusing.
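To make the framing concrete, here is a minimal, self-contained sketch of a length-prefixed record and of why a reader has to fetch `length + 4` bytes starting at `offset`. It uses only `java.io`; the class `LengthPrefixedRecord` and its methods are hypothetical names for illustration and are not part of Paimon, and the layout shown (a 4-byte length followed by the payload) is only what the snippet above implies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, not a Paimon class: models a record stored as
// [4-byte big-endian length][payload bytes].
public class LengthPrefixedRecord {

    // Size of the length prefix that precedes the payload.
    static final int LENGTH_PREFIX_BYTES = Integer.BYTES;

    // Number of bytes a remote reader must fetch, starting at offset,
    // to get the prefix plus the whole payload.
    static int bytesToFetch(int length) {
        return LENGTH_PREFIX_BYTES + length;
    }

    // Writes the payload with its length in front of it.
    static byte[] frame(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        return bos.toByteArray();
    }

    // Reads the record at offset and checks the stored length against the
    // expected one, mirroring the size check in the snippet above.
    static byte[] readPayload(byte[] file, int offset, int length) throws IOException {
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(file, offset, file.length - offset));
        int storedLength = in.readInt();
        if (storedLength != length) {
            throw new IOException(
                    "Size mismatch, stored: " + storedLength + ", expected: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // throws EOFException if fewer bytes are available
        return payload;
    }
}
```

With this layout, `offset` points at the length prefix, not at the payload, so copying exactly `length` bytes from `offset` drops the last 4 bytes of the payload.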

Anything else?

No response

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

suxiaogang223 avatar May 09 '24 20:05 suxiaogang223

Oh, this is by design: the stored size allows a quick consistency check.

Zouxxyy avatar May 10 '24 14:05 Zouxxyy

@suxiaogang223 Hi, you can take a look at DeletionFile; the documentation there is clear.

JingsongLi avatar May 11 '24 02:05 JingsongLi

Thanks for the reply. As a user, I mistakenly thought that the offset and length fields were meant for reading the DeletionVector content directly from HDFS 🤓.

suxiaogang223 avatar May 11 '24 07:05 suxiaogang223

Hi @suxiaogang223, I'm closing this issue. Feel free to re-open it or open a new one.

JingsongLi avatar May 13 '24 04:05 JingsongLi