paimon
[Bug] The length of DeletionFile is incorrect
Search before asking
- [X] I searched in the issues and found nothing similar.
Paimon version
0.8-SNAPSHOT
Compute Engine
JavaAPI
Minimal reproduce step
Nothing to do
What doesn't meet your expectations?
I'm trying to support deletion vectors in Doris's PaimonNativeReader. When I used the offset and length in DeletionFile to read the content of the HDFS file locally, I got an error when deserializing the content into a RoaringBitmap. I found that the correct way is to read length + 4 bytes starting at offset. I guess these 4 bytes hold the serialized length of the DeletionVector, which is written to the index file just before the DeletionVector itself. The reading code in Paimon is consistent with this:
static DeletionVector read(FileIO fileIO, DeletionFile deletionFile) throws IOException {
    Path path = new Path(deletionFile.path());
    try (SeekableInputStream input = fileIO.newInputStream(path)) {
        input.seek(deletionFile.offset());
        DataInputStream dis = new DataInputStream(input);
        // The first 4 bytes at offset are a size prefix, not part of the payload.
        int actualLength = dis.readInt();
        if (actualLength != deletionFile.length()) {
            throw new RuntimeException(
                    "Size not match, actual size: "
                            + actualLength
                            + ", expert size: "
                            + deletionFile.length()
                            + ", file path: "
                            + path);
        }
        int magicNum = dis.readInt();
        if (magicNum == BitmapDeletionVector.MAGIC_NUMBER) {
            return BitmapDeletionVector.deserializeFromDataInput(dis);
        } else {
            throw new RuntimeException("Invalid magic number: " + magicNum);
        }
    }
}
Maybe we should add 4 to either the length or the offset in DeletionFile, because the current meaning of these fields is very confusing.
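To make the layout concrete, here is a minimal, self-contained sketch of how an external reader could handle the size prefix. It uses only java.io and an in-memory byte array instead of HDFS. The class name, the MAGIC value, and the helper methods are hypothetical, and it assumes (based on the read method above) that the recorded length covers the 4-byte magic number plus the serialized bitmap but not the 4-byte size prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class DeletionVectorLayoutSketch {

    // Hypothetical placeholder; the real value is BitmapDeletionVector.MAGIC_NUMBER.
    static final int MAGIC = 0x12345678;

    /** Writes one entry as [int size][int magic][payload], with size = 4 + payload.length. */
    static void writeEntry(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(4 + payload.length); // size prefix: counts magic + payload only
        out.writeInt(MAGIC);
        out.write(payload);
    }

    /**
     * Reads the payload of the entry whose size prefix starts at {@code offset}.
     * {@code length} plays the role of DeletionFile.length(), so the reader must
     * consume length + 4 bytes in total.
     */
    static byte[] readEntry(byte[] file, long offset, long length) throws IOException {
        int start = Math.toIntExact(offset);
        int total = Math.toIntExact(length + 4); // 4 extra bytes for the size prefix
        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(file, start, total));
        int actualLength = in.readInt();
        if (actualLength != length) {
            throw new IOException(
                    "Size mismatch, actual: " + actualLength + ", expected: " + length);
        }
        int magic = in.readInt();
        if (magic != MAGIC) {
            throw new IOException("Invalid magic number: " + magic);
        }
        byte[] payload = new byte[Math.toIntExact(length - 4)];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(1); // stand-in for whatever precedes the entry in the file
        byte[] payload = {10, 20, 30};
        writeEntry(out, payload);

        // offset = 1 (start of the size prefix), length = magic + payload = 7
        byte[] back = readEntry(bytes.toByteArray(), 1, 4 + payload.length);
        System.out.println(Arrays.equals(payload, back));
    }
}
```

In a real reader the payload would then be handed to the RoaringBitmap deserializer; that step is omitted here so the sketch has no external dependencies.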
Anything else?
No response
Are you willing to submit a PR?
- [X] I'm willing to submit a PR!
Oh, this is by design: the stored size allows a quick consistency check when reading.
@suxiaogang223 Hi, you can take a look at DeletionFile; the documentation is clear.
Thanks for the reply. As a user, I mistakenly thought that the offset and length fields were designed to locate exactly the DeletionVector content in the HDFS file 🤓.
Hi @suxiaogang223, I'm closing this issue. Feel free to reopen it or open a new one.