Implement roaring_bitmap_internal_validate

Open lemire opened this issue 2 years ago • 2 comments

Deserializing a bitmap can produce an invalid result, for example when the serialized data has been corrupted. Deserialization may still return a bitmap without reporting a failure, yet that bitmap may be unusable.

You can avoid such problems by hashing your saved data (e.g., md5sum). But we could also directly, at some expense, validate the deserialized data.
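As a rough illustration of the hashing approach, here is a minimal Go sketch that records a digest when the bitmap bytes are saved and compares it before deserializing. It uses SHA-256 from the standard library rather than MD5, and the names `digestOf` and `stillIntact` are hypothetical helpers, not part of the roaring API.

```go
package main

import "crypto/sha256"

// digestOf returns the SHA-256 digest of the serialized bitmap bytes.
// Store it next to the data when saving. (Hypothetical helper.)
func digestOf(serialized []byte) [sha256.Size]byte {
	return sha256.Sum256(serialized)
}

// stillIntact reports whether the bytes read back still match the digest
// recorded at save time. Call it before deserializing. (Hypothetical helper.)
func stillIntact(serialized []byte, want [sha256.Size]byte) bool {
	// Fixed-size arrays are comparable in Go, so == compares every byte.
	return sha256.Sum256(serialized) == want
}
```

A digest check only shows that the bytes are unchanged since they were hashed; it says nothing about whether they were a valid bitmap in the first place, which is where validating the deserialized data comes in.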

The C version of Roaring has an interesting function that can be called after deserializing a bitmap, to make sure it is proper:

https://github.com/RoaringBitmap/CRoaring/blob/a103d3811702b9389c538881c9974e9a7a7552af/src/roaring.c#L435

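     // Deserialize with bounds checking; NULL means the input was rejected outright.
     roaring_bitmap_t *t = roaring_bitmap_portable_deserialize_safe(serializedbytes, expectedsize);
     if(t == NULL) { return EXIT_FAILURE; }
     // A non-NULL bitmap can still be internally inconsistent, so validate it.
     const char *reason = NULL;
     if (!roaring_bitmap_internal_validate(t, &reason)) {
         // reason now points to a short description of the failed check.
         roaring_bitmap_free(t);
         return EXIT_FAILURE;
     }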

It is not very difficult to implement and could help users who have production data.
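To give a sense of what such a check might look like in Go, here is a minimal sketch over a deliberately simplified model. The `toyBitmap` type is hypothetical and holds only sorted array containers; it is not the library's internal representation, and the real C function also covers bitset and run containers. The invariants checked here are the ones the Roaring format implies for array containers: strictly increasing keys, non-empty containers, at most 4096 values per array container, and strictly increasing values.

```go
package main

import (
	"errors"
	"fmt"
)

// arrayMaxSize is the largest cardinality an array container may hold.
const arrayMaxSize = 4096

// toyBitmap is a hypothetical, simplified stand-in for a Roaring bitmap:
// keys[i] holds the high 16 bits shared by the values in containers[i].
type toyBitmap struct {
	keys       []uint16
	containers [][]uint16 // array containers only, for illustration
}

// validate returns nil if the structure is consistent, or an error that
// describes the first violated invariant.
func (b *toyBitmap) validate() error {
	if len(b.keys) != len(b.containers) {
		return errors.New("key count and container count differ")
	}
	for i, key := range b.keys {
		if i > 0 && b.keys[i-1] >= key {
			return fmt.Errorf("keys not strictly increasing at index %d", i)
		}
		c := b.containers[i]
		if len(c) == 0 {
			return fmt.Errorf("container %d is empty", i)
		}
		if len(c) > arrayMaxSize {
			return fmt.Errorf("array container %d has %d values, max is %d", i, len(c), arrayMaxSize)
		}
		for j := 1; j < len(c); j++ {
			if c[j-1] >= c[j] {
				return fmt.Errorf("container %d not strictly increasing at position %d", i, j)
			}
		}
	}
	return nil
}
```

Returning an error rather than a bool plays the same role as the `reason` out-parameter in the C function: the caller learns which invariant failed.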

lemire avatar Sep 26 '23 20:09 lemire

@lemire Is this issue still valid? I would like to contribute, though the good first issue list seems stale: https://github.com/RoaringBitmap/roaring/contribute

bearrito avatar Apr 25 '24 23:04 bearrito

@bearrito Yes. That is a very good first issue.

The C code is well tested and should be similar to what is needed in Go:

https://github.com/RoaringBitmap/CRoaring/blob/a103d3811702b9389c538881c9974e9a7a7552af/src/roaring.c#L435

It would be highly valued by some users.

lemire avatar Apr 26 '24 00:04 lemire

This can be closed

bearrito avatar Jun 12 '24 00:06 bearrito

closed

lemire avatar Jun 12 '24 14:06 lemire