Add fuzzer for `roaring_buffer_reader`
This PR adds a fuzzer based on libfuzzer for roaring_buffer_reader. You can try it as follows (the fuzzer will run up to 60s):
cd fuzz
make quick_fuzz
The intent of the fuzzer is to cover whether the custom deserialization and operations on roaring bitmaps implemented in roaring_buffer_reader.c is safe and correct, in the sense that they compute the same results as croaring.
The fuzzer works as follows:
- Deserializes the input with
roaring_bitmap_portable_deserialize_safe. Stop if deserialization fails. - Create a roaring buffer with
roaring_buffer_create. - Compare return values for the following functions with their equivalent from
croaring:-
roaring_buffer_get_cardinality -
roaring_buffer_contains -
roaring_buffer_is_subset -
roaring_buffer_and -
roaring_buffer_andnot -
roaring_buffer_and_cardinality -
roaring_buffer_or_cardinality -
roaring_buffer_andnot_cardinality -
roaring_buffer_xor_cardinality -
roaring_buffer_jaccard_index -
roaring_buffer_intersect -
roaring_buffer_is_empty -
roaring_buffer_equals -
roaring_buffer_rank -
roaring_buffer_minimum -
roaring_buffer_maximum
-
If you run the fuzzer, it pretty quickly finds that roaring_buffer_t relies on the offset header from the serialized bitmap, which is not validated by roaring_bitmap_portable_deserialize_safe. This can cause the roaring buffer functions to return wrong results.
Here's an example of that found by the fuzzer replicated in psql:
postgres=# select rb_to_array('\x3a300000010000000000000000000000000000000000000000000000000000000000000000000000000000000008'::roaringbitmap);
rb_to_array
-------------
{0}
(1 row)
postgres=# select rb_min('\x3a300000010000000000000000000000000000000000000000000000000000000000000000000000000000000008'::roaringbitmap);
rb_min
--------
12346
(1 row)
A proposed fix could be to not rely on the offset reader for untrusted input, but rather parse the containers instead (the croaring library always does this).
If there's support for this change, as well as #40, then we can set up fuzzing as a Github Action (see https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/).
How about rename Makefile.fuzz to Makefile, while it's the only build file in the directory
How about rename Makefile.fuzz to Makefile, while it's the only build file in the directory
Makes sense — fixed in 8f5cf6c 🙂
As described in #45, the roaring_buffer_reader relies on offsets for deserialization. When the offsets in the input serialized data are incorrect, it cannot produce the same output as roaring_bitmap_portable_deserialize_safe(which might skip offset parsing).
If it were also modified to not rely on offsets for deserialization, there would be a significant performance degradation, so this discrepancy can only be ignored.
After including the fix from #45, as well as a call to roaring_bitmap_internal_validate, then the fuzzer can run for 10 minutes with no findings 🥳
Note that the issue around offsets also apply for cardinalities: roaring_buffer_*-functions rely on keyscards from the serialized buffer, but malicious input can exploit this. The change in #45 fixes this too though 🙂
I'll open a separate PR to add roaring_bitmap_internal_validate in roaringbitmap_in and rb_from_bytea.