CRoaring
CRoaring copied to clipboard
roaring_bitmap_portable_deserialize_safe avoids overflows during deserialization but it does not validate the input
Testcase for attached crash. Build with ASAN & ASAN CRoaring.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include "roaring/roaring.h"
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size){
roaring_statistics_t stats;
bool answer = true;
roaring_bitmap_t* bitmap = roaring_bitmap_portable_deserialize_safe(data, size);
if(bitmap) {
uint64_t card1 = roaring_bitmap_get_cardinality(bitmap);
roaring_bitmap_statistics(bitmap, &stats);
unsigned universe_size = stats.max_value + 1;
roaring_bitmap_t *inverted = roaring_bitmap_flip(bitmap, 0U, universe_size);
if(inverted) {
roaring_bitmap_t *double_inverted = roaring_bitmap_flip(inverted, 0U, universe_size);
if(double_inverted) {
answer = (roaring_bitmap_get_cardinality(inverted) + roaring_bitmap_get_cardinality(bitmap) == universe_size);
if (answer) answer = roaring_bitmap_equals(bitmap, double_inverted);
if (!answer) {
printf("Bad flip\n\nbitmap1:\n");
roaring_bitmap_printf_describe(bitmap); // debug
printf("\n\nflipped:\n");
roaring_bitmap_printf_describe(inverted); // debug
}
roaring_bitmap_free(double_inverted);
}
roaring_bitmap_free(inverted);
}
roaring_bitmap_free(bitmap);
}
return 0;
}
long filesize(FILE* fp) {
fseek(fp, 0L, SEEK_END);
return ftell(fp);
}
int main(int argc, char **argv)
{
printf("reading %s\n", argv[argc-1]);
FILE* fp = fopen(argv[argc-1], "rb");
long bytes = filesize(fp);
char* buf = malloc(bytes);
if(buf == NULL) return 1;
rewind(fp);
int cnt = fread(buf, 1, bytes, fp);
if(bytes != cnt){
free(buf);
return 1;
}
LLVMFuzzerTestOneInput(buf, cnt);
free(buf);
return 0;
}
reading /tmp/crash-36b07662f33633b92010f875551ad7560f3045b5
=================================================================
==128963==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000232 at pc 0x00000055e5b7 bp 0x7ffe73ea6400 sp 0x7ffe73ea63f8
WRITE of size 2 at 0x602000000232 thread T0
#0 0x55e5b6 in array_container_negation_range (/home/files/git/cr/build/croaring_fuzzer+0x55e5b6)
#1 0x4d0594 in insert_flipped_container roaring.c
#2 0x4ff360 in roaring_bitmap_flip (/home/files/git/cr/build/croaring_fuzzer+0x4ff360)
#3 0x4cbf6a in LLVMFuzzerTestOneInput /home/files/git/cr/build/croaring_fuzzer.c:18:49
#4 0x4cc21f in main /home/files/git/cr/build/croaring_fuzzer.c:57:5
#5 0x7f1941d7cfcf in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#6 0x7f1941d7d07c in __libc_start_main csu/../csu/libc-start.c:409:3
#7 0x41d8b4 in _start (/home/files/git/cr/build/croaring_fuzzer+0x41d8b4)
0x602000000232 is located 0 bytes to the right of 2-byte region [0x602000000230,0x602000000232)
allocated by thread T0 here:
#0 0x49a06d in __interceptor_malloc (/home/files/git/cr/build/croaring_fuzzer+0x49a06d)
#1 0x5298ba in array_container_create_given_capacity (/home/files/git/cr/build/croaring_fuzzer+0x5298ba)
#2 0x55deff in array_container_negation_range (/home/files/git/cr/build/croaring_fuzzer+0x55deff)
#3 0x4d0594 in insert_flipped_container roaring.c
#4 0x4ff360 in roaring_bitmap_flip (/home/files/git/cr/build/croaring_fuzzer+0x4ff360)
#5 0x4cbf6a in LLVMFuzzerTestOneInput /home/files/git/cr/build/croaring_fuzzer.c:18:49
#6 0x4cc21f in main /home/files/git/cr/build/croaring_fuzzer.c:57:5
#7 0x7f1941d7cfcf in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/files/git/cr/build/croaring_fuzzer+0x55e5b6) in array_container_negation_range
Shadow bytes around the buggy address:
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff8000: fa fa 00 03 fa fa 00 00 fa fa 06 fa fa fa 00 00
0x0c047fff8010: fa fa 04 fa fa fa 00 00 fa fa 04 fa fa fa 00 00
0x0c047fff8020: fa fa 04 fa fa fa 00 00 fa fa fd fd fa fa fd fa
0x0c047fff8030: fa fa fd fd fa fa fd fa fa fa fd fd fa fa fd fa
=>0x0c047fff8040: fa fa 00 00 fa fa[02]fa fa fa fa fa fa fa fa fa
0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==128963==ABORTING
@K2 Does the data come from a valid bitmap?
The roaring_bitmap_portable_deserialize_safe
function does not validate the content, it only ensures that there is no buffer overflow while deserializing. If you are pointing at garbage, you will end up with trouble later.
Validating the input is more expensive. We do not currently do this tasks.
I will retitle the issue.
The documentation is as follows:
/**
* read a bitmap from a serialized version in a safe manner (reading up to maxbytes).
* This is meant to be compatible with
* the Java and Go versions. See format specification at
* https://github.com/RoaringBitmap/RoaringFormatSpec
* In case of failure, a null pointer is returned.
*/
roaring_bitmap_t *roaring_bitmap_portable_deserialize_safe(const char *buf, size_t maxbytes);
It does not validate the input.
What would make a sensible test case if were to enroll into oss-fuz then?
@K2 So the expectation currently is that you are pointing a data you serialized. You certainly can create adversarial content, as your code demonstrate.
To validate, we need to scan the content of the container, and check that they could have been produced from valid roaring bitmaps:
- The array containers should be sorted.
- The run containers should contain sorted runs that correctly span no more than the 16-bit range.
There might be other constraints that I do not have in mind right now.
We do not do this. It certainly doable, but it is not something we offer right now.
What would make a sensible test case if were to enroll into oss-fuz then?
It is not that it is not valid, it is that we do not support this kind of validation currently. (To be clear, we could do it and I plan to leave this issue open.)
The way the Java and Go versions do fuzzing... is... let me offer pointers...
(Your fuzzing is good, let me be clear.)
Then perhaps to minimize the testroaring_bitmap_portable_deserialize_safe for now and update the test case in the future as we add more validation?
At a glance, the Go deserialization fuzzer checks that there is no panic while deserialization...
https://github.com/RoaringBitmap/roaring/blob/d626fcaa277d89a8f843afeb9ec85fd959d8e6dd/serializationfuzz.go
So basically, you throw anything at roaring_bitmap_portable_deserialize_safe
and it should work (not crash). However, the result might be invalid and useless.
However, the bulk of the fuzzer is based on random operations on both a bitset and a roaring bitmap, and testing that the results agree:
https://github.com/RoaringBitmap/roaring/blob/258dc1c0bd982b05d94dc23563bf55e46a77763e/smat.go
@richardstartin Did something similar in Java... https://github.com/RoaringBitmap/RoaringBitmap/pull/395/files
That is, you take a bitset (or another set data structure) and a roaring bitmap, and then you mutate both in a semantically equivalent manner, then you check that the results are semantically equivalent... and so forth.
Then perhaps to minimize the testroaring_bitmap_portable_deserialize_safe for now and update the test case in the future as we add more validation?
Right. So just testing bitmap_portable_deserialize_safe
with adversarial inputs is excellent.
Consider that we have no fuzzing right now. Whatever you provide is a likely a win.
(It would be nice to have a fully validating deserializer. Arguably, it would be easier to build with a good test framework.)
Does a fully validating deserializer exist today?
It does not! Pull request invited!
(It is not super hard.)
Related to this, just raised this https://github.com/RoaringBitmap/CRoaring/pull/486 to provide also roaring_bitmap_deserialize_safe for the CRoaring format