CRoaring icon indicating copy to clipboard operation
CRoaring copied to clipboard

roaring_bitmap_portable_deserialize_safe avoids overflows during deserialization but it does not validate the input

Open K2 opened this issue 3 years ago • 15 comments

Testcase for attached crash. Build with ASAN & ASAN CRoaring.

crash.zip

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include "roaring/roaring.h"


int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size){
    roaring_statistics_t stats;
    bool answer = true;
    roaring_bitmap_t* bitmap = roaring_bitmap_portable_deserialize_safe(data, size);
    if(bitmap) {
        uint64_t card1 = roaring_bitmap_get_cardinality(bitmap);
        roaring_bitmap_statistics(bitmap, &stats);
        unsigned universe_size = stats.max_value + 1;
        roaring_bitmap_t *inverted = roaring_bitmap_flip(bitmap, 0U, universe_size);
        if(inverted) {
            roaring_bitmap_t *double_inverted = roaring_bitmap_flip(inverted, 0U, universe_size);
            if(double_inverted) {
                answer = (roaring_bitmap_get_cardinality(inverted) + roaring_bitmap_get_cardinality(bitmap) == universe_size);
                if (answer) answer = roaring_bitmap_equals(bitmap, double_inverted);
                if (!answer) {
                    printf("Bad flip\n\nbitmap1:\n");
                    roaring_bitmap_printf_describe(bitmap);  // debug
                    printf("\n\nflipped:\n");
                    roaring_bitmap_printf_describe(inverted);  // debug
                }
                roaring_bitmap_free(double_inverted);
            }
            roaring_bitmap_free(inverted);
        }
        roaring_bitmap_free(bitmap);
    }
    return 0;
}

long filesize(FILE* fp) { 
    fseek(fp, 0L, SEEK_END);
    return ftell(fp);
}

int main(int argc, char **argv)
{
    printf("reading %s\n", argv[argc-1]);
    FILE* fp = fopen(argv[argc-1], "rb");
    long bytes = filesize(fp);
    char* buf = malloc(bytes);
    if(buf == NULL) return 1;

    rewind(fp);

    int cnt = fread(buf, 1, bytes, fp);
    if(bytes != cnt){
        free(buf);
        return 1;
    }
    LLVMFuzzerTestOneInput(buf, cnt);
    free(buf);
    return 0;
}
reading /tmp/crash-36b07662f33633b92010f875551ad7560f3045b5
=================================================================
==128963==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000232 at pc 0x00000055e5b7 bp 0x7ffe73ea6400 sp 0x7ffe73ea63f8
WRITE of size 2 at 0x602000000232 thread T0
    #0 0x55e5b6 in array_container_negation_range (/home/files/git/cr/build/croaring_fuzzer+0x55e5b6)
    #1 0x4d0594 in insert_flipped_container roaring.c
    #2 0x4ff360 in roaring_bitmap_flip (/home/files/git/cr/build/croaring_fuzzer+0x4ff360)
    #3 0x4cbf6a in LLVMFuzzerTestOneInput /home/files/git/cr/build/croaring_fuzzer.c:18:49
    #4 0x4cc21f in main /home/files/git/cr/build/croaring_fuzzer.c:57:5
    #5 0x7f1941d7cfcf in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #6 0x7f1941d7d07c in __libc_start_main csu/../csu/libc-start.c:409:3
    #7 0x41d8b4 in _start (/home/files/git/cr/build/croaring_fuzzer+0x41d8b4)

0x602000000232 is located 0 bytes to the right of 2-byte region [0x602000000230,0x602000000232)
allocated by thread T0 here:
    #0 0x49a06d in __interceptor_malloc (/home/files/git/cr/build/croaring_fuzzer+0x49a06d)
    #1 0x5298ba in array_container_create_given_capacity (/home/files/git/cr/build/croaring_fuzzer+0x5298ba)
    #2 0x55deff in array_container_negation_range (/home/files/git/cr/build/croaring_fuzzer+0x55deff)
    #3 0x4d0594 in insert_flipped_container roaring.c
    #4 0x4ff360 in roaring_bitmap_flip (/home/files/git/cr/build/croaring_fuzzer+0x4ff360)
    #5 0x4cbf6a in LLVMFuzzerTestOneInput /home/files/git/cr/build/croaring_fuzzer.c:18:49
    #6 0x4cc21f in main /home/files/git/cr/build/croaring_fuzzer.c:57:5
    #7 0x7f1941d7cfcf in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/files/git/cr/build/croaring_fuzzer+0x55e5b6) in array_container_negation_range
Shadow bytes around the buggy address:
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff8000: fa fa 00 03 fa fa 00 00 fa fa 06 fa fa fa 00 00
  0x0c047fff8010: fa fa 04 fa fa fa 00 00 fa fa 04 fa fa fa 00 00
  0x0c047fff8020: fa fa 04 fa fa fa 00 00 fa fa fd fd fa fa fd fa
  0x0c047fff8030: fa fa fd fd fa fa fd fa fa fa fd fd fa fa fd fa
=>0x0c047fff8040: fa fa 00 00 fa fa[02]fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==128963==ABORTING

K2 avatar Nov 21 '21 22:11 K2

@K2 Does the data come from a valid bitmap?

The roaring_bitmap_portable_deserialize_safe function does not validate the content, it only ensures that there is no buffer overflow while deserializing. If you are pointing at garbage, you will end up with trouble later.

Validating the input is more expensive. We do not currently do this tasks.

I will retitle the issue.

lemire avatar Nov 22 '21 00:11 lemire

The documentation is as follows:

/**
 * read a bitmap from a serialized version in a safe manner (reading up to maxbytes).
 * This is meant to be compatible with
 * the Java and Go versions. See format specification at
 * https://github.com/RoaringBitmap/RoaringFormatSpec
 * In case of failure, a null pointer is returned.
 */
roaring_bitmap_t *roaring_bitmap_portable_deserialize_safe(const char *buf, size_t maxbytes);

It does not validate the input.

lemire avatar Nov 22 '21 00:11 lemire

What would make a sensible test case if were to enroll into oss-fuz then?

K2 avatar Nov 22 '21 00:11 K2

@K2 So the expectation currently is that you are pointing a data you serialized. You certainly can create adversarial content, as your code demonstrate.

To validate, we need to scan the content of the container, and check that they could have been produced from valid roaring bitmaps:

  • The array containers should be sorted.
  • The run containers should contain sorted runs that correctly span no more than the 16-bit range.

There might be other constraints that I do not have in mind right now.

We do not do this. It certainly doable, but it is not something we offer right now.

lemire avatar Nov 22 '21 00:11 lemire

What would make a sensible test case if were to enroll into oss-fuz then?

It is not that it is not valid, it is that we do not support this kind of validation currently. (To be clear, we could do it and I plan to leave this issue open.)

The way the Java and Go versions do fuzzing... is... let me offer pointers...

lemire avatar Nov 22 '21 00:11 lemire

(Your fuzzing is good, let me be clear.)

lemire avatar Nov 22 '21 00:11 lemire

Then perhaps to minimize the testroaring_bitmap_portable_deserialize_safe for now and update the test case in the future as we add more validation?

K2 avatar Nov 22 '21 00:11 K2

At a glance, the Go deserialization fuzzer checks that there is no panic while deserialization...

https://github.com/RoaringBitmap/roaring/blob/d626fcaa277d89a8f843afeb9ec85fd959d8e6dd/serializationfuzz.go

So basically, you throw anything at roaring_bitmap_portable_deserialize_safe and it should work (not crash). However, the result might be invalid and useless.

However, the bulk of the fuzzer is based on random operations on both a bitset and a roaring bitmap, and testing that the results agree:

https://github.com/RoaringBitmap/roaring/blob/258dc1c0bd982b05d94dc23563bf55e46a77763e/smat.go

lemire avatar Nov 22 '21 00:11 lemire

@richardstartin Did something similar in Java... https://github.com/RoaringBitmap/RoaringBitmap/pull/395/files

That is, you take a bitset (or another set data structure) and a roaring bitmap, and then you mutate both in a semantically equivalent manner, then you check that the results are semantically equivalent... and so forth.

lemire avatar Nov 22 '21 00:11 lemire

Then perhaps to minimize the testroaring_bitmap_portable_deserialize_safe for now and update the test case in the future as we add more validation?

Right. So just testing bitmap_portable_deserialize_safe with adversarial inputs is excellent.

lemire avatar Nov 22 '21 00:11 lemire

Consider that we have no fuzzing right now. Whatever you provide is a likely a win.

lemire avatar Nov 22 '21 00:11 lemire

(It would be nice to have a fully validating deserializer. Arguably, it would be easier to build with a good test framework.)

lemire avatar Nov 22 '21 00:11 lemire

Does a fully validating deserializer exist today?

briangreenery avatar May 31 '23 21:05 briangreenery

It does not! Pull request invited!

(It is not super hard.)

lemire avatar May 31 '23 22:05 lemire

Related to this, just raised this https://github.com/RoaringBitmap/CRoaring/pull/486 to provide also roaring_bitmap_deserialize_safe for the CRoaring format

SalvatorePreviti avatar Jun 02 '23 17:06 SalvatorePreviti