
Structure aware fuzzing

Open CasperN opened this issue 2 years ago • 2 comments

@vglavnyy implemented our current fuzzers ^1, which take in a binary blob generated by libFuzzer's default mutator.

While this will eventually generate all flatbuffers, the space of binaries grows exponentially with buffer size, so reaching interesting buffers this way might take a very long time. We should create custom (structure-aware) mutators ^2 to help improve fuzzing coverage/efficiency.

The libFuzzer mutation API essentially takes in a binary buffer and a seed. We need to parse the buffer and mutate it based on the seed.
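As a rough sketch of the hook shape: libFuzzer calls an optional `LLVMFuzzerCustomMutator` function with the buffer, its current size, the maximum size, and a seed. The body below is only a placeholder (a seed-driven bit flip) that marks where parsing and structured mutation would go; it is not a proposed implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Optional libFuzzer hook. It rewrites `data` in place, must not use more
// than `max_size` bytes, and returns the new size of the input.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *data, size_t size,
                                          size_t max_size, unsigned int seed) {
  assert(size <= max_size);
  // Derive all randomness from the seed, so the same (input, seed) pair
  // always yields the same output.
  std::mt19937 rng(seed);
  if (size == 0) {
    // Nothing to mutate yet: emit a single random byte if there is room.
    if (max_size == 0) return 0;
    data[0] = static_cast<uint8_t>(rng() & 0xFF);
    return 1;
  }
  // Placeholder mutation: flip one bit at a seed-chosen position.
  // A structure-aware mutator would parse the buffer here instead.
  size_t index = rng() % size;
  unsigned bit = rng() % 8;
  data[index] ^= static_cast<uint8_t>(1u << bit);
  return size;
}
```

Keeping the function a pure function of its arguments is what makes a mutation reproducible from the seed alone.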

There are two use cases:

Generating valid flatbuffers

Our mutator will unpack the given data, returning a sentinel if that fails, and then apply a random mutation. To do "random mutation" we need a "reflective object API": "reflective" so we can choose the fields to mutate at random, which requires some dynamic typing, and "object API" so we can do things like extend vectors and strings. After mutating, we can pack the object back into binary.
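To make the unpack / mutate / pack cycle concrete, here is a minimal sketch. Everything in it is hypothetical: `Field`, `Object`, `Unpack`, `Pack` and `MutateValid` are illustration names, and the wire format is a toy one, not the FlatBuffers format. The point is only the shape: runtime type tags let the mutator pick any field, and a failed unpack is reported with a `false` sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for a reflective object API: every field carries a
// runtime type tag, so a mutator can pick any field without generated code.
enum class Kind : uint8_t { Int = 0, String = 1 };

struct Field {
  Kind kind;
  int64_t int_value;  // used when kind == Kind::Int
  std::string text;   // used when kind == Kind::String
};
using Object = std::vector<Field>;

// Toy wire format (NOT FlatBuffers): [kind: 1 byte][length: 1 byte][payload].
// Ints are 8 bytes, little-endian. Returns false if the buffer is malformed.
bool Unpack(const std::vector<uint8_t> &buf, Object *out) {
  out->clear();
  size_t pos = 0;
  while (pos < buf.size()) {
    if (buf.size() - pos < 2) return false;
    uint8_t kind = buf[pos], len = buf[pos + 1];
    pos += 2;
    if (buf.size() - pos < len) return false;
    Field f{Kind::Int, 0, ""};
    if (kind == 0) {
      if (len != 8) return false;
      uint64_t v = 0;
      for (int i = 0; i < 8; ++i) v |= uint64_t(buf[pos + i]) << (8 * i);
      f.int_value = static_cast<int64_t>(v);
    } else if (kind == 1) {
      f.kind = Kind::String;
      f.text.assign(buf.begin() + pos, buf.begin() + pos + len);
    } else {
      return false;
    }
    out->push_back(f);
    pos += len;
  }
  return true;
}

std::vector<uint8_t> Pack(const Object &obj) {
  std::vector<uint8_t> buf;
  for (const Field &f : obj) {
    buf.push_back(static_cast<uint8_t>(f.kind));
    if (f.kind == Kind::Int) {
      buf.push_back(8);
      uint64_t v = static_cast<uint64_t>(f.int_value);
      for (int i = 0; i < 8; ++i) buf.push_back(uint8_t(v >> (8 * i)));
    } else {
      buf.push_back(static_cast<uint8_t>(f.text.size()));
      buf.insert(buf.end(), f.text.begin(), f.text.end());
    }
  }
  return buf;
}

// Picks one field at random and changes it according to its runtime type.
void MutateOneField(Object *obj, std::mt19937 *rng) {
  if (obj->empty()) {
    obj->push_back(Field{Kind::Int, 0, ""});
    return;
  }
  Field &f = (*obj)[(*rng)() % obj->size()];
  if (f.kind == Kind::Int) {
    f.int_value = static_cast<int64_t>((*rng)());
  } else if (f.text.size() < 255) {  // toy length prefix is a single byte
    f.text.push_back(static_cast<char>('a' + (*rng)() % 26));
  }
}

// Unpack, mutate, repack. Returns false (the sentinel) and leaves `buf`
// untouched when the input does not parse.
bool MutateValid(std::vector<uint8_t> *buf, unsigned seed) {
  Object obj;
  if (!Unpack(*buf, &obj)) return false;
  std::mt19937 rng(seed);
  MutateOneField(&obj, &rng);
  *buf = Pack(obj);
  return true;
}
```

Because the output always comes from `Pack`, every buffer this produces parses again, which is exactly the limitation discussed next: it never leaves the valid space.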

Note, for performance we might want the reflective object to be backed by a bump allocator or something.

This is simple and would be useful for flatbuffers users, but it doesn't help with fuzzing our own systems, which are supposed to be robust to invalid flatbuffers. It also cannot really model unrecognized fields or unreachable data in the buffer.

Generating semi-valid flatbuffers

We should take advantage of @dbaileychess's recent work on binary analysis #7145. It divides the binary into known sections and regions, which are also connected to reflection/schema data. Using the attached schema, we can apply valid transformations such as in-place mutation of non-offset data, or extending binary sections (and fixing the affected offsets). With a lower probability, we can also do something invalid, like mutating an offset or breaking a vector length, so that we explore the space of invalid buffers that are near the known valid space.
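A minimal sketch of that probability split, assuming the analysis hands us a list of annotated byte ranges. `Region`, `RegionKind` and `MutateSemiValid` are hypothetical names for illustration and do not reflect the actual output of #7145. Extending sections and fixing up offsets is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical, simplified view of an annotated binary: each region is a
// byte range tagged with what it holds.
enum class RegionKind { Scalar, Offset, VectorLength };

struct Region {
  size_t begin;   // byte offset into the buffer
  size_t length;  // number of bytes in the region
  RegionKind kind;
};

// Flips one bit inside one region. Scalar regions are the default target,
// since changing them keeps the buffer structurally valid. With probability
// `invalid_probability` an offset or vector length is targeted instead.
// Returns true if a structural (likely invalidating) region was changed.
// If the chosen pool is empty, the buffer is left untouched.
bool MutateSemiValid(std::vector<uint8_t> *buf,
                     const std::vector<Region> &regions,
                     double invalid_probability, std::mt19937 *rng) {
  std::vector<const Region *> safe, risky;
  for (const Region &r : regions) {
    assert(r.begin + r.length <= buf->size());
    if (r.length == 0) continue;
    (r.kind == RegionKind::Scalar ? safe : risky).push_back(&r);
  }
  std::bernoulli_distribution go_invalid(invalid_probability);
  bool pick_risky = go_invalid(*rng);
  const std::vector<const Region *> &pool = pick_risky ? risky : safe;
  if (pool.empty()) return false;
  const Region &r = *pool[(*rng)() % pool.size()];
  size_t byte = r.begin + (*rng)() % r.length;
  unsigned bit = (*rng)() % 8;
  (*buf)[byte] ^= static_cast<uint8_t>(1u << bit);
  return pick_risky;
}
```

Something like `invalid_probability = 0.05` would keep most mutated buffers valid while still occasionally probing the verifier with a broken offset or length.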

(thanks @aardappel for mentioning fuzzing in context of #7174)

CasperN avatar Mar 18 '22 00:03 CasperN

Mutation is incredibly hard in the general case though. Possibly better to formulate in terms of regenerating an entire buffer (or making a copy) with 1 change?

aardappel avatar Mar 18 '22 06:03 aardappel

Mutation might be necessary for efficient use of libFuzzer though. My understanding is that clang generates code that tracks how many times each branch is taken, libFuzzer keeps a corpus of interesting examples that together cover many branches, and libFuzzer then expands the corpus by mutating those examples to reach branches that were not covered before.
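A toy model of that feedback loop, heavily simplified and not how libFuzzer is actually implemented: `ToyTarget` stands in for instrumented code, and a set of branch ids stands in for the coverage counters. All names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

using Input = std::vector<uint8_t>;

// Toy target: records which branches ran, standing in for the counters
// that coverage instrumentation would maintain.
void ToyTarget(const Input &in, std::set<int> *branches) {
  branches->insert(0);
  if (in.size() >= 1 && in[0] == 'F') {
    branches->insert(1);
    if (in.size() >= 2 && in[1] == 'B') branches->insert(2);
  }
}

// Repeatedly mutates a corpus entry; a candidate that reaches a branch not
// seen before is kept, so later mutations can build on it.
std::vector<Input> ToyFuzzLoop(const Input &seed_input, unsigned seed,
                               int iterations, std::set<int> *covered) {
  std::mt19937 rng(seed);
  std::vector<Input> corpus;
  corpus.push_back(seed_input);
  ToyTarget(seed_input, covered);
  for (int i = 0; i < iterations; ++i) {
    Input candidate = corpus[rng() % corpus.size()];
    if (!candidate.empty()) {
      size_t pos = rng() % candidate.size();
      candidate[pos] = static_cast<uint8_t>(rng() & 0xFF);
    }
    std::set<int> hit;
    ToyTarget(candidate, &hit);
    bool is_new = false;
    for (int b : hit) {
      if (covered->insert(b).second) is_new = true;
    }
    if (is_new) corpus.push_back(candidate);
  }
  return corpus;
}
```

The nested branch is only practical to reach because the input starting with `'F'` is kept in the corpus and mutated further. That incremental step is what a regenerate-from-scratch approach would need to preserve.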

formulate in terms of regenerating an entire buffer (or making a copy) with 1 change?

Yeah, that's what I was getting at for generating valid flatbuffers: Read out the given buffer, mutate, and rewrite it. This will be limited to the set of valid flatbuffers with known fields, but that's already very useful to both us and users.

For generating semi-valid flatbuffers... it's not clear to me whether it's strictly necessary to get coverage of every branch of the verifier implementations. Maybe mutating random valid flatbuffers composed with generic mutations on bytes (insert bytes / mutate bytes) would be sufficient for good coverage. In any case, we should probably start with measuring fuzzer coverage.
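Roughly what that composition could look like, as a sketch under assumptions: `StructuredStep` is a hypothetical callback standing in for the valid-flatbuffer mutator, and the byte-level mutations are hand-rolled here so the example is self-contained. A real fuzz target could delegate that part to libFuzzer's own `LLVMFuzzerMutate` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

using Buffer = std::vector<uint8_t>;
// Hypothetical hook for the structure-aware step (e.g. unpack/mutate/pack).
using StructuredStep = std::function<void(Buffer *, std::mt19937 *)>;

// Generic byte-level mutation: insert one random byte at a random position.
void InsertByte(Buffer *buf, std::mt19937 *rng) {
  size_t pos = (*rng)() % (buf->size() + 1);
  uint8_t value = static_cast<uint8_t>((*rng)() & 0xFF);
  buf->insert(buf->begin() + pos, value);
}

// Generic byte-level mutation: flip one random bit of one random byte.
void FlipBit(Buffer *buf, std::mt19937 *rng) {
  if (buf->empty()) return;
  size_t pos = (*rng)() % buf->size();
  unsigned bit = (*rng)() % 8;
  (*buf)[pos] ^= static_cast<uint8_t>(1u << bit);
}

// Runs the structured step first, then with probability `byte_probability`
// applies one generic byte mutation on top, which may break validity.
void ComposedMutate(Buffer *buf, size_t max_size, double byte_probability,
                    const StructuredStep &structured, unsigned seed) {
  std::mt19937 rng(seed);
  structured(buf, &rng);
  std::bernoulli_distribution corrupt(byte_probability);
  if (corrupt(rng)) {
    bool can_grow = buf->size() < max_size;
    if (can_grow && rng() % 2 == 0) {
      InsertByte(buf, &rng);
    } else {
      FlipBit(buf, &rng);
    }
  }
  // Respect the fuzzer's size limit even if the structured step grew the data.
  if (buf->size() > max_size) buf->resize(max_size);
}
```

Measuring coverage with and without the byte-level step would show whether this cheap composition is enough, or whether the region-aware mutator is worth building.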

CasperN avatar Mar 18 '22 17:03 CasperN

This issue is stale because it has been open 6 months with no activity. Please comment or label not-stale, or this will be closed in 14 days.

github-actions[bot] avatar Mar 04 '23 01:03 github-actions[bot]

This issue was automatically closed due to no activity for 6 months plus the 14 day notice period.

github-actions[bot] avatar Mar 18 '23 20:03 github-actions[bot]