trezor-firmware Run fuzzer in CI

This PR adds ClusterfuzzLite which runs the crypto fuzzer when PRs are made.

ClusterfuzzLite can be extended beyond this PR with code coverage and batch fuzzing: https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/

Signed-off-by: AdamKorcz [email protected]

Jul 28 '22 14:07 AdamKorcz

the PR does not make this at all obvious: on which parts of the repo would the fuzzer be running?

Aug 08 '22 09:08 matejcik

I don't understand how this works - for how long does the CI do the fuzzing before ACKing/NACKing the PR. Does fuzzing even make sense unless you fuzz for several hours/days? What's your take on this @invd?

Also, our code is already fuzzed via the OSS-FUZZ project, see the following files and grep for trezor:

https://github.com/google/oss-fuzz/blob/master/projects/bitcoin-core/build_cryptofuzz.sh
https://github.com/google/oss-fuzz/blob/master/projects/cryptofuzz/build.sh

Aug 08 '22 10:08 prusnak

@matejcik : from what I can see, this GitHub Actions workflow builds and runs the crypto/fuzzer/fuzzer.c in more or less default configuration (with ASan), so the testing will cover selected functions from the C /crypto/ library of the repo.

@prusnak you're raising an important question. The short answer is that for this particular fuzzer that works at the level of crypto functions, previous experiences from finding code regressions suggest that there is usefulness to running it for a few minutes to hours for bug detection. The details depend on the specifications of the GitHub runner (multicore performance and settings), whether or not some persistent copy of the produced corpus of last fuzzer runs is re-used for new fuzzer runs, if the issue is detectable with the selected error detection methods (ASan, UBSan, ..). Concrete steps to move in this direction include shipping a prepared minimal corpus as an initial starting base and using some compile-time optimization flags, custom fuzzer dictionaries or additional sanitizers.

The cryptofuzz project by Guido Vranken performs different tests, so this work is not redundant.

Aug 08 '22 10:08 invd

There is a recent example where a per-PR crypto fuzzing sanity check run would have helped to spot issues before the merge, see https://github.com/trezor/trezor-firmware/pull/2365.

Note that the current proposed workflow settings include fuzz-seconds: 400, which is roughly 6.5 minutes. Finding a good trade-off for "synchronous" runs that people are waiting for is not easy, so we can alternatively consider running fuzzer CI jobs for a longer duration in a more "async" setting, for example a nightly run that warns of any issues after the merge. Both approaches have their pros and cons.

By my understanding, the results of the Github Actions based fuzz testing runs are immediately public. This may be less of an issue for newly introduced bugs in unmerged public PRs, but gets relevant if the testing reveals more issues in shipped production code. Arguably, if the fuzz testing code is public, anyone can run it privately and find the issues as well, but it's still something to consider with regards to debugging and disclosure workflows.

Aug 08 '22 10:08 invd

Thank you for the input @invd. To add to your comments: ClusterfuzzLite aims to build up corpus over time, so the fuzzers don't start from scratch on every run. In addition, batch fuzzing can be set up which runs the fuzzers periodically to look for harder-to-find bugs and build up the corpus to be used for CI fuzzing as wel.

Aug 08 '22 15:08 AdamKorcz

@invd I can't assign the review to you, but please do a review of this PR

@AdamKorcz at this time the job is failing because make fuzzer target does not seem to exist?

Aug 22 '22 12:08 matejcik

@matejcik @AdamKorcz

at this time the job is failing because make fuzzer target does not seem to exist?

I've debugged this a bit locally and think that COPY . $SRC/ is not working as expected when run from within the repository target, since it's not putting the trezor-firmware contents under /src/trezor-firmware/ but under /src/ instead. As a result, the /src/trezor-firmware/crypto folder exists due to the ẀORKSPACE command but is empty. In this state, make gives an error that suggests only the named fuzzer target doesn't exist when actually the whole Makefile and code aren't there. I propose testing COPY . $SRC/trezor-firmware as a fix.

Aug 24 '22 13:08 invd

@AdamKorcz : do you have some time in the next days to communicate on and address some of the discussed topics, such as https://github.com/trezor/trezor-firmware/pull/2420#issuecomment-1225731556 , https://github.com/trezor/trezor-firmware/pull/2420#discussion_r953820560, https://github.com/trezor/trezor-firmware/pull/2420#discussion_r953727170 and https://github.com/trezor/trezor-firmware/pull/2420#discussion_r953823170 ? Alternatively, we could go forward with some of those changes on our own.

Sep 12 '22 14:09 invd

@AdamKorcz : do you have some time in the next days to communicate on and address some of the discussed topics, such as #2420 (comment) , #2420 (comment), #2420 (comment) and #2420 (comment) ? Alternatively, we could go forward with some of those changes on our own.

I am out of bandwidth this week and next, but based on the discussion it looks like you @invd can do the fixes easily as they look like nits. If you have managed to test it locally, you are in a good stage. So your suggestion, "Alternatively, we could go forward with some of those changes on our own." sounds good to me.

Sep 14 '22 19:09 AdamKorcz

@AdamKorcz I'd like to ask you if you have time to finish this. Thanks

Apr 14 '23 11:04 Hannsek

rebased on current master and applied some of the changes @invd is proposing

@invd and @AdamKorcz please take a look. the fuzzer seems to be running, I am basically OK with merging as-is but maybe there are some cheap wins still to be had?

Jun 26 '23 09:06 matejcik

@matejcik : I'll take a closer look later this week.

Jul 04 '23 10:07 invd

Overall, I'm for merging this and then following up later with improvements in a new PR.

The most relevant missing improvement is to let the fuzzer start with an existing corpus instead of making it re-learn everything on each run. From what I can see, the proper and automated way to do this is via batch mode + persistent storage. The PR only uses code-change mode at the moment, so this doesn't happen yet. (Example) It's possible to manually put a corpus archive into the repository or download it from some external source as a workaround, but this also adds maintenance overhead and tends to clutter the repository.

The fuzzer in question exclusively covers code in the crypto folder (former trezor-crypto), which sees only infrequent changes - in other words, 99% of daily commits to trezor-firmware won't lead to observable changes. Merging this PR will help gather some experience if the code-change mode is a good fit, or if we want to switch to batch mode exclusively for this fuzzer.

Other open improvements:

clusterfuzz parallel-fuzzing mode to make use of the second CPU core.
Use more sanitizers, namely undefined and memory.
Include advanced fuzzer techniques, such as an automatically generated or uploaded dictionary file.
Tweak compilation details.

Jul 11 '23 11:07 invd

thanks for your comments @invd, i'm going ahead and merging this.

we would accept contributions to make the fuzzing configuration better, but don't currently have the resources to do this from our side

Jul 11 '23 11:07 matejcik

trezor-firmware trezor-firmware copied to clipboard

Run fuzzer in CI

trezor-firmware
trezor-firmware copied to clipboard