cargo-fuzz
cargo-fuzz copied to clipboard
Figure out a solution for the stdlib
Libstd isn't built with sanitizer support. Perhaps we should release a version that is.
https://github.com/japaric/rust-san
It is likely that instrumenting libstd itself is counterproductive. Instrumenting libstd and all dependent libraries will increase the workload of sanitizer greatly and may take longer than necessary to yield useful results.
Instead we should instrument the target crate only by default and provide options to increase fuzzing scope (instrumenting dependencies, instrumenting libstd, etc.)
idk, lots of panics originate from the stdlib (indexing, etc). While it's true that the branches from these usually get inlined (even though the panic code doesn't), which should be enough for fuzzing, it would be nice to have an instrumented stdlib.
rust-san has instructions for setting up an instrumented stdlib with xargo, we could probably just provide links to that and perhaps add a cargo fuzz command for setting that up. You're right that this doesn't have to be the default.
Some clarification to my comment above:
libstd
does not need to be instrumented for panics to be understood by the fuzzer. With that out of the way…
By only instrumenting the crate of interest, what is essentially done is search for bugs within the crate of interest alone (± some inlined code). By instrumenting the dependencies (incl. libstd
) as well, fuzzer now effectively searches for bugs within the dependencies at the same time.
This may sound desirable, but it is not. The dependencies can be fuzzed on their own. Fuzzing the crates separately both reduces the scope of fuzzing greatly (i.e. fuzzing is faster to find bugs, and if no bugs are found, there’s more confidence about crate being correct for the same investment of time), but also verifies the correctness of the dependency crate independently, which is strictly better than checking correctness of dependency within context of your own crate.
So it seems to me that fuzzing crates separately and in isolation is ideal. I recognise that not everybody wants to spend effort to fuzz their dependencies and that fuzzing everything at once is less effort – that’s where the options to increase the scope come in.
libstd does not need to be instrumented for panics to be understood by the fuzzer. With that out of the way…
Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?
Interesting point about fuzzing dependencies. We're currently using RUSTFLAGS but an option to use cargo rustc would be nice.
Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?
I suspect this is why I can't usefully fuzz the vobsub
crate, which does lots of tricky, low-level parsing. The very first thing I do with the input bytes is search for an MPEG-2 Program Stream header:
// Search for the start of a ProgramStream packet.
let needle = &[0x00, 0x00, 0x01, 0xba];
let start = self.remaining.windows(needle.len())
.position(|window| needle == window);
When I fuzz this, cargo fuzz
appears to be endlessly fuzzing single-byte inputs:
INFO: Seed: 577350870
INFO: Loaded 0 modules (0 guards):
Loading corpus dir: corpus
INFO: -max_len is not provided, using 64
INFO: A corpus is not provided, starting from an empty corpus
#0 READ units: 1
#1 INITED cov: 63 corp: 1/1b exec/s: 0 rss: 70Mb
#1048576 pulse cov: 63 corp: 1/1b exec/s: 524288 rss: 136Mb
#2097152 pulse cov: 63 corp: 1/1b exec/s: 419430 rss: 194Mb
#4194304 pulse cov: 63 corp: 1/1b exec/s: 419430 rss: 311Mb
#8388608 pulse cov: 63 corp: 1/1b exec/s: 399457 rss: 544Mb
#16777216 pulse cov: 63 corp: 1/1b exec/s: 409200 rss: 746Mb
#33554432 pulse cov: 63 corp: 1/1b exec/s: 419430 rss: 747Mb
#67108864 pulse cov: 63 corp: 1/1b exec/s: 422068 rss: 747Mb
I'm going to try reading the "san" docs to see if I can rebuild std
, and see if that helps any.
OK, there's some useful information in the Reddit thread announcing cargo fuzz
. This links to Better backtraces, which explains how to rebuild rust-src
using xargo
. But I'm not quite sure how to make this work under cargo fuzz
without modifying cargo fuzz
itself. I'll try poking at this later if I have time.
I'd really like to fuzz vobsub
; it's parsing all sorts of ancient, ugly MPEG-2 data, and I would love to provide it as part of a web service someday.
@emk consider providing at least one starter file to the fuzzer (put into fuzz/corpus).
Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?
That’s only for functions monomorphised within libstd. They are few and far between.
@nagisa Thank you for your advice! It turned out I had two problems:
- The MPEG2 packet formats are complex enough that I really did need to start with a valid input to get good results.
- My driver was accidentally ignoring certain important code paths.
Now I've found 5 issues and I'm looking for more (I'll submit a PR for the trophy case shortly). These kinds of low-level binary decoders are excellent hunting grounds for bugs, even in safe mode.
I've provided documentation on how to fuzz-test vobsub
here, in case anybody wants to experiment with it. Thank you for your help and for a great tool!
Triage: documentation issue, excerpt from above:
rust-san has instructions for setting up an instrumented stdlib with xargo, we could probably just provide links to that and perhaps add a cargo fuzz command for setting that up. You're right that this doesn't have to be the default.
I think this is kinda solved in nightly? https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std
Yeah, you can also use xargo on stable. I'd accept PRs adding support for a build-std mode
FWIW this is strictly required for Memory Sanitizer to work. Right now enabling msan in cargo-fuzz makes any binary immediately abort with a MSAN error. This is documented in more detail here: https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/sanitizer.html#memorysanitizer
A recipe for MSAN build that also enables MSAN for C dependencies that are linked into the binary can be found here: https://github.com/rust-lang/rust/issues/39610#issuecomment-573391197
This is now as simple as cargo fuzz run target_name -Z build-std
Also, #233 automatically enables this option if Memory Sanitizer is enabled.
Should fuzz targets always use -Zbuild-std
? The standard library has a lot of debug assertions and I'm sure there are Rust-level UB that the sanitizers won't understand but which could be caught by having debug_assertions
turned on in the standard library.