cargo-fuzz icon indicating copy to clipboard operation
cargo-fuzz copied to clipboard

Figure out a solution for the stdlib

Open Manishearth opened this issue 7 years ago • 15 comments

Libstd isn't built with sanitizer support. Perhaps we should release a version that is.

Manishearth avatar Feb 21 '17 06:02 Manishearth

https://github.com/japaric/rust-san

Manishearth avatar Feb 21 '17 15:02 Manishearth

It is likely that instrumenting libstd itself is counterproductive. Instrumenting libstd and all dependent libraries will increase the workload of sanitizer greatly and may take longer than necessary to yield useful results.

Instead we should instrument the target crate only by default and provide options to increase fuzzing scope (instrumenting dependencies, instrumenting libstd, etc.)

nagisa avatar Feb 21 '17 16:02 nagisa

idk, lots of panics originate from the stdlib (indexing, etc). While it's true that the branches from these usually get inlined (even though the panic code doesn't), which should be enough for fuzzing, it would be nice to have an instrumented stdlib.

rust-san has instructions for setting up an instrumented stdlib with xargo, we could probably just provide links to that and perhaps add a cargo fuzz command for setting that up. You're right that this doesn't have to be the default.

Manishearth avatar Feb 21 '17 17:02 Manishearth

Some clarification to my comment above:

libstd does not need to be instrumented for panics to be understood by the fuzzer. With that out of the way…

By only instrumenting the crate of interest, what is essentially done is search for bugs within the crate of interest alone (± some inlined code). By instrumenting the dependencies (incl. libstd) as well, fuzzer now effectively searches for bugs within the dependencies at the same time.

This may sound desirable, but it is not. The dependencies can be fuzzed on their own. Fuzzing the crates separately both reduces the scope of fuzzing greatly (i.e. fuzzing is faster to find bugs, and if no bugs are found, there’s more confidence about crate being correct for the same investment of time), but also verifies the correctness of the dependency crate independently, which is strictly better than checking correctness of dependency within context of your own crate.

So it seems to me that fuzzing crates separately and in isolation is ideal. I recognise that not everybody wants to spend effort to fuzz their dependencies and that fuzzing everything at once is less effort – that’s where the options to increase the scope come in.

nagisa avatar Feb 21 '17 17:02 nagisa

libstd does not need to be instrumented for panics to be understood by the fuzzer. With that out of the way…

Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

Interesting point about fuzzing dependencies. We're currently using RUSTFLAGS but an option to use cargo rustc would be nice.

Manishearth avatar Feb 21 '17 17:02 Manishearth

Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

I suspect this is why I can't usefully fuzz the vobsub crate, which does lots of tricky, low-level parsing. The very first thing I do with the input bytes is search for an MPEG-2 Program Stream header:

            // Search for the start of a ProgramStream packet.
            let needle = &[0x00, 0x00, 0x01, 0xba];
            let start = self.remaining.windows(needle.len())
                .position(|window| needle == window);

When I fuzz this, cargo fuzz appears to be endlessly fuzzing single-byte inputs:

INFO: Seed: 577350870
INFO: Loaded 0 modules (0 guards): 
Loading corpus dir: corpus
INFO: -max_len is not provided, using 64
INFO: A corpus is not provided, starting from an empty corpus
#0	READ units: 1
#1	INITED cov: 63 corp: 1/1b exec/s: 0 rss: 70Mb
#1048576	pulse  cov: 63 corp: 1/1b exec/s: 524288 rss: 136Mb
#2097152	pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 194Mb
#4194304	pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 311Mb
#8388608	pulse  cov: 63 corp: 1/1b exec/s: 399457 rss: 544Mb
#16777216	pulse  cov: 63 corp: 1/1b exec/s: 409200 rss: 746Mb
#33554432	pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 747Mb
#67108864	pulse  cov: 63 corp: 1/1b exec/s: 422068 rss: 747Mb

I'm going to try reading the "san" docs to see if I can rebuild std, and see if that helps any.

emk avatar Mar 05 '17 13:03 emk

OK, there's some useful information in the Reddit thread announcing cargo fuzz. This links to Better backtraces, which explains how to rebuild rust-src using xargo. But I'm not quite sure how to make this work under cargo fuzz without modifying cargo fuzz itself. I'll try poking at this later if I have time.

I'd really like to fuzz vobsub; it's parsing all sorts of ancient, ugly MPEG-2 data, and I would love to provide it as part of a web service someday.

emk avatar Mar 05 '17 13:03 emk

@emk consider providing at least one starter file to the fuzzer (put into fuzz/corpus).


Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

That’s only for functions monomorphised within libstd. They are few and far between.

nagisa avatar Mar 05 '17 13:03 nagisa

@nagisa Thank you for your advice! It turned out I had two problems:

  1. The MPEG2 packet formats are complex enough that I really did need to start with a valid input to get good results.
  2. My driver was accidentally ignoring certain important code paths.

Now I've found 5 issues and I'm looking for more (I'll submit a PR for the trophy case shortly). These kinds of low-level binary decoders are excellent hunting grounds for bugs, even in safe mode.

I've provided documentation on how to fuzz-test vobsub here, in case anybody wants to experiment with it. Thank you for your help and for a great tool!

emk avatar Mar 05 '17 15:03 emk

Triage: documentation issue, excerpt from above:

rust-san has instructions for setting up an instrumented stdlib with xargo, we could probably just provide links to that and perhaps add a cargo fuzz command for setting that up. You're right that this doesn't have to be the default.

whitequark avatar Oct 05 '17 08:10 whitequark

I think this is kinda solved in nightly? https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std

elichai avatar Jan 20 '20 10:01 elichai

Yeah, you can also use xargo on stable. I'd accept PRs adding support for a build-std mode

Manishearth avatar Jan 20 '20 11:01 Manishearth

FWIW this is strictly required for Memory Sanitizer to work. Right now enabling msan in cargo-fuzz makes any binary immediately abort with a MSAN error. This is documented in more detail here: https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/sanitizer.html#memorysanitizer

A recipe for MSAN build that also enables MSAN for C dependencies that are linked into the binary can be found here: https://github.com/rust-lang/rust/issues/39610#issuecomment-573391197

Shnatsel avatar Jun 05 '20 17:06 Shnatsel

This is now as simple as cargo fuzz run target_name -Z build-std

Also, #233 automatically enables this option if Memory Sanitizer is enabled.

Shnatsel avatar Jun 24 '20 21:06 Shnatsel

Should fuzz targets always use -Zbuild-std? The standard library has a lot of debug assertions and I'm sure there are Rust-level UB that the sanitizers won't understand but which could be caught by having debug_assertions turned on in the standard library.

saethlin avatar Jan 10 '22 14:01 saethlin