clusterfuzz Better support for Rust fuzzing

Similar to #860. There are some crashes reported in https://oss-fuzz.com/testcases?project=wasmtime and it seems that certain stackframes could be skipped (e.g. abort, abort_internal, rust_panic_with_hook, etc).

Not sure who assign this to. Perhaps @jonathanmetzman or maybe anyone else wants to own and champion Rust support //cc @inferno-chromium

Feb 03 '20 15:02 Dor1s

@eepeep this could be something that would be nice to get you more familiar with the codebase, and will help Fuchsia once rust fuzzing there is ready. WDYT?

Feb 03 '20 22:02 oliverchang

FWIW the support so far I feel has been fantastic. We've ironed out a few things wrt to the build image (making sure to fuzzers are optimized, making sure optimizations are actually effective, turning on debuginfo, etc), and that's probably got some more tweaks that could happen to it one way or another.

The only thing I think the reporting could really improve on that's Rust specific is the case you called out above. The way cargo fuzz works is that it treats any "panics" in Rust as a fatal bug, where a panic in Rust is pretty similar to a failed assert in C/C++. Currently the cargo fuzz infrastructure translates panic! to call abort() which is how libfuzzer picks it up, but the reporting could, as you mentioned, probably prune all stack frames up to anything with std::panicking in it. The panic message is pretty immediately seen in the logs once you click through as well.

I can try to help out with any Rust-specific questions/etc if necessary as well!

Feb 03 '20 22:02 alexcrichton

This is done in https://github.com/google/clusterfuzz/pull/1790, https://github.com/google/clusterfuzz/pull/1792. We parse asserts properly, remove all the crap stack frames.

@alexcrichton - i had some questions on stackframes themselves

What are these frames, looks like template functions right

#11 0x55a3e0467625 in _$LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$::read::h3b7d4d3db512bcae mp4parse-rust/mp4parse_capi/src/lib.rs:377:9 for https://github.com/mozilla/mp4parse-rust/blob/9d58d7fef2ef90d509ba923c430d143c848cf4e2/mp4parse_capi/src/lib.rs#L377 Right now, we ignore any frames that start with _$LT

#13 0x55a3e0466c07 in mp4parse_capi::mp4parse_new_common::h5bcbdbc95c2c6730 mp4parse-rust/mp4parse_capi/src/lib.rs:467:15 In most stack frames, there is weird hash in every stack frame, e.g. h5bcbdbc95c2c6730 in this one. We right now keep it, but i am very worried if this changes across builds/dates, and then breaks deduplication (since based on crash state). Any way to remove this ?

May 21 '20 03:05 inferno-chromium

@alexcrichton - OSS-Fuzz Rust support is vaslty simplified - https://github.com/google/oss-fuzz/pull/3830 , https://github.com/google/oss-fuzz/pull/3840. Many projects didnt enable sanitizers, now ASan is properly enabled. Any comments to improve more things is appreciated. We dont enable MSan yet, but was hoping if you can see build failures with infra/helper.py for some of those projects to see what build.sh changes are needed.

May 21 '20 03:05 inferno-chromium

@inferno-chromium awesome progress! On the topic of symbols, currently rustc uses a C++-like name-mangling scheme but doesn't match it entirely. What rustc does it has a bunch of path components separated by :: trailed by a compiler-generated hash. Each component uses its own internal escapes (e.g. $LT$ is <). This means that for something like:

_$LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$::read::h3b7d4d3db512bcae

Originally this was _ZN... but that's the start of a C++ symbol, so it was automatically demangled. This gave the ::-separated pieces
$LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$ is the first piece, which actually translates to <mp4parse_capi::Mp4parseAvifParser as mp4parse_capi::ContextParser>, but it requires extra rustc-specific knowledge to know that.
The next part read is just a method
The final part h3b7d4d3db512bcae is a compiler-synthesized hash.

The tl;dr; is that Rust symbols accidentally look like C++ symbols meaning that a name demangler for C++ will get some better information but doesn't go all the way in demangling Rust. There's a Rust crate, rustc-demangle, which demangles Rust symbols. For the purposes of stack trace collection it should be safe to ignore the compiler-generated hash.

but was hoping if you can see build failures with infra/helper.py for some of those projects to see what build.sh changes are needed.

Sure! Do you have a link to those build failures?

May 21 '20 14:05 alexcrichton

@inferno-chromium awesome progress! On the topic of symbols, currently rustc uses a C++-like name-mangling scheme but doesn't match it entirely. What rustc does it has a bunch of path components separated by :: trailed by a compiler-generated hash. Each component uses its own internal escapes (e.g. $LT$ is <). This means that for something like:
_$LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$::read::h3b7d4d3db512bcae 
Originally this was _ZN... but that's the start of a C++ symbol, so it was automatically demangled. This gave the ::-separated pieces

$LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$ is the first piece, which actually translates to <mp4parse_capi::Mp4parseAvifParser as mp4parse_capi::ContextParser>, but it requires extra rustc-specific knowledge to know that.

The next part read is just a method

The final part h3b7d4d3db512bcae is a compiler-synthesized hash.

Thanks for confirming this is a hash. Would really love to have some build flag to disable this feature. So far, it is not changing across builds, so we are fine, otherwise i can look into stripping it in future.

The tl;dr; is that Rust symbols accidentally look like C++ symbols meaning that a name demangler for C++ will get some better information but doesn't go all the way in demangling Rust. There's a Rust crate, rustc-demangle, which demangles Rust symbols. For the purposes of stack trace collection it should be safe to ignore the compiler-generated hash.

Adding a demangler just for these templated function is overkill. Most of the stack frames are demangled fine with just the default llvm symbolizer.

but was hoping if you can see build failures with infra/helper.py for some of those projects to see what build.sh changes are needed.

Sure! Do you have a link to those build failures?

Yes please try any OSS-Fuzz rust project. E.g. python infra/helper.py pull_images python infra/helper.py build_fuzzers --sanitizer memory wasmtime

try wasmtime, or like mp4parse-rust, etc

it break build with errors like /src/llvm-project/libcxxabi/src/private_typeinfo.cpp:1270: undefined reference to `__msan_warning_noreturn' Ideally i want to simplify this for all rust projects, e.g. in https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/compile#L71. But can also modify build.sh for all projects if needed.

May 21 '20 16:05 inferno-chromium

Unfortunately the hash cannot be turned off, it's required for correctness to link libraries together in rustc. The compiler assumes it's there to ensure that two same-named libraries linked together don't conflict in symbols.

I agree that a demangler for these is likely overkill, but without a Rust-specific demangler the symbols are always going to be a bit wonky in Rust reports. There's a new symbol mangling scheme as well which is not yet enabled by default but will be eventually, and at that point Rust symbols won't look like C++ ones any more as well.

For building things, I ran:

$ python infra/helper.py pull_images
$ python infra/helper.py build_fuzzers --sanitizer memory mp4parse-rust

but I didn't get any errors?

$ python infra/helper.py pull_images
$ python infra/helper.py build_fuzzers --sanitizer memory wasmtime

and I did indeed get a lot of errors. The issue here though is that a dependency of wasmtime, binaryen, is a C++ project built with CMake. In building binaryen it looks like C/C++ is built with sanitizer intrinsics but they aren't linked in for the final build step.

I think the issue is that these lines should be defined for Rust as well (since Rust projects can include C/C++ code). I can't seem to figure out how to test that though because I can't seem to get the base-builder image to rebuild locally (it's always using a remotely-fetched one). Is there a way that I can rebuild the base-builder locally and build with that?

May 21 '20 16:05 alexcrichton

Unfortunately the hash cannot be turned off, it's required for correctness to link libraries together in rustc. The compiler assumes it's there to ensure that two same-named libraries linked together don't conflict in symbols.

I agree that a demangler for these is likely overkill, but without a Rust-specific demangler the symbols are always going to be a bit wonky in Rust reports. There's a new symbol mangling scheme as well which is not yet enabled by default but will be eventually, and at that point Rust symbols won't look like C++ ones any more as well.

For building things, I ran:
$ python infra/helper.py pull_images
$ python infra/helper.py build_fuzzers --sanitizer memory mp4parse-rust
but I didn't get any errors?
$ python infra/helper.py pull_images
$ python infra/helper.py build_fuzzers --sanitizer memory wasmtime
and I did indeed get a lot of errors. The issue here though is that a dependency of wasmtime, binaryen, is a C++ project built with CMake. In building binaryen it looks like C/C++ is built with sanitizer intrinsics but they aren't linked in for the final build step.

I think the issue is that these lines should be defined for Rust as well (since Rust projects can include C/C++ code). I can't seem to figure out how to test that though because I can't seem to get the base-builder image to rebuild locally (it's always using a remotely-fetched one). Is there a way that I can rebuild the base-builder locally and build with that?

It did work and let me reenable them. Dont know why other project owners disabled it in first place.

May 21 '20 16:05 inferno-chromium

@alexcrichton - can you please keep us posted on when you will enable the new demangling scheme, it will break all projects if we dont support this. Can you also keep a build flag to disable that so that transition/migration is easier.

May 21 '20 17:05 inferno-chromium

Symbol mangling changes are tracked at https://github.com/rust-lang/rust/issues/60705. I don't currently know the intentions of the compiler team about preserving an option to opt-in to the old scheme.

May 21 '20 17:05 alexcrichton

Thanks @alexcrichton , have subscribed to that.

From the point of work remaining, i think adding the demangler is the last thing left. Is rustc-demangle the official one or another one in plans for the new scheme ?

May 21 '20 17:05 inferno-chromium

The project AFAIK hasn't blessed something as an official demangler, but rustc-demangle is as close as anything can be to being official without actually being official. It's used by the Rust standard library, for example, to demangle backtraces.

May 21 '20 17:05 alexcrichton

@alexcrichton - that is good enough for us :)

May 21 '20 17:05 inferno-chromium

If it's helpful that crate also has a C API for calling into the demangler.

May 21 '20 17:05 alexcrichton