Proposal: DFT-based fuzzing
(consider this as a work-in-progress design doc, it will be periodically updated) EDIT 2019-06-20
Data Flow Trace
The Data Flow Trace (DFT) tells the fuzzing engine which bytes of a given input affect which comparison instructions. In the following example, if an input reaches CMP1, DFT will tell us that CMP1 is affected by data[55], data[66] and data[77].
int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size) {
int x = SomeFunctionOf(data[55], data[66]);
...
if(x == data[77]) // CMP1
...
}
DataFlowSanitizer (DFSan) allows us to collect byte-precise DFT, typically at the cost of several executions of a given input.
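To make the idea concrete, here is a minimal sketch of byte-level label propagation. It is a toy model and not DFSan itself: the `Tainted` type and the helpers `FromInput`, `Combine`, `CmpLabels` and `TraceCmp1` are hypothetical names introduced only for illustration, and `SomeFunctionOf` from the example above is stood in for by a plain addition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// A value together with the set of input byte offsets it depends on.
// Toy stand-in for DFSan labels (hypothetical type, for illustration only).
struct Tainted {
  int value;
  std::set<size_t> labels;
};

// Reads data[i] and labels the result with its own offset.
Tainted FromInput(const uint8_t *data, size_t i) {
  assert(data != nullptr);
  return {data[i], {i}};
}

// Any value computed from two operands depends on the bytes of both.
Tainted Combine(const Tainted &a, const Tainted &b) {
  Tainted r{a.value + b.value, a.labels};
  r.labels.insert(b.labels.begin(), b.labels.end());
  return r;
}

// The bytes affecting a comparison are the union of both operands' labels.
std::set<size_t> CmpLabels(const Tainted &lhs, const Tainted &rhs) {
  std::set<size_t> s = lhs.labels;
  s.insert(rhs.labels.begin(), rhs.labels.end());
  return s;
}

// Mirrors the example: x = f(data[55], data[66]); if (x == data[77]) // CMP1
// The caller must pass a buffer of at least 78 bytes.
std::set<size_t> TraceCmp1(const uint8_t *data) {
  Tainted x = Combine(FromInput(data, 55), FromInput(data, 66));
  return CmpLabels(x, FromInput(data, 77));
}
```

Calling `TraceCmp1` on any buffer of at least 78 bytes yields the offsets 55, 66 and 77, which is the kind of per-comparison byte set the DFT records.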
Collecting the DFT
In order to collect the DFT the target needs to be compiled with DFSan+SanitizerCoverage and linked with a special driver. The exact details are here.
Then the DFT needs to be collected for the entire seed corpus (see the example below). This will create a new directory with the DFT, which then needs to be compressed and stored on the network disk.
Using the DFT
The libFuzzer runners will use the DFT with some probability. If DFT is chosen for a particular run, the DFT directory is downloaded from the network disk and uncompressed on the local runner. Note that the DFT from the previous fuzzing iteration remains mostly usable, so we do not need to synchronize DFT collection with its use.
libFuzzer will need to be run with two extra flags (other flags are as usual):
- -data_flow_trace=<DFT_DIR>: this simply instructs libFuzzer to load the DFT from <DFT_DIR>.
- -focus_function=auto: this instructs libFuzzer to choose a focus function based on the DFT.
Alternatively, DFT could be collected by libFuzzer on the fly with -collect_data_flow=./dft-binary -fork=1, see below.
Example
This command sequence shows how to apply DFT-based fuzzing to the OnlySomeBytesTest.cpp puzzle.
#!/bin/bash
LLVM=$HOME/llvm-project
RT=$LLVM/compiler-rt
# Build the regular fuzzer binary.
clang -g -O0 -fsanitize=fuzzer $RT/test/fuzzer/OnlySomeBytesTest.cpp -o fuzzer-lf
# Build the DFT binary.
clang -c -fsanitize=dataflow $RT/lib/fuzzer/dataflow/DataFlow.cpp
clang -c -fPIC $RT/lib/fuzzer/dataflow/DataFlowCallbacks.cpp
clang -g -fsanitize=dataflow -fsanitize-coverage=trace-pc-guard,pc-table,bb,trace-cmp \
$RT/test/fuzzer/OnlySomeBytesTest.cpp DataFlow*.o -o fuzzer-dft
# create the corpus
rm -rf CORPUS && mkdir CORPUS
(echo -n ABC; for((i=0;i<4093;i++)) ; do echo -n x; done) > CORPUS/seed
./fuzzer-lf CORPUS/ -use_value_profile=1 -runs=1000000 # Very unlikely to find the bug.
# create_dft()
rm -rf DFT && ./fuzzer-lf -collect_data_flow=./fuzzer-dft -data_flow_trace=DFT CORPUS
# Use DFT. This should find the bug almost instantly.
rm -rf C2; mkdir C2
./fuzzer-lf C2 CORPUS/ -use_value_profile=1 -data_flow_trace=DFT \
-focus_function=auto -jobs=20 -artifact_prefix=C2/
# Or, much simpler with fork mode which will collect DFT itself:
./fuzzer-lf -use_value_profile=1 -collect_data_flow=./fuzzer-dft -fork=1
Do you have any plans to allow specifying more than 1 focus function?
No such plans yet; I want to polish the simplest workflow first. Besides, I am not sure that it would make sense: if you have two things to focus on, you don't have a focus.
Inspired by AUTOGRAM, I've realized that we could try generating protobuf descriptions based on DFSan traces.
Temporarily assigning to myself to do a very quick evaluation.
If anyone wants to play locally:
- Wait till https://reviews.llvm.org/rL208268 lands
- Check out #2292 locally or wait until it lands too
- Build stuff (if #2292 lands, you can do python infra/helper.py pull_images instead of re-building base images locally):
$ project=zlib # or anything else, preferably small and written in C
$ python infra/helper.py build_image --no-pull base-clang \
&& python infra/helper.py build_image --no-pull base-builder \
&& python infra/helper.py build_image --no-pull $project \
&& python infra/helper.py build_fuzzers --engine dataflow --sanitizer dataflow $project
50 projects built successfully:
gs://clusterfuzz-builds-dataflow/aosp/aosp-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/brotli/brotli-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/bzip2/bzip2-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/capstone/capstone-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/cmark/cmark-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/fuzzing-puzzles/fuzzing-puzzles-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/giflib/giflib-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/harfbuzz/harfbuzz-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/hoextdown/hoextdown-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/json-c/json-c-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/lcms/lcms-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libchewing/libchewing-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libexif/libexif-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libgit2/libgit2-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libidn2/libidn2-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libldac/libldac-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libpcap/libpcap-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libplist/libplist-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libteken/libteken-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libtsm/libtsm-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libwebp/libwebp-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libyaml/libyaml-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/lzo/lzo-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/mbedtls/mbedtls-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/minizip/minizip-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/mupdf/mupdf-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/nestegg/nestegg-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/nghttp2/nghttp2-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openjpeg/openjpeg-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openthread/openthread-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openvswitch/openvswitch-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/opus/opus-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/pcre2/pcre2-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/pffft/pffft-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/qcms/qcms-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/qpid-proton/qpid-proton-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/qubes-os/qubes-os-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/tpm2-tss/tpm2-tss-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/unicorn/unicorn-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/vorbis/vorbis-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/wolfssl/wolfssl-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/wuffs/wuffs-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/xz/xz-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/yajl-ruby/yajl-ruby-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/yara/yara-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zstd/zstd-dataflow-201904091513.zip
Hey @kcc, could you please check the attached log? The issue I'm seeing with the second target I'm testing is that the script quickly runs into the ==59901==FATAL: DataFlowSanitizer: out of labels error (in this case after running the first 16 inputs) and then keeps retrying the same input again and again with no luck. Am I doing anything wrong?
If you need to reproduce:
- Download gs://clusterfuzz-builds-dataflow/zstd/zstd-dataflow-201904091513.zip
- Download gs://zstd-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/zstd_block_decompress/latest.zip
- Unpack both, run the block_decompress target
Ah, I guess the real root cause is that some inputs are too long. What would be a good threshold to trim / ignore long ones?
DFSan supports ~ 2^16 labels, but I would put a much lower threshold, e.g. 2^14 bytes for now. We can extend later at the cost of some (small) extra complexity. (I'll double-check what exactly is going on a bit later)
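The size cap suggested above can be sketched as a simple filter applied before an input is handed to the DFT binary. The constant and the function name `ShouldCollectDft` are assumptions made for illustration; they are not libFuzzer identifiers, and whether the limit should be inclusive is a guess.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cap, taken from the 2^14 figure suggested in the thread.
// This is not a libFuzzer constant.
constexpr size_t kMaxDftInputSize = size_t{1} << 14;

// The cap is meant to stay well below the ~2^16 labels DFSan supports.
static_assert(kMaxDftInputSize < (size_t{1} << 16),
              "cap should leave headroom below the DFSan label limit");

// Returns true if an input of this size should be traced.
// Empty inputs carry no bytes to label, so they are skipped as well.
bool ShouldCollectDft(size_t input_size) {
  return input_size > 0 && input_size <= kMaxDftInputSize;
}
```

Skipping oversized inputs, rather than trimming them, keeps the corpus untouched and simply leaves those units without a trace.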
I'm gonna try skipping such inputs in the script instead of retrying. That should make life much easier and all changes will live in LLVM repo (i.e. no hacky corpus trimming on user end).
Yeah, https://reviews.llvm.org/D60538 seems to be a reasonable workaround for now.
And now libFuzzer is crashing with the following stacktrace (looks like it tries to mutate an empty input, though there aren't empty inputs in the corpus):
asan_block_decompress: /src/libfuzzer/FuzzerMutate.cpp:510: size_t fuzzer::MutationDispatcher::MutateImpl(uint8_t *, size_t, size_t, Vector<fuzzer::MutationDispatcher::Mutator> &): Assertion `MaxSize > 0' failed.
==81393== ERROR: libFuzzer: deadly signal
#0 0x4c0171 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cc:86
#1 0x69ecdd in fuzzer::PrintStackTrace() /src/libfuzzer/FuzzerUtil.cpp:205:5
#2 0x652e5e in fuzzer::Fuzzer::CrashCallback() /src/libfuzzer/FuzzerLoop.cpp:234:3
#3 0x7f2aee76a0bf (/lib/x86_64-linux-gnu/libpthread.so.0+0x110bf)
#4 0x7f2aeddc8fce in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fce)
#5 0x7f2aeddca3f9 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x343f9)
#6 0x7f2aeddc1e36 (/lib/x86_64-linux-gnu/libc.so.6+0x2be36)
#7 0x7f2aeddc1ee1 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x2bee1)
#8 0x68d567 in fuzzer::MutationDispatcher::MutateImpl(unsigned char*, unsigned long, unsigned long, std::__1::vector<fuzzer::MutationDispatcher::Mutator, fuzzer::fuzzer_allocator<fuzzer::MutationDispatcher::Mutator> >&) /src/libfuzzer/FuzzerMutate.cpp:510:3
#9 0x68d92a in Mutate /src/libfuzzer/FuzzerMutate.cpp:498:10
#10 0x68d92a in fuzzer::MutationDispatcher::MutateWithMask(unsigned char*, unsigned long, unsigned long, std::__1::vector<unsigned char, fuzzer::fuzzer_allocator<unsigned char> > const&) /src/libfuzzer/FuzzerMutate.cpp:546
#11 0x658b33 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:659:20
#12 0x65bea8 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:814:5
#13 0x6207b1 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:776:6
#14 0x6131a7 in main /src/libfuzzer/FuzzerMain.cpp:19:10
#15 0x7f2aeddb62b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
#16 0x41d8e8 in _start (/usr/local/google/home/mmoroz/Downloads/df/asan_block_decompress+0x41d8e8)
NOTE: libFuzzer has rudimentary signal handlers.
Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 0 ; base unit: 2a4e35072f5775415da786443a900821629e6c60
Could be a trivial bug... Can you try this?
Index: FuzzerMutate.cpp
===================================================================
--- FuzzerMutate.cpp (revision 358040)
+++ FuzzerMutate.cpp (working copy)
@@ -542,6 +542,7 @@
if (Mask[I])
T[OneBits++] = Data[I];
+ if (!OneBits) return 0;
assert(!T.empty());
size_t NewSize = Mutate(T.data(), OneBits, OneBits);
assert(NewSize <= OneBits);
Thanks, @kcc! It helped with one more change, I've uploaded both in https://reviews.llvm.org/D60567
However, now I'm getting another crash (looks like the Mask is shorter than the input somehow):
supernew_asan_block_decompress: /src/libfuzzer/FuzzerMutate.cpp:532: size_t fuzzer::MutationDispatcher::MutateWithMask(uint8_t *, size_t, size_t, const Vector<uint8_t> &): Assertion `Size <= Mask.size()' failed.
==3743== ERROR: libFuzzer: deadly signal
#0 0x4c0171 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cc:86
#1 0x69eccd in fuzzer::PrintStackTrace() /src/libfuzzer/FuzzerUtil.cpp:205:5
#2 0x652e5e in fuzzer::Fuzzer::CrashCallback() /src/libfuzzer/FuzzerLoop.cpp:234:3
#3 0x7f29962790bf (/lib/x86_64-linux-gnu/libpthread.so.0+0x110bf)
#4 0x7f29958d7fce in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fce)
#5 0x7f29958d93f9 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x343f9)
#6 0x7f29958d0e36 (/lib/x86_64-linux-gnu/libc.so.6+0x2be36)
#7 0x7f29958d0ee1 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x2bee1)
#8 0x68dc5d in fuzzer::MutationDispatcher::MutateWithMask(unsigned char*, unsigned long, unsigned long, std::__1::vector<unsigned char, fuzzer::fuzzer_allocator<unsigned char> > const&) /src/libfuzzer/FuzzerMutate.cpp:532:3
#9 0x658b32 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:659:20
#10 0x65beb8 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:816:5
#11 0x6207b1 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:776:6
#12 0x6131a7 in main /src/libfuzzer/FuzzerMain.cpp:19:10
#13 0x7f29958c52b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
#14 0x41d8e8 in _start (/usr/local/google/home/mmoroz/projects/df/supernew_asan_block_decompress+0x41d8e8)
NOTE: libFuzzer has rudimentary signal handlers.
Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 1 CMP- DE: "\x00\xc0\x00\x00\x00\x00\x00\x00"-; base unit: 57fb087984f90ba1677e27e934fb1ec989850df4
Corpus unit is 71 bytes and trace is 1867 bytes long:
$ ls -l block_decompress_corpus/57fb087984f90ba1677e27e934fb1ec989850df4
-rw-r--r-- 1 mmoroz 71 Apr 9 19:14 block_decompress_corpus/57fb087984f90ba1677e27e934fb1ec989850df4
$ ls -l block_decompress_dft/57fb087984f90ba1677e27e934fb1ec989850df4
-rw-r--r-- 1 mmoroz 1867 Apr 10 13:30 block_decompress_dft/57fb087984f90ba1677e27e934fb1ec989850df4
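Both assertion failures reported above come from the mask-based mutation path. The following is a minimal sketch of that idea, written from scratch and not copied from libFuzzer: `MutateWithMaskSketch` and `ToyMutate` are hypothetical names, and the toy mutator merely flips one bit per byte. It shows one way to tolerate the two problem cases, a mask that selects no bytes and a mask whose length differs from the input's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy mutator (an assumption, standing in for a real mutation step):
// flips the low bit of every byte and never changes the size.
size_t ToyMutate(uint8_t *data, size_t size) {
  for (size_t i = 0; i < size; i++) data[i] ^= 1;
  return size;
}

// Mutates only the bytes of `data` whose mask entry is non-zero.
// Returns 0 if the mask selects nothing, otherwise the unchanged size.
size_t MutateWithMaskSketch(uint8_t *data, size_t size,
                            const std::vector<uint8_t> &mask) {
  // Tolerate a mask whose length differs from the input's.
  const size_t masked_size = std::min(size, mask.size());

  // Gather the selected bytes into a temporary buffer.
  std::vector<uint8_t> tmp;
  for (size_t i = 0; i < masked_size; i++)
    if (mask[i]) tmp.push_back(data[i]);

  // Nothing selected: report "no mutation" instead of mutating an empty buffer.
  if (tmp.empty()) return 0;

  const size_t new_size = ToyMutate(tmp.data(), tmp.size());
  assert(new_size <= tmp.size());
  (void)new_size;

  // Scatter the mutated bytes back to their original positions.
  for (size_t i = 0, j = 0; i < masked_size; i++)
    if (mask[i]) data[i] = tmp[j++];
  return size;
}
```

With the input `{10, 20, 30, 40}` and the mask `{0, 1, 0, 1}`, only the second and fourth bytes change; an all-zero mask returns 0 and leaves the input untouched.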
Thanks Kostya for explaining some of the things in more detail. With one more change (https://reviews.llvm.org/D60571) I've got that fuzz target running locally!
See below the difference in the disk space used by DataFlow traces vs corpus. Some targets are missing and some might not be fully correct as I ran out of disk space, but IMO it's safe to conclude a 2-10x difference for most of the cases:
Ratio Corpus size DFT size ./project/fuzz_target_name
----------------------------------------------------------------------------------------
2.98x from 158M to 471M for ./capstone/fuzz_disasmv4
2.07x from 202M to 418M for ./capstone/fuzz_disasmnext
2.19x from 43M to 94M for ./vorbis/decode_fuzzer
3.25x from 24M to 78M for ./wolfssl/pem_cert
2.63x from 19M to 50M for ./lcms/cmsIT8_load_fuzzer
1.11x from 117M to 130M for ./unicorn/fuzz_emu_x86_32
1.37x from 106M to 145M for ./unicorn/fuzz_emu_mips_32le
2.30x from 133M to 306M for ./unicorn/fuzz_emu_x86_64
1.39x from 114M to 159M for ./unicorn/fuzz_emu_mips_32be
2.09x from 43M to 90M for ./unicorn/fuzz_emu_arm_arm
2.24x from 33M to 74M for ./libexif/exif_loader_fuzzer
9.39x from 120M to 1.1G for ./aosp/sqlite
2.00x from 4.0K to 8.0K for ./hoextdown/hoedown_fuzzer
5.39x from 399M to 2.1G for ./mupdf/pdf_fuzzer
5.06x from 65M to 329M for ./qpid-proton/fuzz-message-decode
1.07x from 134M to 143M for ./openjpeg/opj_decompress_fuzzer
3.50x from 20M to 70M for ./mbedtls/fuzz_x509crl
4.58x from 36M to 165M for ./mbedtls/fuzz_dtlsserver
14.57x from 14M to 204M for ./mbedtls/fuzz_x509csr
6.47x from 19M to 123M for ./mbedtls/fuzz_pubkey
2.27x from 11M to 25M for ./mbedtls/fuzz_privkey
3.00x from 4.0K to 12K for ./fuzzing-puzzles/multiple_constraints_on_small_input_afl_fuzzer
16.93x from 127M to 2.1G for ./radare2/ia_fuzz
1.71x from 17M to 29M for ./opus/opus_decode_fuzzer_fixed
1.68x from 19M to 32M for ./opus/opus_decode_fuzzer_floating
1.33x from 12K to 16K for ./zlib/example_flush_fuzzer
1.03x from 33M to 34M for ./zlib/example_dict_fuzzer
1.55x from 53M to 82M for ./yara/rules_fuzzer
2560.00x from 8.0K to 20M for ./yara/macho_fuzzer
22.08x from 25M to 552M for ./openthread/radio-receive-done-fuzzer
1.43x from 21M to 30M for ./openthread/ip6-send-fuzzer
4.69x from 68M to 319M for ./openthread/cli-uart-received-fuzzer
3.04x from 4.6M to 14M for ./c-ares/ares_create_query_fuzzer
2.03x from 5.9M to 12M for ./libidn2/libidn2_to_unicode_8z8z_fuzzer
1.22x from 49M to 60M for ./libpcap/fuzz_both
3.00x from 4.0K to 12K for ./libchewing/chewing_default_fuzzer
3.00x from 4.0K to 12K for ./libchewing/chewing_random_init_fuzzer
3.00x from 4.0K to 12K for ./libchewing/chewing_dynamic_config_fuzzer
1.26x from 50M to 63M for ./libwebp/fuzz_simple_api
2.80x from 20M to 56M for ./libwebp/fuzz_webp_enc_dec
1.78x from 23M to 41M for ./libwebp/fuzz_webp_animencoder
1.56x from 390M to 608M for ./harfbuzz/hb-shape-fuzzer
4.29x from 152M to 652M for ./harfbuzz/hb-subset-fuzzer
2.97x from 36M to 107M for ./nghttp2/nghttp2_fuzzer
1.63x from 263M to 430M for ./cmark/cmark_fuzzer
62976.00x from 4.0K to 246M for ./zlib-ng/compress_fuzzer
1.39x from 46M to 64M for ./zlib-ng/example_dict_fuzzer
1.88x from 32M to 60M for ./zstd/simple_decompress
2.21x from 169M to 373M for ./zstd/stream_round_trip
5.90x from 29M to 171M for ./zstd/stream_decompress
1.29x from 17M to 22M for ./zstd/block_decompress
2.04x from 78M to 159M for ./zstd/block_round_trip
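The ratios in the table are consistent with dividing the two rounded, human-readable sizes shown in each row. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the assumption of binary units (1K = 1024 bytes, as in `du -h`) is mine.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Converts a du -h style size such as "4.0K", "20M" or "1.1G" to bytes.
// Assumes binary units (1K = 1024 bytes); a bare number is treated as bytes.
double HumanSizeToBytes(const std::string &s) {
  assert(!s.empty());
  char *end = nullptr;
  const double number = std::strtod(s.c_str(), &end);
  switch (*end) {
    case 'K': return number * 1024.0;
    case 'M': return number * 1024.0 * 1024.0;
    case 'G': return number * 1024.0 * 1024.0 * 1024.0;
    default:  return number;
  }
}

// Ratio of DFT size to corpus size, as in the first column of the table.
double DftToCorpusRatio(const std::string &corpus, const std::string &dft) {
  return HumanSizeToBytes(dft) / HumanSizeToBytes(corpus);
}
```

For example, `DftToCorpusRatio("8.0K", "20M")` gives 2560 and `DftToCorpusRatio("4.0K", "246M")` gives 62976, matching the yara/macho_fuzzer and zlib-ng/compress_fuzzer rows. Those outliers have a corpus of only a few kilobytes, so the tiny denominator dominates the ratio.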
Nice, thanks! I didn't even try to optimize the disk size yet, wanted to see if the logic works at all. I think the easiest way to optimize the disk space is to zlib-compress the data.
Just added -focus_function=auto which will make libFuzzer choose the focus function automatically based on the coverage data contained in the trace files.
So far tested only on a tiny test.
I will keep testing and tuning it, but the basic functionality is there.
I've reimplemented the Python scripts in libFuzzer proper (LLVM r360712).
The current workflow:
#!/bin/bash
LLVM=$HOME/llvm
RT=$LLVM/projects/compiler-rt
# Build the regular fuzzer binary.
clang -g -O1 -fsanitize=fuzzer $RT/test/fuzzer/OnlySomeBytesTest.cpp -o fuzzer-lf
# Build the DFT binary.
clang -c -fsanitize=dataflow $RT/lib/fuzzer/dataflow/DataFlow.cpp
clang -g -fsanitize=dataflow -fsanitize-coverage=trace-pc-guard,pc-table,bb,trace-cmp \
$RT/test/fuzzer/OnlySomeBytesTest.cpp DataFlow.o -o fuzzer-dft
# create the corpus
rm -rf CORPUS && mkdir CORPUS
(echo -n ABC; for((i=0;i<4093;i++)) ; do echo -n x; done) > CORPUS/seed
./fuzzer-lf CORPUS/ -use_value_profile=1 -runs=1000000 # Very unlikely to find the bug.
# create_dft()
rm -rf DFT && ./fuzzer-lf -collect_data_flow=./fuzzer-dft -data_flow_trace=DFT CORPUS
# Use DFT. This should find the bug almost instantly.
rm -rf C2; mkdir C2
./fuzzer-lf C2 CORPUS/ -use_value_profile=1 -data_flow_trace=DFT \
-focus_function=auto -jobs=20 -artifact_prefix=C2/
I have not tested this on anything real yet, only on the above synthetic puzzle.
I've tried to build all the projects once again (in order to have a better sampling of the builds and choose only stable ones for the experiment), and this time only 4 project builds succeeded:
$ gsutil ls -r gs://clusterfuzz-builds-dataflow/ | egrep 20190517
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201905171213.srcmap.json
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201905171213.zip
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201905171217.zip
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201905171217.zip
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201905171217.zip
Checking the logs... Maybe the recent migration broke the others.
Yeah, with a newer version of #2303 I'm able to build many projects again.
41 projects which we should try DFT-based fuzzing on:
aosp
brotli
bzip2
capstone
cmark
giflib
harfbuzz
hoextdown
lcms
libchewing
libexif
libgit2
libidn2
libldac
libpcap
libplist
libteken
libtsm
libwebp
libyaml
lzo
mbedtls
minizip
mupdf
nestegg
nghttp2
openjpeg
openthread
openvswitch
opus
pcre2
pffft
qcms
radare2
vorbis
wolfssl
wuffs
xz
yara
zlib
zstd
New workflow:
- build the two binaries as above
- run the libFuzzer binary with -fork=N and -collect_data_flow=<DFT_BINARY>
./fuzzer-lf -use_value_profile=1 -collect_data_flow=./fuzzer-dft -fork=20
(again, not tested yet outside of tiny examples)
With this workflow we may not need any kind of DFT management from ClusterFuzz -- just let CF ship both binaries (libFuzzer and DFT) to a worker and invoke the fuzzer binary with -collect_data_flow=./DFT -fork=1.
Ack. I've just got a null deref in libFuzzer locally, but I think it has something to do with the way things are getting built now (i.e. -fsanitize=fuzzer uses LLVM that is more than a week old and doesn't have your DFT changes):
$ asan/zlib_uncompress_fuzzer -use_value_profile=1 -collect_data_flow=dfsan/zlib_uncompress_fuzzer -print_final_stats=1 -max_total_time=3600 -timeout=25 corpus/df_new corpus/new/ corpus/cf/
INFO: Seed: 2102970602
AddressSanitizer:DEADLYSIGNAL
=================================================================
==228104==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0ba6dbd646 bp 0x7ffff27e9370 sp 0x7ffff27e8b28 T0)
==228104==The signal is caused by a READ memory access.
==228104==Hint: address points to the zero page.
#0 0x7f0ba6dbd645 in strlen (/lib/x86_64-linux-gnu/libc.so.6+0x80645)
#1 0x4d2758 in __interceptor_strlen /src/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc
#2 0x46364b in length /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/__string:217:53
#3 0x46364b in basic_string /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/string:821
#4 0x46364b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:718
#5 0x48bad2 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
#6 0x7f0ba6d5d2b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
#7 0x41dad8 in _start (/usr/local/google/home/mmoroz/projects/dataflow/zlib/asan/zlib_uncompress_fuzzer+0x41dad8)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x80645) in strlen
==228104==ABORTING
Need to think of a convenient way of using a ToT. @jonathanmetzman do you have any suggestions?
I obviously can bump and force LLVM revision locally, and avoid pulling the images from OSS-Fuzz, but that seems a bit too heavyweight.
Maybe change $LIB_FUZZING_ENGINE to /path/to/libFuzzingEngine.a in dataflow sanitizer builds?
In dataflow builds LIB_FUZZING_ENGINE is pointing to DataFlow.o -- it doesn't use libFuzzer. I need to hack --engine libfuzzer build. Others may need to do it as well from time to time (e.g. you or an intern testing something new, Matt fixing something upstream, etc), so I'm thinking maybe we should add some extra flag or libfuzzer-tot engine option.
For now, bumped LLVM to r361579 locally. The crash reproduced anyway, probably because I didn't use -fork= mode:
if (Flags.collect_data_flow && !Flags.fork && !Flags.merge) {
if (RunIndividualFiles)
return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,
ReadCorpora({}, *Inputs));
else
return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace, // :720, crash here
ReadCorpora(*Inputs, {}));
}
The stack trace isn't super helpful. CC @kcc, maybe you can quickly see what's wrong:
#0 0x7fc523c6a645 in strlen (/lib/x86_64-linux-gnu/libc.so.6+0x80645)
#1 0x4d3548 in __interceptor_strlen /src/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc
#2 0x463f6d in length /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/__string:217:53
#3 0x463f6d in basic_string /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/string:821
#4 0x463f6d in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:720
#5 0x48c8c2 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
#6 0x7fc523c0a2b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
#7 0x41db28 in _start (/usr/local/google/home/mmoroz/projects/dataflow/zlib/asan/zlib_uncompress_fuzzer+0x41db28)
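The frames above (`strlen` reading address zero, called from the `basic_string` constructor) are consistent with a `std::string` being built from a null `const char *`. That could happen if a string flag that was not passed on the command line, such as -data_flow_trace here, is forwarded to a function expecting a `std::string`. This is a reading of the trace, not a confirmed diagnosis. A minimal sketch of a guard, with `FlagOrEmpty` as a hypothetical helper that is not part of libFuzzer:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts a possibly-unset string flag to std::string.
// Constructing std::string directly from a null pointer is undefined
// behavior and typically crashes inside strlen.
std::string FlagOrEmpty(const char *flag) {
  return flag ? std::string(flag) : std::string();
}
```

A caller could then check for an empty result and either pick a default directory or report a usage error before collecting traces.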
With fork mode enabled it seems to be running, so I'll report back on the progress.
zlib_uncompress_fuzzer, -fork=1, 1 hour:
$ asan/zlib_uncompress_fuzzer -use_value_profile=1 -collect_data_flow=dfsan/zlib_uncompress_fuzzer -print_final_stats=1 -max_total_time=3600 -fork=1 -timeout=25 corpus/df_new corpus/new/ corpus/cf/
INFO: Seed: 1493609206
INFO: Loaded 1 modules (664 inline 8-bit counters): 664 [0x7e5af0, 0x7e5d88),
INFO: Loaded 1 PC tables (664 PCs): 664 [0x5a0878,0x5a31f8),
INFO: -fork=1: fuzzing in separate process(s)
INFO: -fork=1: 1428 seed inputs, starting to fuzz in /tmp/libFuzzerTemp.59644.dir
INFO: fuzzed for 3658 seconds, wrapping up soon
INFO: exiting: 0 time: 3658s
The output directory is empty, and I don't have any logs to look at. I've been checking temp logs occasionally to make sure things were running.
I know that my setup works though, because when I tried running over a corpus subset, I've got new units written. The output in that case looks similar to the output of a regular -fork mode run, so I think we shouldn't miss any stats.
@kcc, another question for you: how do I see how much time is spent on collecting DFT? I'm just worried that if I enable it in current CF configuration, we'll be fuzzing up to ~44 minutes each run, and I don't want to spend too much time on collecting the traces.