(Consider this a work-in-progress design doc; it will be updated periodically.) EDIT 2019-06-20

Data Flow Trace

The Data Flow Trace (DFT) tells the fuzzing engine which bytes of a given input affect which comparison instructions. In the following example, if an input reaches CMP1, DFT will tell us that CMP1 is affected by data[55], data[66] and data[77].

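#include <stddef.h>
// Hypothetical stand-in for arbitrary target logic mixing two input bytes.
static int SomeFunctionOf(unsigned char a, unsigned char b) { return a * 31 + b; }

int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size) {
  if (size < 78) return 0;  // data[77] must be in bounds
  int x = SomeFunctionOf(data[55], data[66]);
  if (x == data[77]) {  // CMP1: affected by data[55], data[66] and data[77]
    // ...
  }
  return 0;
}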

DataFlowSanitizer (DFSan) allows us to collect a byte-precise DFT, typically at the cost of several executions of a given input.
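Why several executions: presumably because a single run can only keep a limited number of distinct byte labels apart, so a long input has to be labeled one chunk at a time and the per-run results merged. The sketch below shows only that bookkeeping. K (labels per run) is a free parameter here, not a claim about DFSan's actual limit, and LabeledRun is a hypothetical hook standing in for one instrumented execution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical hook: run the DFSan-instrumented target once with only the
// input bytes in [Begin, End) carrying labels, and return, per function
// index, the absolute offsets of labeled bytes that reached a comparison.
using LabeledRun = std::function<std::map<size_t, std::vector<size_t>>(
    size_t Begin, size_t End)>;

// Executions needed if one run can track at most K byte labels.
size_t NumExecutions(size_t InputSize, size_t K) {
  assert(K > 0 && "need at least one label per run");
  return (InputSize + K - 1) / K;
}

// Label the input K bytes at a time and OR the per-run results into one
// '0'/'1' string per function (one character per input byte).
std::map<size_t, std::string> CollectInChunks(size_t InputSize, size_t K,
                                              const LabeledRun &Run) {
  assert(K > 0);
  std::map<size_t, std::string> Trace;
  for (size_t Begin = 0; Begin < InputSize; Begin += K) {
    size_t End = std::min(InputSize, Begin + K);
    for (const auto &Entry : Run(Begin, End)) {
      std::string &Bits = Trace[Entry.first];
      if (Bits.empty()) Bits.assign(InputSize, '0');
      for (size_t Off : Entry.second)
        if (Off < InputSize) Bits[Off] = '1';
    }
  }
  return Trace;
}
```

With K = 16, the CMP1 example above needs 7 runs for a 100-byte input, and bytes 55, 66 and 77 end up set in the same function's string even though they were labeled in different runs.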

Collecting the DFT

In order to collect the DFT, the target needs to be compiled with DFSan+SanitizerCoverage and linked with a special driver. The exact details are here.

Then the DFT needs to be collected for the entire seed corpus (see the example below). This will create a new directory with the DFT, which then needs to be compressed and stored on the network disk.
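For reference, here is a sketch of reading one line of a trace file from that directory. It assumes the text format we believe libFuzzer used at the time: one line per function of the form F<function index> <bits>, with one 0/1 character per input byte, plus coverage lines starting with C that are skipped here. Check FuzzerDataFlowTrace.cpp in your LLVM revision before relying on it; ParseDftLine is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed line format (verify against your LLVM revision):
//   F<function index> <one '0'/'1' char per input byte>
// Any other line (e.g. 'C' coverage lines) is rejected/ignored.
// On success, stores the function index and the offsets of '1' bytes.
bool ParseDftLine(const std::string &Line, size_t *FuncIdx,
                  std::vector<size_t> *Offsets) {
  if (Line.size() < 2 || Line[0] != 'F') return false;
  size_t Space = Line.find(' ');
  if (Space == std::string::npos || Space == 1) return false;
  size_t Idx = 0;
  for (size_t I = 1; I < Space; I++) {
    if (Line[I] < '0' || Line[I] > '9') return false;
    Idx = Idx * 10 + static_cast<size_t>(Line[I] - '0');
  }
  std::vector<size_t> Result;
  for (size_t I = Space + 1; I < Line.size(); I++) {
    char C = Line[I];
    if (C != '0' && C != '1') return false;  // only bits are allowed
    if (C == '1') Result.push_back(I - Space - 1);  // byte offset in input
  }
  *FuncIdx = Idx;
  *Offsets = std::move(Result);
  return true;
}
```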

Using the DFT

The libFuzzer runners will use the DFT with some probability. If DFT is chosen for a particular run, the DFT directory is downloaded from the network disk and uncompressed on the local runner. Note that the DFT from the previous fuzzing iteration remains mostly usable, so DFT collection and DFT use do not need to be synchronized.

libFuzzer will need to be run with two extra flags (other flags are as usual):

-data_flow_trace=<DFT_DIR>: this simply instructs libFuzzer to load the DFT from <DFT_DIR>.
-focus_function=auto: this instructs libFuzzer to choose a focus function based on the DFT.
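A minimal sketch of the runner-side choice described above, assuming the runner simply appends the two flags with some probability. The thread does not specify the probability, so DftProbability is a free parameter, and BuildFuzzerArgs is a hypothetical helper rather than ClusterFuzz code.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: build a libFuzzer command line and, with probability
// DftProbability, add the two DFT flags. Other flags are left to the caller.
std::vector<std::string> BuildFuzzerArgs(const std::string &FuzzerBinary,
                                         const std::string &CorpusDir,
                                         const std::string &DftDir,
                                         double DftProbability,
                                         std::mt19937 &Rng) {
  assert(DftProbability >= 0.0 && DftProbability <= 1.0);
  std::vector<std::string> Args = {FuzzerBinary, CorpusDir};
  std::bernoulli_distribution UseDft(DftProbability);
  // Only use the DFT if one was actually downloaded for this run.
  if (!DftDir.empty() && UseDft(Rng)) {
    Args.push_back("-data_flow_trace=" + DftDir);
    Args.push_back("-focus_function=auto");
  }
  return Args;
}
```

Passing the generator in keeps the decision reproducible for a fixed seed, which makes the choice easy to test.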

Alternatively, libFuzzer can collect the DFT on the fly with -collect_data_flow=./dft-binary -fork=1; see below.

Example

This command sequence shows how to apply DFT-based fuzzing to the OnlySomeBytesTest.cpp puzzle.

#!/bin/bash
LLVM=$HOME/llvm-project
RT=$LLVM/compiler-rt
# Build the regular fuzzer binary.
clang -g -O0 -fsanitize=fuzzer $RT/test/fuzzer/OnlySomeBytesTest.cpp -o fuzzer-lf
# Build the DFT binary.
clang -c -fsanitize=dataflow $RT/lib/fuzzer/dataflow/DataFlow.cpp
clang -c -fPIC $RT/lib/fuzzer/dataflow/DataFlowCallbacks.cpp
clang -g -fsanitize=dataflow -fsanitize-coverage=trace-pc-guard,pc-table,bb,trace-cmp  \
    $RT/test/fuzzer/OnlySomeBytesTest.cpp DataFlow*.o -o fuzzer-dft

# create the corpus
rm -rf CORPUS && mkdir CORPUS
(echo -n ABC; for((i=0;i<4093;i++)) ; do echo -n x; done) > CORPUS/seed
./fuzzer-lf CORPUS/ -use_value_profile=1 -runs=1000000 # Very unlikely to find the bug.

# create_dft()
rm -rf DFT && ./fuzzer-lf -collect_data_flow=./fuzzer-dft -data_flow_trace=DFT CORPUS

# Use DFT. This should find the bug almost instantly.
rm -rf C2; mkdir C2
./fuzzer-lf C2 CORPUS/ -use_value_profile=1 -data_flow_trace=DFT \
  -focus_function=auto -jobs=20 -artifact_prefix=C2/

# Or, much simpler with fork mode which will collect DFT itself:
./fuzzer-lf -use_value_profile=1 -collect_data_flow=./fuzzer-dft -fork=1

Jul 19 '18 02:07 kcc

Do you have any plans to allow specifying more than 1 focus function?

Jul 19 '18 15:07 Dor1s

No such plans yet; I want to polish the simplest workflow first. Besides, I am not sure that would make sense: if you have two things to focus on, you don't have a focus.

Jul 19 '18 20:07 kcc

Inspired by AUTOGRAM, I've realized that we could try generating protobuf descriptions based on DFSan traces.

Dec 27 '18 21:12 Dor1s

Temporarily assigning this to myself to do a very quick evaluation.

Apr 04 '19 17:04 Dor1s

If anyone wants to play locally:

Wait till https://reviews.llvm.org/rL208268 lands
Check out #2292 locally or wait until it lands too
Build stuff (if #2292 lands, you can do python infra/helper.py pull_images instead of re-building base images locally):

$ project=zlib  # or anything else, preferably small and written in C
$ python infra/helper.py build_image --no-pull base-clang \
    && python infra/helper.py build_image --no-pull base-builder \
    && python infra/helper.py build_image --no-pull $project \
    && python infra/helper.py build_fuzzers --engine dataflow --sanitizer dataflow $project

Apr 04 '19 19:04 Dor1s

50 projects succeeded to build:

gs://clusterfuzz-builds-dataflow/aosp/aosp-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/brotli/brotli-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/bzip2/bzip2-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/capstone/capstone-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/cmark/cmark-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/fuzzing-puzzles/fuzzing-puzzles-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/giflib/giflib-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/harfbuzz/harfbuzz-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/hoextdown/hoextdown-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/json-c/json-c-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/lcms/lcms-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libchewing/libchewing-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libexif/libexif-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libgit2/libgit2-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libidn2/libidn2-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libldac/libldac-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libpcap/libpcap-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libplist/libplist-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libteken/libteken-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libtsm/libtsm-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libwebp/libwebp-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libyaml/libyaml-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/lzo/lzo-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/mbedtls/mbedtls-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/minizip/minizip-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/mupdf/mupdf-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/nestegg/nestegg-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/nghttp2/nghttp2-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openjpeg/openjpeg-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openthread/openthread-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openvswitch/openvswitch-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/opus/opus-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/pcre2/pcre2-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/pffft/pffft-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/qcms/qcms-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/qpid-proton/qpid-proton-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/qubes-os/qubes-os-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/tpm2-tss/tpm2-tss-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/unicorn/unicorn-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/vorbis/vorbis-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/wolfssl/wolfssl-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/wuffs/wuffs-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/xz/xz-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/yajl-ruby/yajl-ruby-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/yara/yara-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zstd/zstd-dataflow-201904091513.zip

Apr 10 '19 14:04 Dor1s

Hey @kcc, could you please check the attached log? The issue I'm seeing with the second target I'm testing is that the script quickly runs into the ==59901==FATAL: DataFlowSanitizer: out of labels error (in this case after running the first 16 inputs) and then keeps retrying the same input with no luck. Am I doing anything wrong?

block_decompress.log

Apr 10 '19 16:04 Dor1s

If you need to reproduce:

Download gs://clusterfuzz-builds-dataflow/zstd/zstd-dataflow-201904091513.zip
Download gs://zstd-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/zstd_block_decompress/latest.zip
Unpack both, run block_decompress target

Apr 10 '19 16:04 Dor1s

Ah, I guess the real root cause is that some inputs are too long. What would be a good threshold to trim / ignore long ones?

Apr 10 '19 20:04 Dor1s

DFSan supports ~ 2^16 labels, but I would put a much lower threshold, e.g. 2^14 bytes for now. We can extend later at the cost of some (small) extra complexity. (I'll double-check what exactly is going on a bit later)

Apr 10 '19 20:04 kcc

I'm gonna try skipping such inputs in the script instead of retrying. That should make life much easier and all changes will live in LLVM repo (i.e. no hacky corpus trimming on user end).

Apr 10 '19 20:04 Dor1s

Yeah, https://reviews.llvm.org/D60538 seems to be a reasonable workaround for now.
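The policy agreed above (skip oversized inputs rather than retrying them) amounts to a size filter in front of trace collection. A minimal sketch using the 2^14-byte threshold suggested earlier; CorpusUnit, kMaxDftInputSize and SelectForDft are hypothetical names, and this is not the code in D60538.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory view of one corpus file.
struct CorpusUnit {
  std::string Name;
  size_t Size;  // bytes
};

// The threshold suggested above: 2^14 bytes.
constexpr size_t kMaxDftInputSize = size_t{1} << 14;

// Return the units small enough to trace; count (rather than retry) the rest.
std::vector<CorpusUnit> SelectForDft(const std::vector<CorpusUnit> &Units,
                                     size_t MaxSize, size_t *NumSkipped) {
  assert(NumSkipped);
  std::vector<CorpusUnit> Selected;
  *NumSkipped = 0;
  for (const CorpusUnit &U : Units) {
    if (U.Size <= MaxSize)
      Selected.push_back(U);
    else
      ++*NumSkipped;
  }
  return Selected;
}
```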

Apr 10 '19 20:04 Dor1s

And now libFuzzer is crashing with the following stacktrace (looks like it tries to mutate an empty input, though there aren't empty inputs in the corpus):

asan_block_decompress: /src/libfuzzer/FuzzerMutate.cpp:510: size_t fuzzer::MutationDispatcher::MutateImpl(uint8_t *, size_t, size_t, Vector<fuzzer::MutationDispatcher::Mutator> &): Assertion `MaxSize > 0' failed.
==81393== ERROR: libFuzzer: deadly signal
    #0 0x4c0171 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cc:86
    #1 0x69ecdd in fuzzer::PrintStackTrace() /src/libfuzzer/FuzzerUtil.cpp:205:5
    #2 0x652e5e in fuzzer::Fuzzer::CrashCallback() /src/libfuzzer/FuzzerLoop.cpp:234:3
    #3 0x7f2aee76a0bf  (/lib/x86_64-linux-gnu/libpthread.so.0+0x110bf)
    #4 0x7f2aeddc8fce in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fce)
    #5 0x7f2aeddca3f9 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x343f9)
    #6 0x7f2aeddc1e36  (/lib/x86_64-linux-gnu/libc.so.6+0x2be36)
    #7 0x7f2aeddc1ee1 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x2bee1)
    #8 0x68d567 in fuzzer::MutationDispatcher::MutateImpl(unsigned char*, unsigned long, unsigned long, std::__1::vector<fuzzer::MutationDispatcher::Mutator, fuzzer::fuzzer_allocator<fuzzer::MutationDispatcher::Mutator> >&) /src/libfuzzer/FuzzerMutate.cpp:510:3
    #9 0x68d92a in Mutate /src/libfuzzer/FuzzerMutate.cpp:498:10
    #10 0x68d92a in fuzzer::MutationDispatcher::MutateWithMask(unsigned char*, unsigned long, unsigned long, std::__1::vector<unsigned char, fuzzer::fuzzer_allocator<unsigned char> > const&) /src/libfuzzer/FuzzerMutate.cpp:546
    #11 0x658b33 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:659:20
    #12 0x65bea8 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:814:5
    #13 0x6207b1 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:776:6
    #14 0x6131a7 in main /src/libfuzzer/FuzzerMain.cpp:19:10
    #15 0x7f2aeddb62b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #16 0x41d8e8 in _start (/usr/local/google/home/mmoroz/Downloads/df/asan_block_decompress+0x41d8e8)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 0 ; base unit: 2a4e35072f5775415da786443a900821629e6c60

Apr 10 '19 20:04 Dor1s

Could be a trivial bug... Can you try this?

Index: FuzzerMutate.cpp
===================================================================
--- FuzzerMutate.cpp    (revision 358040)
+++ FuzzerMutate.cpp    (working copy)
@@ -542,6 +542,7 @@
     if (Mask[I])
       T[OneBits++] = Data[I];
 
+  if (!OneBits) return 0;
   assert(!T.empty());
   size_t NewSize = Mutate(T.data(), OneBits, OneBits);
   assert(NewSize <= OneBits);

Apr 11 '19 00:04 kcc

Thanks, @kcc! That helped, together with one more change; I've uploaded both in https://reviews.llvm.org/D60567

However, now I'm getting another crash (looks like the Mask is shorter than the input somehow):

supernew_asan_block_decompress: /src/libfuzzer/FuzzerMutate.cpp:532: size_t fuzzer::MutationDispatcher::MutateWithMask(uint8_t *, size_t, size_t, const Vector<uint8_t> &): Assertion `Size <= Mask.size()' failed.
==3743== ERROR: libFuzzer: deadly signal
    #0 0x4c0171 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cc:86
    #1 0x69eccd in fuzzer::PrintStackTrace() /src/libfuzzer/FuzzerUtil.cpp:205:5
    #2 0x652e5e in fuzzer::Fuzzer::CrashCallback() /src/libfuzzer/FuzzerLoop.cpp:234:3
    #3 0x7f29962790bf  (/lib/x86_64-linux-gnu/libpthread.so.0+0x110bf)
    #4 0x7f29958d7fce in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fce)
    #5 0x7f29958d93f9 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x343f9)
    #6 0x7f29958d0e36  (/lib/x86_64-linux-gnu/libc.so.6+0x2be36)
    #7 0x7f29958d0ee1 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x2bee1)
    #8 0x68dc5d in fuzzer::MutationDispatcher::MutateWithMask(unsigned char*, unsigned long, unsigned long, std::__1::vector<unsigned char, fuzzer::fuzzer_allocator<unsigned char> > const&) /src/libfuzzer/FuzzerMutate.cpp:532:3
    #9 0x658b32 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:659:20
    #10 0x65beb8 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:816:5
    #11 0x6207b1 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:776:6
    #12 0x6131a7 in main /src/libfuzzer/FuzzerMain.cpp:19:10
    #13 0x7f29958c52b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #14 0x41d8e8 in _start (/usr/local/google/home/mmoroz/projects/df/supernew_asan_block_decompress+0x41d8e8)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 1 CMP- DE: "\x00\xc0\x00\x00\x00\x00\x00\x00"-; base unit: 57fb087984f90ba1677e27e934fb1ec989850df4

Corpus unit is 71 bytes and trace is 1867 bytes long:

$ ls -l block_decompress_corpus/57fb087984f90ba1677e27e934fb1ec989850df4 
-rw-r--r-- 1 mmoroz 71   Apr  9 19:14 block_decompress_corpus/57fb087984f90ba1677e27e934fb1ec989850df4
$ ls -l block_decompress_dft/57fb087984f90ba1677e27e934fb1ec989850df4 
-rw-r--r-- 1 mmoroz 1867 Apr 10 13:30 block_decompress_dft/57fb087984f90ba1677e27e934fb1ec989850df4
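The patch above and the Size <= Mask.size() failure both concern how masked mutation gathers and scatters bytes. Below is a self-contained sketch of that idea, not libFuzzer's actual MutateWithMask: copy the bytes selected by the mask into a temporary buffer, mutate that buffer, and write it back. It handles both corner cases seen in this thread, an all-zero mask (nothing to mutate) and a mask shorter than the unit. MutateMasked and the Mutator callback type are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Callback that mutates Data[0, Size) in place, never exceeding MaxSize,
// and returns the new size.
using Mutator = std::function<size_t(uint8_t *Data, size_t Size, size_t MaxSize)>;

// Mutate only the bytes of Data whose Mask entry is non-zero.
// Returns Size if something was mutated, 0 if there was nothing to mutate.
size_t MutateMasked(uint8_t *Data, size_t Size,
                    const std::vector<uint8_t> &Mask, const Mutator &Mutate) {
  // The mask may not cover the whole unit: only look at the common prefix.
  size_t MaskedSize = std::min(Size, Mask.size());
  std::vector<uint8_t> T;
  T.reserve(MaskedSize);
  for (size_t I = 0; I < MaskedSize; I++)  // gather selected bytes
    if (Mask[I]) T.push_back(Data[I]);
  if (T.empty()) return 0;  // same idea as the !OneBits guard in the patch above
  size_t NewSize = Mutate(T.data(), T.size(), T.size());
  assert(NewSize <= T.size());
  (void)NewSize;
  // Scatter back; all T.size() slots are written even if the mutator shrank T.
  for (size_t I = 0, J = 0; I < MaskedSize; I++)
    if (Mask[I]) Data[I] = T[J++];
  return Size;
}
```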

Apr 11 '19 15:04 Dor1s

Thanks Kostya for explaining some of the things in more detail. With one more change (https://reviews.llvm.org/D60571) I've got that fuzz target running locally!

Apr 11 '19 18:04 Dor1s

See below the difference in disk space used by DataFlow traces vs. the corpus. Some targets are missing and some numbers might not be fully correct because I ran out of disk space, but IMO it's safe to conclude there is a 2-10x difference in most cases:

    Ratio  Corpus size  DFT size  ./project/fuzz_target_name
------------------------------------------------------------
    2.98x  158M         471M      ./capstone/fuzz_disasmv4
    2.07x  202M         418M      ./capstone/fuzz_disasmnext
    2.19x  43M          94M       ./vorbis/decode_fuzzer
    3.25x  24M          78M       ./wolfssl/pem_cert
    2.63x  19M          50M       ./lcms/cmsIT8_load_fuzzer
    1.11x  117M         130M      ./unicorn/fuzz_emu_x86_32
    1.37x  106M         145M      ./unicorn/fuzz_emu_mips_32le
    2.30x  133M         306M      ./unicorn/fuzz_emu_x86_64
    1.39x  114M         159M      ./unicorn/fuzz_emu_mips_32be
    2.09x  43M          90M       ./unicorn/fuzz_emu_arm_arm
    2.24x  33M          74M       ./libexif/exif_loader_fuzzer
    9.39x  120M         1.1G      ./aosp/sqlite
    2.00x  4.0K         8.0K      ./hoextdown/hoedown_fuzzer
    5.39x  399M         2.1G      ./mupdf/pdf_fuzzer
    5.06x  65M          329M      ./qpid-proton/fuzz-message-decode
    1.07x  134M         143M      ./openjpeg/opj_decompress_fuzzer
    3.50x  20M          70M       ./mbedtls/fuzz_x509crl
    4.58x  36M          165M      ./mbedtls/fuzz_dtlsserver
   14.57x  14M          204M      ./mbedtls/fuzz_x509csr
    6.47x  19M          123M      ./mbedtls/fuzz_pubkey
    2.27x  11M          25M       ./mbedtls/fuzz_privkey
    3.00x  4.0K         12K       ./fuzzing-puzzles/multiple_constraints_on_small_input_afl_fuzzer
   16.93x  127M         2.1G      ./radare2/ia_fuzz
    1.71x  17M          29M       ./opus/opus_decode_fuzzer_fixed
    1.68x  19M          32M       ./opus/opus_decode_fuzzer_floating
    1.33x  12K          16K       ./zlib/example_flush_fuzzer
    1.03x  33M          34M       ./zlib/example_dict_fuzzer
    1.55x  53M          82M       ./yara/rules_fuzzer
 2560.00x  8.0K         20M       ./yara/macho_fuzzer
   22.08x  25M          552M      ./openthread/radio-receive-done-fuzzer
    1.43x  21M          30M       ./openthread/ip6-send-fuzzer
    4.69x  68M          319M      ./openthread/cli-uart-received-fuzzer
    3.04x  4.6M         14M       ./c-ares/ares_create_query_fuzzer
    2.03x  5.9M         12M       ./libidn2/libidn2_to_unicode_8z8z_fuzzer
    1.22x  49M          60M       ./libpcap/fuzz_both
    3.00x  4.0K         12K       ./libchewing/chewing_default_fuzzer
    3.00x  4.0K         12K       ./libchewing/chewing_random_init_fuzzer
    3.00x  4.0K         12K       ./libchewing/chewing_dynamic_config_fuzzer
    1.26x  50M          63M       ./libwebp/fuzz_simple_api
    2.80x  20M          56M       ./libwebp/fuzz_webp_enc_dec
    1.78x  23M          41M       ./libwebp/fuzz_webp_animencoder
    1.56x  390M         608M      ./harfbuzz/hb-shape-fuzzer
    4.29x  152M         652M      ./harfbuzz/hb-subset-fuzzer
    2.97x  36M          107M      ./nghttp2/nghttp2_fuzzer
    1.63x  263M         430M      ./cmark/cmark_fuzzer
62976.00x  4.0K         246M      ./zlib-ng/compress_fuzzer
    1.39x  46M          64M       ./zlib-ng/example_dict_fuzzer
    1.88x  32M          60M       ./zstd/simple_decompress
    2.21x  169M         373M      ./zstd/stream_round_trip
    5.90x  29M          171M      ./zstd/stream_decompress
    1.29x  17M          22M       ./zstd/block_decompress
    2.04x  78M          159M      ./zstd/block_round_trip
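The sizes in the table appear to be du -h style values with 1024-based suffixes; under that assumption, the Ratio column is simply DFT size divided by corpus size. The sketch reproduces it (ParseHumanSize and GrowthRatio are hypothetical helpers). It also explains the extreme rows: when the corpus is only 4.0K or 8.0K, a tiny denominator turns 20M or 246M of traces into ratios in the thousands.

```cpp
#include <cassert>
#include <string>

// Convert a du -h style size ("4.0K", "158M", "1.1G") to bytes,
// assuming 1024-based suffixes.
double ParseHumanSize(const std::string &S) {
  assert(!S.empty());
  double Mult = 1.0;
  switch (S.back()) {
    case 'K': Mult = 1024.0; break;
    case 'M': Mult = 1024.0 * 1024.0; break;
    case 'G': Mult = 1024.0 * 1024.0 * 1024.0; break;
    default: break;  // plain byte count
  }
  std::string Num = S;
  if (Mult != 1.0) Num.pop_back();  // drop the suffix before parsing
  return std::stod(Num) * Mult;
}

// Ratio column: DFT size / corpus size.
double GrowthRatio(const std::string &CorpusSize, const std::string &DftSize) {
  return ParseHumanSize(DftSize) / ParseHumanSize(CorpusSize);
}
```

For example, 246M / 4.0K = 246 * 1024 * 1024 / 4096 = 62976, matching the zlib-ng/compress_fuzzer row, and 1.1G / 120M is about 9.39, matching aosp/sqlite.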

Apr 11 '19 19:04 Dor1s

Nice, thanks! I didn't even try to optimize the disk size yet, wanted to see if the logic works at all. I think the easiest way to optimize the disk space is to zlib-compress the data.

Apr 11 '19 19:04 kcc

Just added -focus_function=auto which will make libFuzzer choose the focus function automatically based on the coverage data contained in the trace files.

So far tested only on a tiny test.

I will keep testing and tuning it, but the basic functionality is there.
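To make "choose the focus function automatically based on the coverage data" concrete, here is an illustrative heuristic only; it is not the formula libFuzzer uses. It prefers functions that some corpus input already reaches (so there is a trace to focus on) but that still have many uncovered basic blocks, discounted by how many inputs reach them. FunctionCoverage and ChooseFocusFunction are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-function summary derived from the trace files.
struct FunctionCoverage {
  std::string Name;
  size_t TotalBlocks;    // basic blocks in the function
  size_t CoveredBlocks;  // blocks covered by at least one corpus input
  size_t NumInputs;      // corpus inputs whose trace reaches the function
};

// Returns the index of the chosen focus function, or -1 if none qualifies.
int ChooseFocusFunction(const std::vector<FunctionCoverage> &Funcs) {
  int Best = -1;
  double BestWeight = 0;
  for (size_t I = 0; I < Funcs.size(); I++) {
    const FunctionCoverage &F = Funcs[I];
    // Skip unreached functions (no trace) and fully covered ones.
    if (F.NumInputs == 0 || F.CoveredBlocks >= F.TotalBlocks) continue;
    double Weight =
        double(F.TotalBlocks - F.CoveredBlocks) / double(F.NumInputs);
    if (Weight > BestWeight) {
      BestWeight = Weight;
      Best = static_cast<int>(I);
    }
  }
  return Best;
}
```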

May 09 '19 21:05 kcc

I've reimplemented the python scripts in libFuzzer proper (LLVM r360712).

The current workflow:

#!/bin/bash
LLVM=$HOME/llvm
RT=$LLVM/projects/compiler-rt
# Build the regular fuzzer binary.
clang -g -O1 -fsanitize=fuzzer $RT/test/fuzzer/OnlySomeBytesTest.cpp -o fuzzer-lf
# Build the DFT binary.
clang -c -fsanitize=dataflow $RT/lib/fuzzer/dataflow/DataFlow.cpp
clang -g -fsanitize=dataflow -fsanitize-coverage=trace-pc-guard,pc-table,bb,trace-cmp  \
    $RT/test/fuzzer/OnlySomeBytesTest.cpp DataFlow.o -o fuzzer-dft

# create the corpus
rm -rf CORPUS && mkdir CORPUS
(echo -n ABC; for((i=0;i<4093;i++)) ; do echo -n x; done) > CORPUS/seed
./fuzzer-lf CORPUS/ -use_value_profile=1 -runs=1000000 # Very unlikely to find the bug.

# create_dft()
rm -rf DFT && ./fuzzer-lf -collect_data_flow=./fuzzer-dft -data_flow_trace=DFT CORPUS

# Use DFT. This should find the bug almost instantly.
rm -rf C2; mkdir C2
./fuzzer-lf C2 CORPUS/ -use_value_profile=1 -data_flow_trace=DFT \
  -focus_function=auto -jobs=20 -artifact_prefix=C2/

I have not tested this on anything real yet, only on the above synthetic puzzle.

May 14 '19 21:05 kcc

I've tried to build all the projects once again (in order to have a better sampling of the builds and choose only stable ones for the experiment), and this time only 4 project builds succeeded:

$ gsutil ls -r gs://clusterfuzz-builds-dataflow/ | egrep 20190517
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201905171213.srcmap.json
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201905171213.zip
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201905171217.zip
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201905171217.zip
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201905171217.zip

Checking the logs... Maybe the recent migration broke the others.

May 20 '19 14:05 Dor1s

Yeah, with a newer version of #2303 I'm able to build many projects again.

May 20 '19 23:05 Dor1s

41 projects which we should try DFT-based fuzzing on:

aosp
brotli
bzip2
capstone
cmark
giflib
harfbuzz
hoextdown
lcms
libchewing
libexif
libgit2
libidn2
libldac
libpcap
libplist
libteken
libtsm
libwebp
libyaml
lzo
mbedtls
minizip
mupdf
nestegg
nghttp2
openjpeg
openthread
openvswitch
opus
pcre2
pffft
qcms
radare2
vorbis
wolfssl
wuffs
xz
yara
zlib
zstd

May 21 '19 13:05 Dor1s

New workflow:

build the two binaries as above
run the libFuzzer binary with -fork=N and -collect_data_flow=<DFT_BINARY>

./fuzzer-lf -use_value_profile=1 -collect_data_flow=./fuzzer-dft -fork=20

(again, not tested yet outside of tiny examples)

With this workflow we may not need any kind of DFT management from ClusterFuzz: just let CF ship both binaries (libFuzzer and DFT) to a worker and invoke the fuzzer binary with -collect_data_flow=./DFT -fork=1

May 23 '19 00:05 kcc

Ack. I've just got a null deref in libFuzzer locally, but I think it has something to do with the way things are getting built now (i.e. -fsanitize=fuzzer uses LLVM that is more than a week old and doesn't have your DFT changes):

$ asan/zlib_uncompress_fuzzer -use_value_profile=1 -collect_data_flow=dfsan/zlib_uncompress_fuzzer -print_final_stats=1 -max_total_time=3600 -timeout=25 corpus/df_new corpus/new/ corpus/cf/                                                                                      
INFO: Seed: 2102970602
AddressSanitizer:DEADLYSIGNAL
=================================================================                                       
==228104==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0ba6dbd646 bp 0x7ffff27e9370 sp 0x7ffff27e8b28 T0)
==228104==The signal is caused by a READ memory access.                                                 
==228104==Hint: address points to the zero page.
    #0 0x7f0ba6dbd645 in strlen (/lib/x86_64-linux-gnu/libc.so.6+0x80645)                               
    #1 0x4d2758 in __interceptor_strlen /src/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc
    #2 0x46364b in length /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/__string:217:53
    #3 0x46364b in basic_string /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/string:821
    #4 0x46364b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:718                                              
    #5 0x48bad2 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10                  
    #6 0x7f0ba6d5d2b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)                    
    #7 0x41dad8 in _start (/usr/local/google/home/mmoroz/projects/dataflow/zlib/asan/zlib_uncompress_fuzzer+0x41dad8)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x80645) in strlen                     
==228104==ABORTING

Need to think of a convenient way of using a ToT (tip-of-tree) LLVM build. @jonathanmetzman do you have any suggestions?

I obviously can bump and force LLVM revision locally, and avoid pulling the images from OSS-Fuzz, but that seems a bit too heavyweight.

May 29 '19 14:05 Dor1s

Maybe change $LIB_FUZZING_ENGINE to /path/to/libFuzzingEngine.a in dataflow sanitizer builds?

May 29 '19 14:05 jonathanmetzman

In dataflow builds LIB_FUZZING_ENGINE is pointing to DataFlow.o -- it doesn't use libFuzzer. I need to hack --engine libfuzzer build. Others may need to do it as well from time to time (e.g. you or an intern testing something new, Matt fixing something upstream, etc), so I'm thinking maybe we should add some extra flag or libfuzzer-tot engine option.

May 29 '19 14:05 Dor1s

For now, bumped LLVM to r361579 locally. The crash reproduced anyway, probably because I didn't use -fork= mode:

  if (Flags.collect_data_flow && !Flags.fork && !Flags.merge) {
    if (RunIndividualFiles)
      return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,
                             ReadCorpora({}, *Inputs));
    else
      return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,  // :720, crash here
                             ReadCorpora(*Inputs, {}));
  }

stacktrace isn't super helpful, CC @kcc maybe you can quickly realize what's wrong:

    #0 0x7fc523c6a645 in strlen (/lib/x86_64-linux-gnu/libc.so.6+0x80645)
    #1 0x4d3548 in __interceptor_strlen /src/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc
    #2 0x463f6d in length /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/__string:217:53
    #3 0x463f6d in basic_string /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/string:821
    #4 0x463f6d in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:720
    #5 0x48c8c2 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
    #6 0x7fc523c0a2b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #7 0x41db28 in _start (/usr/local/google/home/mmoroz/projects/dataflow/zlib/asan/zlib_uncompress_fuzzer+0x41db28)
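One plausible reading of this trace (an editorial guess, not confirmed in the thread): the failing command passes -collect_data_flow but no -data_flow_trace=, so Flags.data_flow_trace is presumably a null const char*, and converting it to a std::string argument makes libc++ call strlen(nullptr), which matches frames #0 to #4. Constructing std::string from a null pointer is undefined behavior in any case. This would also be consistent with fork mode not crashing, since the branch above is skipped when -fork is set. The sketch shows a defensive conversion; FlagToString is a hypothetical helper, not a libFuzzer function.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Convert an optional C-string flag to std::string without ever passing a
// null pointer to std::string's constructor. Reports and fails if unset.
bool FlagToString(const char *Flag, const char *FlagName, std::string *Out) {
  assert(Out && "output string required");
  if (!Flag) {
    std::fprintf(stderr, "ERROR: -%s=<dir> is required here\n", FlagName);
    return false;
  }
  *Out = Flag;
  return true;
}
```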

With fork mode enabled it seems to be running, so I'll report back on the progress.

May 29 '19 15:05 Dor1s

zlib_uncompress_fuzzer, -fork=1, 1 hour:

$ asan/zlib_uncompress_fuzzer -use_value_profile=1 -collect_data_flow=dfsan/zlib_uncompress_fuzzer -print_final_stats=1 -max_total_time=3600 -fork=1 -timeout=25 corpus/df_new corpus/new/ corpus/cf/
INFO: Seed: 1493609206
INFO: Loaded 1 modules   (664 inline 8-bit counters): 664 [0x7e5af0, 0x7e5d88),                         
INFO: Loaded 1 PC tables (664 PCs): 664 [0x5a0878,0x5a31f8),                                            
INFO: -fork=1: fuzzing in separate process(s)
INFO: -fork=1: 1428 seed inputs, starting to fuzz in /tmp/libFuzzerTemp.59644.dir                       
INFO: fuzzed for 3658 seconds, wrapping up soon
INFO: exiting: 0 time: 3658s

The output directory is empty, and I don't have any logs to look at. I've been checking temp logs occasionally to make sure things were running.

I know that my setup works though, because when I tried running over a corpus subset, I got new units written. The output in that case looks similar to the output of a regular -fork mode run, so I think we shouldn't miss any stats.

May 29 '19 17:05 Dor1s

@kcc, another question for you: how do I see how much time is spent on collecting DFT? I'm just worried that if I enable it in current CF configuration, we'll be fuzzing up to ~44 minutes each run, and I don't want to spend too much time on collecting the traces.
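The thread does not say whether libFuzzer reports this time itself. As a hedged, external way to answer the question, each phase can be timed separately with a monotonic clock; TimePhaseSeconds is a hypothetical helper, and the DFT collection step it would wrap is left to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Run Phase() once and return its wall-clock duration in seconds.
template <class Fn>
double TimePhaseSeconds(Fn &&Phase) {
  auto Start = std::chrono::steady_clock::now();
  std::forward<Fn>(Phase)();
  std::chrono::duration<double> Elapsed =
      std::chrono::steady_clock::now() - Start;
  assert(Elapsed.count() >= 0.0);  // steady_clock never goes backwards
  return Elapsed.count();
}
```

For example, wrapping a hypothetical collection step as TimePhaseSeconds([&] { /* collect DFT for the corpus */ }) and comparing the result with the total run time would show what fraction of a ~44 minute run goes to trace collection.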

May 29 '19 17:05 Dor1s

Proposal: DFT-based fuzzing
