ClusterFuzz adds testcases to the corpus that weren't produced by the fuzzer

Open jonathanmetzman opened this issue 4 years ago • 1 comment

@afd reports the following to me:

I have a fuzzer that uses a custom mutator, and it relies on the fact that no other mutators run, i.e., it starts with a corpus of valid inputs and then applies mutations that preserve their validity. If some other mutator ran and made a random change, that would spoil this property.
...
I'm getting results from ClusterFuzz that suggest that other mutations may be occurring, though.
...
/mnt/scratch0/clusterfuzz/bot/builds/chromium-browser-libfuzzer_linux-release-asan_ae530a86793cd6b8b56ce9af9159ac101396e802/revisions/libfuzzer-linux-release-903197/tint_spirv_tools_hlsl_writer_fuzzer -cross_over=0 -max_len=1000000 -mutate_depth=1 -tint_enable_all_fuzzer_passes=true -tint_enable_all_reduce_passes=true -tint_mutator_cache_size=30 -tint_mutator_type=fuzz,opt,reduce -tint_opt_batch_size=5 -tint_reduction_batch_size=5 -tint_repeated_pass_strategy=looped -tint_transformation_batch_size=5 -tint_fuzzing_target=hlsl -timeout=25 -rss_limit_mb=2560 -use_value_profile=1 -artifact_prefix=/mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/ -max_total_time=2080 -print_final_stats=1 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-480/new /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-480/mutations /mnt/scratch0/clusterfuzz/bot/inputs/data-bundles/tint_spirv_tools_hlsl_writer_fuzzer

As @afd correctly guessed, /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-480/mutations contains testcases not produced by the fuzzer. I think there are a few ways ClusterFuzz can end up doing this:

  1. The fuzzer is built with an engine (e.g. vanilla AFL in Chromium) that won't use your custom mutator. This should be fixed when we switch to AFL++, but it isn't really the fault of ClusterFuzz; it's more the fault of the build system.
  2. Corpus pollination.
  3. MLRNN mutator.
  4. Radamsa.

I see 2 solutions to this problem.

  1. Add some kind of option so that a fuzzer can tell CF not to do this.
  2. Change the fuzzer so that it automatically rejects invalid testcases. libprotobuf-mutator (LPM) essentially does this: when, say, MLRNN creates an invalid proto testcase, LPM will simply reject it because it can't be deserialized properly. One way to do this for other formats is with a checksum.

Solution 1 seems a bit error-prone and ugly. Solution 2 is ugly for users: maintainers viewing the testcases will no longer be looking at straight PNG files, but at some custom format that exists just to avoid ClusterFuzz's shenanigans.
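To make the checksum idea in solution 2 concrete, here is a minimal sketch. It is an assumption about how such framing could look, not something ClusterFuzz or the fuzzer in question actually does. The names `Fnv1a`, `WrapWithChecksum`, and `UnwrapIfValid` are hypothetical, and FNV-1a stands in for whatever checksum a real fuzzer would pick. The custom mutator would call `WrapWithChecksum` on every testcase it emits, and the fuzz target would call `UnwrapIfValid` before using the input.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 32-bit FNV-1a hash, used here only as a simple stand-in checksum.
uint32_t Fnv1a(const uint8_t* data, size_t size) {
  uint32_t hash = 2166136261u;
  for (size_t i = 0; i < size; ++i) {
    hash ^= data[i];
    hash *= 16777619u;
  }
  return hash;
}

// Hypothetical helper for the custom mutator: prefix the payload with a
// 4-byte little-endian checksum of the payload.
std::vector<uint8_t> WrapWithChecksum(const std::vector<uint8_t>& payload) {
  const uint32_t sum = Fnv1a(payload.data(), payload.size());
  std::vector<uint8_t> framed;
  framed.reserve(4 + payload.size());
  for (int shift = 0; shift < 32; shift += 8) {
    framed.push_back(static_cast<uint8_t>(sum >> shift));
  }
  framed.insert(framed.end(), payload.begin(), payload.end());
  return framed;
}

// Hypothetical helper for the fuzz target: returns true and fills `payload`
// only if the stored checksum matches the bytes that follow it. Inputs
// changed by any mutator that does not recompute the checksum fail here.
bool UnwrapIfValid(const uint8_t* data, size_t size,
                   std::vector<uint8_t>* payload) {
  if (size < 4) return false;
  uint32_t stored = 0;
  for (int i = 0; i < 4; ++i) {
    stored |= static_cast<uint32_t>(data[i]) << (8 * i);
  }
  if (Fnv1a(data + 4, size - 4) != stored) return false;
  payload->assign(data + 4, data + size);
  return true;
}
```

The cost is the one mentioned above: the corpus files now carry a 4-byte prefix, so they are no longer directly viewable in their native format.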

@oliverchang WDYT?

jonathanmetzman, Jul 23 '21 14:07

Yeah, solution 1 is too brittle, as you mention. The idea that we can add anything to the corpus if it contributes new coverage is baked in in several places.

Asking fuzz targets to validate inputs seems reasonable to me:

  • This requirement seems quite rare.
  • It's questionable why we would only want to feed inputs that are "valid" according to the mutator. If invalid inputs cause a lot of new coverage to be explored, we probably want that.
  • Doing this in the target seems like good practice anyway. Otherwise, if the format changes at some point, it can never recover.
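As a rough illustration of validating in the target, a fuzz target can check its input and return early before running the code under test. This is only a sketch: `kMagic`, `IsValidInput`, and `ProcessInput` are hypothetical placeholders for whatever validity check and code under test a real target has. `LLVMFuzzerTestOneInput` is the standard libFuzzer entry point.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 4-byte marker that every valid input must start with.
static const uint8_t kMagic[4] = {'D', 'E', 'M', 'O'};

// Hypothetical validity check; a real target would apply its own format
// rules here (parse the input, verify a checksum, etc.).
bool IsValidInput(const uint8_t* data, size_t size) {
  return size >= sizeof(kMagic) &&
         std::memcmp(data, kMagic, sizeof(kMagic)) == 0;
}

// Hypothetical stand-in for the code under test; it only counts calls so
// the gating behaviour is observable.
static int g_processed = 0;
void ProcessInput(const uint8_t* /*data*/, size_t /*size*/) { ++g_processed; }

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Inputs that did not come from the custom mutator (or that were broken
  // by another mutator) are dropped before reaching the code under test.
  if (!IsValidInput(data, size)) return 0;
  ProcessInput(data + sizeof(kMagic), size - sizeof(kMagic));
  return 0;
}
```

Rejected inputs exercise almost no code, so they are unlikely to look interesting to the coverage-guided parts of the pipeline. Recent libFuzzer documentation also describes returning -1 from the target to keep an input out of the corpus; whether ClusterFuzz's corpus handling honours that is not established in this thread.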

oliverchang, Jul 26 '21 05:07