Sampling corpora

Open dylanjwolff opened this issue 2 years ago • 13 comments

Hi!

The purpose of this PR is to allow users to easily "sample" initial corpora per trial from a larger pool of seeds. This should help to mitigate some possible issues with overfitting / overtraining to the static seed corpus of a particular benchmark program. Additionally, it allows us to begin to quantify the impact of various changes in the composition of the starting corpus on a fuzzer / campaign (https://github.com/google/fuzzbench/issues/1489).

All of these changes are opt-in and gated behind a dict in the experiment configuration YAML file. Happy to discuss if the configuration options should be restructured or reworded. I've attached an example config file for a local experiment:

example-config.yaml.txt

As for the changes themselves, the sampling is done deterministically in each runner container / process, based on a random seed. In each runner, the files that were not sampled are removed from $SEED_CORPUS_DIR before the fuzzer starts. The corpora are sampled in a "matched pairs" design, so the 1st trial of AFL on a given target gets the same corpus as the 1st trial of libFuzzer on that target, and so on.
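
For illustration, the scheme described above could be sketched roughly as follows. This is not the code from this PR: the function name `sample_seed_corpus`, its parameters, and the way the RNG seed is derived from the trial index are all assumptions made for the example. The only point it tries to show is that the selection depends on the trial index and not on the fuzzer, which is what gives the matched pairs.

```python
import os
import random
import tempfile


def sample_seed_corpus(corpus_dir, trial_index, sample_size, base_seed=0):
    """Keep a deterministic random subset of corpus_dir and delete the rest.

    Hypothetical helper: names and seed derivation are illustrative only.
    """
    # Sort so the result does not depend on directory listing order.
    seeds = sorted(
        name for name in os.listdir(corpus_dir)
        if os.path.isfile(os.path.join(corpus_dir, name)))
    # The RNG is seeded from the base seed and the trial index only, never
    # from the fuzzer name, so trial N of every fuzzer picks the same files.
    rng = random.Random(base_seed * 1_000_003 + trial_index)
    kept = set(rng.sample(seeds, min(sample_size, len(seeds))))
    # Remove everything that was not sampled before the fuzzer would start.
    for name in seeds:
        if name not in kept:
            os.remove(os.path.join(corpus_dir, name))
    return sorted(kept)


def demo():
    """Sample the same 20-file pool for trial 1 of two different fuzzers."""
    results = {}
    for fuzzer in ("afl", "libfuzzer"):
        with tempfile.TemporaryDirectory() as corpus_dir:
            for i in range(20):
                path = os.path.join(corpus_dir, f"seed-{i:02d}")
                with open(path, "w") as seed_file:
                    seed_file.write(str(i))
            results[fuzzer] = sample_seed_corpus(
                corpus_dir, trial_index=1, sample_size=5)
    return results


if __name__ == "__main__":
    sampled = demo()
    # Both fuzzers end up with the same five seed files for trial 1.
    print(sampled["afl"] == sampled["libfuzzer"])
```

In this sketch, changing `trial_index` produces a different subset, while changing the fuzzer does not, so per-trial comparisons between fuzzers start from identical corpora.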

dylanjwolff, Mar 17 '23 11:03

Hi @Alan32Liu, this is the PR related to the paper (https://arxiv.org/abs/2212.09519) we talked about. Can you have a look?

mboehme, May 02 '23 13:05

> Hi @Alan32Liu, this is the PR related to the paper (https://arxiv.org/abs/2212.09519) we talked about. Can you have a look?

Sure! I will read the code today. Would you prefer to merge this PR to master, or only run experiments in this PR?

DonggeLiu, May 02 '23 21:05

Thanks for the feedback @Alan32Liu and @jonathanmetzman! I'll start updating the PR tomorrow

> Would you prefer to merge this PR to master, or only run experiments in this PR?

I'm hoping to land this in master -- I think it will be a useful feature for the platform going forwards

dylanjwolff, May 03 '23 15:05

@dylanjwolff Would you like me to launch a test experiment if this PR is ready? If so, could you please:

  1. List the fuzzers, benchmarks, and the new flags for the experiment.
  2. Add the new configuration file to service/ under a different name (e.g., experiment-config-sampling.yaml).
  3. Make a trivial modification to service/gcbrun_experiment.py. This will allow me to launch experiments in this PR before merging. Here is an example.

Thanks!

DonggeLiu, May 05 '23 07:05

Thanks @Alan32Liu!

I've added the YAML file and modified gcbrun. I wasn't sure what the default options should be for the various parameters, so I took them from one of the other YAML files in service/.

I don't have a preference for which fuzzers to use; any set of two or more should be fine. No other configuration flags should be needed.

I've run a number of local experiments and everything works fine, so I don't expect any issues.

dylanjwolff, May 05 '23 11:05

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/seed-sampling-example-config.yaml --experiment-name 2023-05-05-sample --fuzzers libfuzzer aflplusplus afl centipede honggfuzz --benchmarks bloaty_fuzz_target curl_curl_fuzzer_http freetype2_ftfuzzer harfbuzz_hb-shape-fuzzer jsoncpp_jsoncpp_fuzzer lcms_cms_transform_fuzzer libjpeg-turbo_libjpeg_turbo_fuzzer libpcap_fuzz_both libpng_libpng_read_fuzzer libxml2_xml libxslt_xpath openh264_decoder_fuzzer openssl_x509 openthread_ot-ip6-send-fuzzer proj4_proj_crs_to_crs_fuzzer re2_fuzzer sqlite3_ossfuzz stb_stbi_read_fuzzer systemd_fuzz-link-parser vorbis_decode_fuzzer woff2_convert_woff2ttf_fuzzer

DonggeLiu, May 05 '23 11:05

Thanks for the prompt response @dylanjwolff! The experiment has been launched. I will come back later to check that everything is going well and add links to the experiment data and report once they have been created.

DonggeLiu, May 05 '23 11:05

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/seed-sampling-example-config.yaml --experiment-name 2023-05-06-sample --fuzzers libfuzzer aflplusplus afl centipede honggfuzz --benchmarks bloaty_fuzz_target curl_curl_fuzzer_http freetype2_ftfuzzer harfbuzz_hb-shape-fuzzer jsoncpp_jsoncpp_fuzzer lcms_cms_transform_fuzzer libjpeg-turbo_libjpeg_turbo_fuzzer libpcap_fuzz_both libpng_libpng_read_fuzzer libxml2_xml libxslt_xpath openh264_decoder_fuzzer openssl_x509 openthread_ot-ip6-send-fuzzer proj4_proj_crs_to_crs_fuzzer re2_fuzzer sqlite3_ossfuzz stb_stbi_read_fuzzer systemd_fuzz-link-parser vorbis_decode_fuzzer woff2_convert_woff2ttf_fuzzer

DonggeLiu, May 06 '23 05:05

The 45-minute test run looks fine. Shall we merge this (to the master branch or an individual branch)? @dylanjwolff @jonathanmetzman

DonggeLiu, May 08 '23 00:05

> Shall we merge this (to the master branch or an individual branch)?

I vote master!

I think it's an important feature and it's opt-in, so it shouldn't disrupt anyone's current workflow. At some point there may even be a discussion to be had about whether it should be the default, but I think at least making it an option for everyone is a good idea.

dylanjwolff, May 08 '23 02:05

@jonathanmetzman Any thoughts on getting this merged into master?

dylanjwolff, May 12 '23 06:05