fuzzbench
Sampling corpora
Hi!
The purpose of this PR is to allow users to easily "sample" initial corpora per trial from a larger pool of seeds. This should help to mitigate some possible issues with overfitting / overtraining to the static seed corpus of a particular benchmark program. Additionally, it allows us to begin to quantify the impact of various changes in the composition of the starting corpus on a fuzzer / campaign (https://github.com/google/fuzzbench/issues/1489).
All of these changes are opt-in and gated behind a dict in the experiment configuration YAML file. Happy to discuss if the configuration options should be restructured or reworded. I've attached an example config file for a local experiment:
In terms of the changes themselves, the sampling is done in each runner container / process deterministically based on a random seed. For each runner, the unsampled files are removed from the $SEED_CORPUS_DIR before starting the fuzzer. The corpora are sampled in a "matched pairs" style design, so the 1st trial of AFL on a given target has the same corpus as the 1st trial of libfuzzer on that target and so on.
Hi @Alan32Liu, this is the PR related to the https://arxiv.org/abs/2212.09519 we talked about. Can you have a look?
Sure! I will read the code today. Would you prefer to merge this PR to master, or only run experiments in this PR?
Thanks for the feedback @Alan32Liu and @jonathanmetzman! I'll start updating the PR tomorrow
> Would you prefer to merge this PR to master, or only run experiments in this PR?
I'm hoping to land this in master -- I think it will be a useful feature for the platform going forwards
@dylanjwolff Would you like me to launch a test experiment if this PR is ready? If so, could you please:
- List the fuzzers, benchmarks, and the new flags for the experiment.
- Add the new configuration file to service/ under a different name (e.g., experiment-config-sampling.yaml).
- Make a trivial modification to service/gcbrun_experiment.py. This will allow me to launch experiments in this PR before merging. Here is an example.
Thanks!
Thanks @Alan32Liu!
I've added the YAML file and modified gcbrun. I wasn't sure what the default options should be for various parameters, so I just used one of the other YAML files in /service.
I don't have a preference for which fuzzers, any set of two or more should be fine. No other configuration flags etc. should be needed.
I've run a number of local experiments and everything works fine so I don't expect any issues.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/seed-sampling-example-config.yaml --experiment-name 2023-05-05-sample --fuzzers libfuzzer aflplusplus afl centipede honggfuzz --benchmarks bloaty_fuzz_target curl_curl_fuzzer_http freetype2_ftfuzzer harfbuzz_hb-shape-fuzzer jsoncpp_jsoncpp_fuzzer lcms_cms_transform_fuzzer libjpeg-turbo_libjpeg_turbo_fuzzer libpcap_fuzz_both libpng_libpng_read_fuzzer libxml2_xml libxslt_xpath openh264_decoder_fuzzer openssl_x509 openthread_ot-ip6-send-fuzzer proj4_proj_crs_to_crs_fuzzer re2_fuzzer sqlite3_ossfuzz stb_stbi_read_fuzzer systemd_fuzz-link-parser vorbis_decode_fuzzer woff2_convert_woff2ttf_fuzzer
Thanks for the prompt response, @dylanjwolff! The experiment has been launched. I will come back later to check that everything is going well and to add links to the experiment data and report once they have been created.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/seed-sampling-example-config.yaml --experiment-name 2023-05-06-sample --fuzzers libfuzzer aflplusplus afl centipede honggfuzz --benchmarks bloaty_fuzz_target curl_curl_fuzzer_http freetype2_ftfuzzer harfbuzz_hb-shape-fuzzer jsoncpp_jsoncpp_fuzzer lcms_cms_transform_fuzzer libjpeg-turbo_libjpeg_turbo_fuzzer libpcap_fuzz_both libpng_libpng_read_fuzzer libxml2_xml libxslt_xpath openh264_decoder_fuzzer openssl_x509 openthread_ot-ip6-send-fuzzer proj4_proj_crs_to_crs_fuzzer re2_fuzzer sqlite3_ossfuzz stb_stbi_read_fuzzer systemd_fuzz-link-parser vorbis_decode_fuzzer woff2_convert_woff2ttf_fuzzer
05-06: Experiment data. Experiment report.
The 45-minute test run looks fine. Shall we merge this (to the master branch or an individual branch)? @dylanjwolff @jonathanmetzman
> Shall we merge this (to the master branch or an individual branch)?
I vote master!
I think it's an important feature, and because it's opt-in it shouldn't disrupt anyone's current workflow. At some point there may even be a discussion to be had about whether it should be the default, but at the very least, making it an option for everyone is a good idea.
@jonathanmetzman Any thoughts on getting this merged into master?