fuzzbench Sampling corpora

Hi!

The purpose of this PR is to allow users to easily "sample" initial corpora per trial from a larger pool of seeds. This should help to mitigate some possible issues with overfitting / overtraining to the static seed corpus of a particular benchmark program. Additionally, it allows us to begin to quantify the impact of various changes in the composition of the starting corpus on a fuzzer / campaign (https://github.com/google/fuzzbench/issues/1489).

All of these changes are opt-in and gated behind a dict in the experiment configuration YAML file. Happy to discuss whether the configuration options should be restructured or reworded. I've attached an example config file for a local experiment:

example-config.yaml.txt

In terms of the changes themselves, sampling is done deterministically in each runner container/process, based on a random seed. For each runner, the unsampled files are removed from $SEED_CORPUS_DIR before the fuzzer starts. The corpora are sampled in a "matched pairs" style design, so the 1st trial of AFL on a given target has the same corpus as the 1st trial of libFuzzer on that target, and so on.
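To illustrate the matched-pairs idea, here is a minimal Python sketch. It is not the code from this PR. The function names `select_seed_sample` and `apply_seed_sample`, and the choice to seed the RNG from a SHA-256 hash of (base seed, benchmark, trial index), are assumptions for illustration. The key point is that the fuzzer name is left out of the RNG key, so every fuzzer's trial i on a given benchmark draws the same subset.

```python
import hashlib
import os
import random
from typing import List, Optional


def select_seed_sample(seed_files: List[str], benchmark: str, trial_index: int,
                       base_seed: int, sample_size: int) -> List[str]:
    """Return a deterministic subset of seed_files for one trial (hypothetical helper).

    The RNG is keyed on (base_seed, benchmark, trial_index) but NOT on the
    fuzzer name, so trial i of every fuzzer on the same benchmark receives
    the same subset ("matched pairs").
    """
    key = f"{base_seed}:{benchmark}:{trial_index}".encode()
    # hashlib gives a digest that is stable across processes; Python's
    # built-in hash() of a str is randomized per process, so it is avoided.
    rng = random.Random(int.from_bytes(hashlib.sha256(key).digest()[:8], "big"))
    # Sort first so the result does not depend on directory listing order.
    candidates = sorted(seed_files)
    k = min(sample_size, len(candidates))
    return sorted(rng.sample(candidates, k))


def apply_seed_sample(seed_corpus_dir: str, benchmark: str, trial_index: int,
                      base_seed: int, sample_size: Optional[int]) -> List[str]:
    """Delete unsampled files from seed_corpus_dir and return the kept names.

    Hypothetical helper. sample_size=None means sampling is disabled
    (opt-in) and the corpus is left untouched. Only top-level regular files
    are considered; subdirectories are ignored in this sketch.
    """
    names = [n for n in os.listdir(seed_corpus_dir)
             if os.path.isfile(os.path.join(seed_corpus_dir, n))]
    if sample_size is None:
        return sorted(names)
    keep = set(select_seed_sample(names, benchmark, trial_index,
                                  base_seed, sample_size))
    for name in names:
        if name not in keep:
            os.remove(os.path.join(seed_corpus_dir, name))
    return sorted(keep)
```

In this sketch, the runner for trial 1 of AFL and the runner for trial 1 of libFuzzer on the same benchmark would call `apply_seed_sample` with the same benchmark, trial index, and base seed, and would therefore keep identical seeds. Changing the base seed gives a fresh, but still reproducible, set of per-trial corpora.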

Mar 17 '23 11:03 dylanjwolff

Hi @Alan32Liu, this is the PR related to the https://arxiv.org/abs/2212.09519 we talked about. Can you have a look?

May 02 '23 13:05 mboehme


Sure! I will read the code today. Would you prefer to merge this PR to master, or only run experiments in this PR?

May 02 '23 21:05 DonggeLiu

Thanks for the feedback, @Alan32Liu and @jonathanmetzman! I'll start updating the PR tomorrow.

Would you prefer to merge this PR to master, or only run experiments in this PR?

I'm hoping to land this in master. I think it will be a useful feature for the platform going forward.

May 03 '23 15:05 dylanjwolff

@dylanjwolff Would you like me to launch a test experiment if this PR is ready? If so, could you please:

- List the fuzzers, benchmarks, and the new flags for the experiment.
- Add the new configuration file to service/ under a different name (e.g., experiment-config-sampling.yaml).
- Make a trivial modification to service/gcbrun_experiment.py. This will allow me to launch experiments in this PR before merging. Here is an example.

Thanks!

May 05 '23 07:05 DonggeLiu

Thanks @Alan32Liu!

I've added the YAML file and modified gcbrun. I wasn't sure what the default options should be for various parameters, so I based it on one of the other YAML files in service/.

I don't have a preference for which fuzzers to use; any set of two or more should be fine. No other configuration flags should be needed.

I've run a number of local experiments and everything works fine so I don't expect any issues.

May 05 '23 11:05 dylanjwolff

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/seed-sampling-example-config.yaml --experiment-name 2023-05-05-sample --fuzzers libfuzzer aflplusplus afl centipede honggfuzz --benchmarks bloaty_fuzz_target curl_curl_fuzzer_http freetype2_ftfuzzer harfbuzz_hb-shape-fuzzer jsoncpp_jsoncpp_fuzzer lcms_cms_transform_fuzzer libjpeg-turbo_libjpeg_turbo_fuzzer libpcap_fuzz_both libpng_libpng_read_fuzzer libxml2_xml libxslt_xpath openh264_decoder_fuzzer openssl_x509 openthread_ot-ip6-send-fuzzer proj4_proj_crs_to_crs_fuzzer re2_fuzzer sqlite3_ossfuzz stb_stbi_read_fuzzer systemd_fuzz-link-parser vorbis_decode_fuzzer woff2_convert_woff2ttf_fuzzer

May 05 '23 11:05 DonggeLiu

Thanks for the prompt response, @dylanjwolff! The experiment has been launched. I will check back later to make sure everything goes well, and will add links to the experiment data and report once they have been created.

May 05 '23 11:05 DonggeLiu

Experiment data. Experiment report.

May 05 '23 12:05 DonggeLiu

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/seed-sampling-example-config.yaml --experiment-name 2023-05-06-sample --fuzzers libfuzzer aflplusplus afl centipede honggfuzz --benchmarks bloaty_fuzz_target curl_curl_fuzzer_http freetype2_ftfuzzer harfbuzz_hb-shape-fuzzer jsoncpp_jsoncpp_fuzzer lcms_cms_transform_fuzzer libjpeg-turbo_libjpeg_turbo_fuzzer libpcap_fuzz_both libpng_libpng_read_fuzzer libxml2_xml libxslt_xpath openh264_decoder_fuzzer openssl_x509 openthread_ot-ip6-send-fuzzer proj4_proj_crs_to_crs_fuzzer re2_fuzzer sqlite3_ossfuzz stb_stbi_read_fuzzer systemd_fuzz-link-parser vorbis_decode_fuzzer woff2_convert_woff2ttf_fuzzer

May 06 '23 05:05 DonggeLiu

05-06: Experiment data. Experiment report.

May 06 '23 05:05 DonggeLiu

The 45-minute test run looks fine. Shall we merge this (to the master branch or an individual branch)? @dylanjwolff @jonathanmetzman

May 08 '23 00:05 DonggeLiu

Shall we merge this (to the master branch or an individual branch)?

I vote master!

I think it's an important feature, and since it's opt-in it shouldn't disrupt anyone's current workflow. At some point there may even be a discussion to be had about whether it should be the default, but I think making it at least an option for everyone is a good idea.

May 08 '23 02:05 dylanjwolff

@jonathanmetzman Any thoughts on getting this merged into master?

May 12 '23 06:05 dylanjwolff
