
SBFT24 Competition

Open phi-go opened this issue 1 year ago • 42 comments

This PR combines all fuzzers submitted to SBFT24 and the mutation measurer to allow experiments for the competition.

phi-go avatar Jan 08 '24 08:01 phi-go

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar Jan 08 '24 08:01 google-cla[bot]

Hey, @Alan32Liu, we are ready to start experiments. It's probably best to start at least the first run with a single benchmark.

The following benchmarks should work; I would start with a smaller one like jsoncpp: zlib_zlib_uncompress_fuzzer, stb_stbi_read_fuzzer, vorbis_decode_fuzzer, jsoncpp_jsoncpp_fuzzer, libpcap_fuzz_both, lcms_cms_transform_fuzzer, libxml2_xml, freetype2_ftfuzzer

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name sbft24-jsoncpp --fuzzers aflplusplus libfuzzer mystique pastis bandfuzz tunefuzz fox --benchmarks jsoncpp_jsoncpp_fuzzer --mutation-analysis

Also, I can't seem to add a tag to this PR, or I would add the SBFT one.

phi-go avatar Jan 13 '24 19:01 phi-go

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name sbft24-jsoncpp-01-14 --fuzzers aflplusplus libfuzzer mystique pastis bandfuzz tunefuzz fox --benchmarks jsoncpp_jsoncpp_fuzzer --mutation-analysis

DonggeLiu avatar Jan 14 '24 04:01 DonggeLiu

Experiment sbft24-jsoncpp-01-14 data and results will be available later at: The experiment data. The experiment report.

DonggeLiu avatar Jan 14 '24 04:01 DonggeLiu

@Alan32Liu do you know if there is a known problem with aflplusplus? We hit the following exception in two separate experiments:

ERROR:root:Error occurred when generating coverage report. Extras: 
    traceback: Traceback (most recent call last):
  File "/work/src/experiment/measurer/coverage_utils.py", line 74, in generate_coverage_report
    coverage_reporter.generate_coverage_summary_json()
  File "/work/src/experiment/measurer/coverage_utils.py", line 141, in generate_coverage_summary_json
    result = generate_json_summary(coverage_binary,
  File "/work/src/experiment/measurer/coverage_utils.py", line 280, in generate_json_summary
    with open(output_file, 'w', encoding='utf-8') as dst_file:
FileNotFoundError: [Errno 2] No such file or directory: '/work/measurement-folders/jsoncpp_jsoncpp_fuzzer-aflplusplus/merged.json'
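
For what it's worth, this FileNotFoundError is what Python's open(..., 'w') raises when the parent directory of the output file does not exist, which suggests the measurement folder for jsoncpp_jsoncpp_fuzzer-aflplusplus was never created. A minimal illustration (not the FuzzBench code; the path is made up):

import os

# open(..., 'w') creates the file, but not any missing parent directories:
output_file = '/tmp/missing-measurement-folder/merged.json'
try:
    with open(output_file, 'w', encoding='utf-8') as dst_file:
        dst_file.write('{}')
except FileNotFoundError as err:
    print(err)  # [Errno 2] No such file or directory: '...'

# Creating the folder first (as the measurer normally would) avoids the error:
os.makedirs(os.path.dirname(output_file), exist_ok=True)
with open(output_file, 'w', encoding='utf-8') as dst_file:
    dst_file.write('{}')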

phi-go avatar Jan 14 '24 07:01 phi-go

@Alan32Liu do you know if there is a known problem with aflplusplus? We hit the following exception in two separate experiments:

Not that I know of. Would you like to test out a different version?

I believe this is the last time it was updated: https://github.com/google/fuzzbench/pull/1936

DonggeLiu avatar Jan 14 '24 08:01 DonggeLiu

Oh, actually there is a build failure for afl++ (see build-log), even for the run you started yesterday. It seems apt update is not part of the Dockerfile, so an older version probably won't help.

phi-go avatar Jan 15 '24 08:01 phi-go

@Alan32Liu Would it be possible to also start an experiment for all the default benchmarks without mutation testing? It would be nice to have the default experiments at least, as we support only a limited set of benchmarks and sadly won't have time to really expand that before the deadline. Though, I need to fix something before we can do that using this branch.

phi-go avatar Jan 15 '24 08:01 phi-go

@Alan32Liu Would it be possible to also start an experiment for all the default benchmarks without mutation testing? It would be nice to have the default experiments at least, as we support only a limited set of benchmarks and sadly won't have time to really expand that before the deadline. Though, I need to fix something before we can do that using this branch.

Sure, we have a pretty recent one at https://github.com/google/fuzzbench/pull/1929#issuecomment-1849164925. Will run another one in this PR and at https://github.com/google/fuzzbench/pull/1945 as a comparison.

DonggeLiu avatar Jan 15 '24 09:01 DonggeLiu

Could you please set this to false? https://github.com/google/fuzzbench/blob/64da23a158642b039751291e3422938dc333e6c6/service/experiment-config.yaml#L18

This merges new exp results with old ones. It's better to disable it for our case because we want the latest pure results without mutation. It's also necessary to disable it for later experiments with mutation.

DonggeLiu avatar Jan 15 '24 09:01 DonggeLiu

@Alan32Liu this version should now work without the --mutation-analysis flag. Could you do a standard coverage run with the competitors' fuzzers?

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name sbft-standard-cov-01-15 --fuzzers libafl libfuzzer mystique pastis bandfuzz tunefuzz fox

Let's use libafl as a baseline instead of aflplusplus, libafl seems to work fine.

merge_with_nonprivate: true

This is now also set to false.

phi-go avatar Jan 15 '24 12:01 phi-go

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name sbft-standard-cov-01-16 --fuzzers libafl libfuzzer mystique pastis bandfuzz tunefuzz fox

DonggeLiu avatar Jan 15 '24 22:01 DonggeLiu

Experiment sbft-standard-cov-01-16 data and results will be available later at: The experiment data. The experiment report.

DonggeLiu avatar Jan 15 '24 23:01 DonggeLiu

Hi, I found that the tunefuzz run on jsoncpp_jsoncpp_fuzzer failed in sbft-standard-cov, but it succeeded in the last mutation-analysis run you started, which should be the same program and the same setup. Given that I didn't change any settings for the fuzzer, I presume there might be something wrong in the last sbft-standard-cov trial.

Besides, if I understand correctly, sbft-standard-cov should contain 23 public benchmarks, while only 9 are available.

Could you check it out a bit? Thanks!

kdsjZh avatar Jan 17 '24 09:01 kdsjZh

@kdsjZh I think the experiment is still running; from what I understand not all trials are started at once, there is a limit on parallel runs, so it might take a bit until all results show up. Tunefuzz seemed to build jsoncpp without a problem (build logs), so I wouldn't worry yet. @Alan32Liu, I hope you can confirm :slightly_smiling_face:

It also says in the report linked that the experiment is still in progress.

phi-go avatar Jan 17 '24 09:01 phi-go

Hi @kdsjZh, thanks for reporting this.

@phi-go

I think the experiment is still running; from what I understand not all trials are started at once, there is a limit on parallel runs, so it might take a bit until all results show up.

Yep, by looking at the gcloud logs, I can confirm that the experiment is running.

Tunefuzz seemed to build jsoncpp without a problem (build logs), so I wouldn't worry yet.

That's also true.

However, something did draw my attention:

  1. TuneFuzz errors while doing trials (screenshot):

Here is an example error log entry:

{
  "insertId": "1m02pjzfgwjx29",
  "jsonPayload": {
    "traceback": "Traceback (most recent call last):\n  File \"/src/experiment/runner.py\", line 483, in experiment_main\n    runner.conduct_trial()\n  File \"/src/experiment/runner.py\", line 280, in conduct_trial\n    self.set_up_corpus_directories()\n  File \"/src/experiment/runner.py\", line 265, in set_up_corpus_directories\n    _unpack_clusterfuzz_seed_corpus(target_binary, input_corpus)\n  File \"/src/experiment/runner.py\", line 136, in _unpack_clusterfuzz_seed_corpus\n    seed_corpus_archive_path = get_clusterfuzz_seed_corpus_path(\n  File \"/src/experiment/runner.py\", line 102, in get_clusterfuzz_seed_corpus_path\n    fuzz_target_without_extension = os.path.splitext(fuzz_target_path)[0]\n  File \"/usr/local/lib/python3.10/posixpath.py\", line 118, in splitext\n    p = os.fspath(p)\nTypeError: expected str, bytes or os.PathLike object, not NoneType\n",
    "component": "runner",
    "trial_id": "2721051",
    "instance_name": "r-sbft-standard-cov-01-16-2721051",
    "fuzzer": "tunefuzz",
    "benchmark": "jsoncpp_jsoncpp_fuzzer",
    "message": "Error doing trial.",
    "experiment": "sbft-standard-cov-01-16"
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "zone": "projects/1097086166031/zones/us-central1-c",
      "instance_id": "3117243333354390038",
      "project_id": "fuzzbench"
    }
  },
  "timestamp": "2024-01-16T17:33:02.824333536Z",
  "severity": "ERROR",
  "logName": "projects/fuzzbench/logs/fuzzbench",
  "receiveTimestamp": "2024-01-16T17:33:02.824333536Z"
}
  2. Many build failures of other fuzzers. Normally, I would expect libFuzzer to build on all benchmarks, but it does not (screenshot):

Example:

{
  "insertId": "1m9tb2iffruamu",
  "jsonPayload": {
    "message": "Failed to build benchmark: woff2_convert_woff2ttf_fuzzer, fuzzer: libfuzzer.",
    "experiment": "sbft-standard-cov-01-16",
    "traceback": "Traceback (most recent call last):\n  File \"/work/src/experiment/build/builder.py\", line 191, in build_fuzzer_benchmark\n    buildlib.build_fuzzer_benchmark(fuzzer, benchmark)\n  File \"/work/src/experiment/build/gcb_build.py\", line 140, in build_fuzzer_benchmark\n    _build(config, config_name)\n  File \"/work/src/experiment/build/gcb_build.py\", line 124, in _build\n    raise subprocess.CalledProcessError(result.retcode, command)\nsubprocess.CalledProcessError: Command '['gcloud', 'builds', 'submit', '/work/src', '--config=/tmp/tmpjbm4vf7x', '--timeout=14400s', '--worker-pool=projects/fuzzbench/locations/us-central1/workerPools/buildpool-e2-std-32']' returned non-zero exit status 1.\n",
    "instance_name": "d-sbft-standard-cov-01-16",
    "component": "dispatcher"
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "project_id": "fuzzbench",
      "zone": "projects/1097086166031/zones/us-central1-c",
      "instance_id": "7879014797801950091"
    }
  },
  "timestamp": "2024-01-16T13:01:38.517591212Z",
  "severity": "ERROR",
  "logName": "projects/fuzzbench/logs/fuzzbench",
  "receiveTimestamp": "2024-01-16T13:01:38.517591212Z"
}

Same with libAFL (screenshot).

DonggeLiu avatar Jan 17 '24 10:01 DonggeLiu

@phi-go: Feel free to let me know if you'd like to look into these logs together.

DonggeLiu avatar Jan 17 '24 10:01 DonggeLiu

Yeah, it would be good to look into this. I'm not quite sure what could be the root cause at the moment.

phi-go avatar Jan 17 '24 10:01 phi-go

Yeah, it would be good to look into this. I'm not quite sure what could be the root cause at the moment.

For TuneFuzz, would this help debugging?

Traceback (most recent call last):
  File "/src/experiment/runner.py", line 483, in experiment_main
    runner.conduct_trial()
  File "/src/experiment/runner.py", line 280, in conduct_trial
    self.set_up_corpus_directories()
  File "/src/experiment/runner.py", line 265, in set_up_corpus_directories
    _unpack_clusterfuzz_seed_corpus(target_binary, input_corpus)
  File "/src/experiment/runner.py", line 136, in _unpack_clusterfuzz_seed_corpus
    seed_corpus_archive_path = get_clusterfuzz_seed_corpus_path(
  File "/src/experiment/runner.py", line 102, in get_clusterfuzz_seed_corpus_path
    fuzz_target_without_extension = os.path.splitext(fuzz_target_path)[0]
  File "/usr/local/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

DonggeLiu avatar Jan 17 '24 11:01 DonggeLiu

The clusterfuzz seed corpus is a source for the initial corpus, correct? While I added timestamps to the corpus collected during the trial, I did nothing to the clusterfuzz part. I'll need to take a closer look at the code. Can we confirm that this only happens in this branch and not master?
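
For reference, the TypeError in the traceback above is exactly what os.path.splitext produces when it is handed None instead of a path, i.e. the fuzz target lookup returned nothing; the seed corpus code is just the first place that touches the value. A minimal sketch (illustrative only, not the actual runner code):

import os

def clusterfuzz_seed_corpus_path_sketch(fuzz_target_path):
    # Simplified stand-in for get_clusterfuzz_seed_corpus_path: strip the
    # extension and look for <target>_seed_corpus.zip next to the binary.
    fuzz_target_without_extension = os.path.splitext(fuzz_target_path)[0]
    return fuzz_target_without_extension + '_seed_corpus.zip'

print(clusterfuzz_seed_corpus_path_sketch('/out/jsoncpp_jsoncpp_fuzzer'))

try:
    # target_binary being None reproduces the failure from the trial logs.
    clusterfuzz_seed_corpus_path_sketch(None)
except TypeError as err:
    print(err)  # expected str, bytes or os.PathLike object, not NoneType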

phi-go avatar Jan 17 '24 11:01 phi-go

Can we confirm that this only happens in this branch and not master?

I think we can. Earlier I prepared these two experiments on master as a comparison: https://github.com/google/fuzzbench/pull/1945#issuecomment-1892842376

TuneFuzz had no such error in that experiment (screenshot).

BTW, libFuzzer had no build error either (screenshot).

There were some 'Coverage run failed.' messages, but they were fuzzer runtime errors like ERROR: libFuzzer: out-of-memory (used: 2065Mb; limit: 2048Mb), which is unrelated.

DonggeLiu avatar Jan 17 '24 11:01 DonggeLiu

So it seems that in this code:

    def set_up_corpus_directories(self):
        """Set up corpora for fuzzing. Set up the input corpus for use by the
        fuzzer and set up the output corpus for the first sync so the initial
        seeds can be measured."""
        fuzz_target_name = environment.get('FUZZ_TARGET')
        target_binary = fuzzer_utils.get_fuzz_target_binary(
            FUZZ_TARGET_DIR, fuzz_target_name)
        input_corpus = environment.get('SEED_CORPUS_DIR')
        os.makedirs(input_corpus, exist_ok=True)
        if not environment.get('CUSTOM_SEED_CORPUS_DIR'):
            _unpack_clusterfuzz_seed_corpus(target_binary, input_corpus)
        else:
            _copy_custom_seed_corpus(input_corpus)

The variable target_binary is set to None. The responsible function, get_fuzz_target_binary, is here: https://github.com/phi-go/fuzzbench/blob/72926c0bdf8614f16adaef2b4cd658e1908f6186/common/fuzzer_utils.py#L73.

In that function, the target binary path is only returned if the file exists, so this is probably a build error rather than specifically a corpus issue. This is under the assumption that FUZZ_TARGET is set.
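
Roughly, the behaviour described above looks like this (a simplified sketch based on the linked common/fuzzer_utils.py, not the exact implementation):

import os

def get_fuzz_target_binary_sketch(search_directory, fuzz_target_name):
    """Return the fuzz target binary path, or None if the file is missing."""
    if not fuzz_target_name:
        return None
    fuzz_target_binary = os.path.join(search_directory, fuzz_target_name)
    if os.path.exists(fuzz_target_binary):
        return fuzz_target_binary
    # The build did not produce the binary, so the caller receives None and
    # only crashes later when the path is first used (os.path.splitext above).
    return None

So a build that completes but never produces the expected binary would only surface later, in get_clusterfuzz_seed_corpus_path, which matches the traceback.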

phi-go avatar Jan 17 '24 11:01 phi-go

I modified part of the Makefile generation to support the mutation testing Docker builds, so maybe I broke something there. @Alan32Liu, could you take a look at the following changes to these files? I thought they should be fine, so another pair of eyes would be good:

https://github.com/google/fuzzbench/pull/1941/files#diff-9ba00a2744edb4b6e8a4768b520cd4b147e26ddec73c13337aac6a79ccfa99a0

  • docker/generate_makefile.py
  • docker/image_types.yaml
  • experiment/build/gcb_build.py

phi-go avatar Jan 17 '24 11:01 phi-go

Oh, I see now. On the cloud it seems the mutation analysis build process is used for the fuzzer builds, which is definitely wrong... Though, I don't yet understand why that happens.

https://www.googleapis.com/download/storage/v1/b/fuzzbench-data/o/sbft-standard-cov-01-16%2Fbuild-logs%2Fbenchmark-bloaty_fuzz_target-fuzzer-libfuzzer.txt?generation=1705424198204020&alt=media

phi-go avatar Jan 17 '24 12:01 phi-go

I modified part of the Makefile generation to support the mutation testing Docker builds, so maybe I broke something there. @Alan32Liu, could you take a look at the following changes to these files? I thought they should be fine, so another pair of eyes would be good:

https://github.com/google/fuzzbench/pull/1941/files#diff-9ba00a2744edb4b6e8a4768b520cd4b147e26ddec73c13337aac6a79ccfa99a0

  • docker/generate_makefile.py
  • docker/image_types.yaml
  • experiment/build/gcb_build.py

I did not notice anything either: they replicate the coverage build, and nothing seems too strange.

However, I noticed that TuneFuzz works fine on some benchmarks, e.g. freetype2_ftfuzzer (screenshot).

In fact, ignoring the build errors, the 'Error doing trial.' failures only occur on the jsoncpp_jsoncpp_fuzzer benchmark (screenshot).

Not too sure about the build errors, though. The log is not very useful:

{
  "insertId": "yys9kff991ita",
  "jsonPayload": {
    "component": "dispatcher",
    "traceback": "Traceback (most recent call last):\n  File \"/work/src/experiment/build/builder.py\", line 191, in build_fuzzer_benchmark\n    buildlib.build_fuzzer_benchmark(fuzzer, benchmark)\n  File \"/work/src/experiment/build/gcb_build.py\", line 140, in build_fuzzer_benchmark\n    _build(config, config_name)\n  File \"/work/src/experiment/build/gcb_build.py\", line 124, in _build\n    raise subprocess.CalledProcessError(result.retcode, command)\nsubprocess.CalledProcessError: Command '['gcloud', 'builds', 'submit', '/work/src', '--config=/tmp/tmprsixo8fm', '--timeout=14400s', '--worker-pool=projects/fuzzbench/locations/us-central1/workerPools/buildpool-e2-std-32']' returned non-zero exit status 1.\n",
    "message": "Failed to build benchmark: curl_curl_fuzzer_http, fuzzer: tunefuzz.",
    "experiment": "sbft-standard-cov-01-16",
    "instance_name": "d-sbft-standard-cov-01-16"
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "project_id": "fuzzbench",
      "zone": "projects/1097086166031/zones/us-central1-c",
      "instance_id": "7879014797801950091"
    }
  },
  "timestamp": "2024-01-16T12:42:50.595247299Z",
  "severity": "ERROR",
  "logName": "projects/fuzzbench/logs/fuzzbench",
  "receiveTimestamp": "2024-01-16T12:42:50.595247299Z"
}

I suppose one thing we can do is to add extensive logs related to these build errors (which seem to have more victims and deserve a higher priority). Then we run a simple test experiment and debug with the logs.

DonggeLiu avatar Jan 17 '24 12:01 DonggeLiu

Luckily the build logs do show something; see my other comment. However, they are not available for a local build, so I'll try to patch that in; maybe I messed something up in the Dockerfile dependencies. I can test that locally for now.

Also thank you for taking a look.

phi-go avatar Jan 17 '24 12:01 phi-go

Ok, I can confirm that mutation testing is not used locally to build bloaty_fuzz_target-libfuzzer, and more importantly it builds without a problem. So it should be something in the GCB-specific code, which I do not really understand that well. Also, the build logs are truncated, so we do not see the remaining info: https://www.googleapis.com/download/storage/v1/b/fuzzbench-data/o/sbft-standard-cov-01-16%2Fbuild-logs%2Fbenchmark-bloaty_fuzz_target-fuzzer-libfuzzer.txt?generation=1705424198204020&alt=media.

However, before I dig deeper: we can still complete the evaluation without fixing this. For now we planned to do the mutation analysis part locally on our blades; we do not support that many benchmarks, so this is fine. The coverage runs for all benchmarks could be done on a branch without our changes, with only the fuzzer PRs.

phi-go avatar Jan 17 '24 14:01 phi-go

I suppose one thing we can do is to add extensive logs related to these build errors (which seem to have more victims and deserve a higher priority). Then we run a simple test experiment and debug with the logs.

The missing information seems to just be truncated from the build log. I changed the code a bit to allow storing everything for the gcb_build execute call, which I hope will reveal the missing info. Let's try the simple test experiment, if you feel comfortable with that.
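
Conceptually the change is just to capture and persist the full output of the build command instead of a truncated snippet; something along these lines (an illustrative sketch with made-up names, not the actual gcb_build.py code):

import subprocess

def run_build_and_store_full_log(command, log_path):
    """Run a build command and store its complete, untruncated output."""
    result = subprocess.run(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True,
                            check=False)
    # Write everything we captured so the stored build log is not cut off.
    with open(log_path, 'w', encoding='utf-8') as log_file:
        log_file.write(result.stdout)
    return result.returncode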

phi-go avatar Jan 17 '24 17:01 phi-go

How about:

--fuzzers libafl libfuzzer tunefuzz pastis
--benchmarks freetype2_ftfuzzer jsoncpp_jsoncpp_fuzzer bloaty_fuzz_target lcms_cms_transform_fuzzer

Because it includes good success/failure comparisons on both fuzzers and benchmarks.

benchmark \ fuzzer | libafl | libfuzzer | tunefuzz | pastis
--- | --- | --- | --- | ---
freetype2_ftfuzzer | Can Run | Can Run | Can Run | Can Run but NaN
jsoncpp_jsoncpp_fuzzer | Can Run | Can Run | Error doing trial | Error doing trial
bloaty_fuzz_target | Cannot Build | Cannot Build | Cannot Build | Cannot Build
lcms_cms_transform_fuzzer | Can Run | Can Run | Can Run | Can Run

Let me know if you'd like to add more.

DonggeLiu avatar Jan 17 '24 22:01 DonggeLiu

Thank you for looking into this so thoroughly. This sounds like a plan. If you want to reduce compute further, even one-hour runs and a few trials should give us enough to debug, though I don't know how to do this with flags.

/gcbrun run_experiment.py -a --mutation-analysis --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name sbft-dev-01-18 --fuzzers libafl libfuzzer tunefuzz pastis --benchmarks freetype2_ftfuzzer jsoncpp_jsoncpp_fuzzer bloaty_fuzz_target lcms_cms_transform_fuzzer

phi-go avatar Jan 18 '24 13:01 phi-go