Experiment Announcement Thread

Open jonathanmetzman opened this issue 5 years ago • 37 comments

New experiments will be announced on this issue as discussed on #205. I think this is a good place to discuss experiments as well.

jonathanmetzman avatar Apr 22 '20 04:04 jonathanmetzman

https://www.fuzzbench.com/reports/2020-04-20-aflplusplus/index.html contains afl++ variants. The experiment is currently running, most of the trials have completed, though some started running earlier today, and coverage measurement still needs to finish.

I just started another experiment https://www.fuzzbench.com/reports/2020-04-21/index.html that includes every fuzzer except for the afl++ variants benchmarked above.

Once both of these experiments complete I will make a combined report.

Note that coverage measuring for the previous experiment https://www.fuzzbench.com/reports/2020-04-14/index.html didn't complete because of a bug I have fixed.

jonathanmetzman avatar Apr 22 '20 05:04 jonathanmetzman

@jonathanmetzman when do you expect the 2020-04-20-aflplusplus one to be finished with the measurement? and is the title then updated to not mention "incomplete"?

no need to combine this with 2020-04-21 btw, this will just make the overall report graphs unreadable with so many entries :)

vanhauser-thc avatar Apr 22 '20 08:04 vanhauser-thc

@jonathanmetzman when do you expect the 2020-04-20-aflplusplus one to be finished with the measurement? and is the title then updated to not mention "incomplete"?

Title gets updated automatically. Hard to say when it will complete, I expect a day or two from now. The recent fixes we made and removal of irssi should reduce measurement time considerably. Unfortunately, the afl++ experiment started before that.

no need to combine this with 2020-04-21 btw, this will just make the overall report graphs unreadable with so many entries :)

OK sounds good to me. Let me know if you want just the AFL++ fuzzers then (I can do that as well).

jonathanmetzman avatar Apr 22 '20 17:04 jonathanmetzman

OK sounds good to me. Let me know if you want just the AFL++ fuzzers then (I can do that as well).

that would be good however the new run does not have irssi target in there ... does that work and not f*ck up the benchmark calculation?

vanhauser-thc avatar Apr 22 '20 18:04 vanhauser-thc

I can exclude that benchmark.

jonathanmetzman avatar Apr 22 '20 18:04 jonathanmetzman

https://www.fuzzbench.com/reports/2020-04-20-aflplusplus/index.html contains afl++ variants. The experiment is currently running, most of the trials have completed, though some started running earlier today, and coverage measurement still needs to finish.

I just started another experiment https://www.fuzzbench.com/reports/2020-04-21/index.html that includes every fuzzer except for the afl++ variants benchmarked above.

Once both of these experiments complete I will make a combined report.

Both of these experiments have finished measuring. I think we are going to prioritize speeding up measuring.

jonathanmetzman avatar Apr 28 '20 18:04 jonathanmetzman

OK sounds good to me. Let me know if you want just the AFL++ fuzzers then (I can do that as well).

that would be good however the new run does not have irssi target in there ... does that work and not f*ck up the benchmark calculation?

Made this combination report: https://www.fuzzbench.com/reports/2020-04-21-and-20-aflplusplus/index.html It contains all benchmarks (except woff and irssi) and all afl++ based fuzzers from 2020-04-21 and 2020-04-20-aflplusplus
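For anyone curious what "combining" amounts to, here is a minimal sketch, not FuzzBench's actual report code. It pools rows from both experiments, drops the excluded benchmarks, and keeps only afl++-based fuzzers. The row layout, the short benchmark names, the fuzzer names, the coverage numbers, and the idea of recognising afl++-based fuzzers by a name prefix are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]

# Benchmarks left out of the combined report, per the comment above.
# These short names are placeholders, not FuzzBench's real benchmark identifiers.
EXCLUDED_BENCHMARKS = {"woff", "irssi"}


def is_aflpp_based(fuzzer: str) -> bool:
    """Assumption: afl++-based fuzzers can be recognised by a name prefix."""
    return fuzzer.startswith("aflplusplus")


def combine(experiments: Iterable[List[Row]]) -> List[Row]:
    """Pool rows from several experiments, keeping afl++-based fuzzers only
    and dropping the excluded benchmarks."""
    combined: List[Row] = []
    for rows in experiments:
        for row in rows:
            if row["benchmark"] in EXCLUDED_BENCHMARKS:
                continue
            if not is_aflpp_based(str(row["fuzzer"])):
                continue
            combined.append(row)
    return combined


# Made-up sample rows; the coverage numbers are not real results.
exp_aflpp = [
    {"experiment": "2020-04-20-aflplusplus", "fuzzer": "aflplusplus_variant_a",
     "benchmark": "irssi", "coverage": 100},
    {"experiment": "2020-04-20-aflplusplus", "fuzzer": "aflplusplus_variant_a",
     "benchmark": "zlib", "coverage": 200},
]
exp_main = [
    {"experiment": "2020-04-21", "fuzzer": "aflplusplus",
     "benchmark": "zlib", "coverage": 190},
    {"experiment": "2020-04-21", "fuzzer": "honggfuzz",
     "benchmark": "zlib", "coverage": 210},
    {"experiment": "2020-04-21", "fuzzer": "aflplusplus",
     "benchmark": "woff", "coverage": 300},
]

combined_rows = combine([exp_aflpp, exp_main])
for row in combined_rows:
    print(row["experiment"], row["fuzzer"], row["benchmark"])
```

Dropping a benchmark from every fuzzer at once, rather than only from the experiment that lacks it, keeps the per-fuzzer aggregates comparable.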

jonathanmetzman avatar Apr 28 '20 19:04 jonathanmetzman

@jonathanmetzman thanks! whenever you can start the next batch - that one will be very interesting, especially those with increased map sizes :)

vanhauser-thc avatar Apr 28 '20 19:04 vanhauser-thc

@jonathanmetzman thanks! whenever you can start the next batch - that one will be very interesting, especially those with increased map sizes :)

Not exactly sure when this will be but probably by Friday, I'm trying to work on some long term improvement and planning this week.

jonathanmetzman avatar Apr 28 '20 19:04 jonathanmetzman

I started experiments for AFL++ and fastcgs with and without huge page tables.

jonathanmetzman avatar May 02 '20 02:05 jonathanmetzman

I started experiments for AFL++ and fastcgs with and without huge page tables.

Experiments are done. https://www.fuzzbench.com/reports/2020-05-01-fastcgs/index.html compared versions of fastcgs with support for huge page tables and without. CC @alifahmed

https://www.fuzzbench.com/reports/2020-05-01-aflplusplus-1/index.html and https://www.fuzzbench.com/reports/2020-05-01-aflplusplus-2/index.html compared afl++ variants.

jonathanmetzman avatar May 04 '20 20:05 jonathanmetzman

I'm starting to work on using resources more intelligently in experiments (particularly for fuzzer variants or features that are in development). In the next experiments I will test using preemptible instances for trials. The differences you will see here are:

  1. Experiments will be 23 hours (since preemptible instances cannot last longer than 24 hours and we need time to start up).
  2. Some (generally, 5-15% according to the docs) trials will not complete. Assuming that 3 trials do not complete for a fuzzer-benchmark, it only means comparisons between it and other fuzzers will be slightly less significant. We may be able to fix this in the future by restarting preempted trials, but for now we won't.
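To put rough numbers on point 2, here is a back-of-the-envelope sketch. It assumes each trial is preempted independently with a fixed probability, which is a simplification since real preemptions may be correlated. The figure of 20 trials per fuzzer-benchmark pair is only an example value, and the function names are made up for illustration.

```python
from math import comb


def prob_at_least_completed(trials: int, preempt_rate: float, min_completed: int) -> float:
    """P(at least `min_completed` of `trials` finish), assuming independent preemptions."""
    finish = 1.0 - preempt_rate
    return sum(
        comb(trials, k) * finish ** k * preempt_rate ** (trials - k)
        for k in range(min_completed, trials + 1)
    )


def expected_completed(trials: int, preempt_rate: float) -> float:
    """Expected number of trials that run to completion."""
    return trials * (1.0 - preempt_rate)


TRIALS = 20  # example value only, not taken from the experiment config
for rate in (0.05, 0.15):  # the 5-15% range quoted from the docs above
    print(
        f"preempt rate {rate:.0%}: "
        f"expected completed = {expected_completed(TRIALS, rate):.1f}, "
        f"P(at most 3 lost) = {prob_at_least_completed(TRIALS, rate, TRIALS - 3):.3f}"
    )
```

Under these assumptions, most fuzzer-benchmark pairs lose only a few trials, which is why the comparisons become slightly less significant rather than unusable.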

jonathanmetzman avatar May 05 '20 00:05 jonathanmetzman

Running a new experiment with the main fuzzers and the new benchmark. The experiment is using preemptible VMs https://www.fuzzbench.com/reports/2020-05-04-preempt-new-bench/index.html

EDIT:

I'll discuss the results from that experiment here to reduce spam on this thread.

jonathanmetzman avatar May 05 '20 02:05 jonathanmetzman

https://www.fuzzbench.com/reports/2020-05-11/index.html

inferno-chromium avatar May 13 '20 04:05 inferno-chromium

A 15 trial full experiment: https://www.fuzzbench.com/reports/2020-05-24/index.html

jonathanmetzman avatar May 26 '20 23:05 jonathanmetzman

20 trial experiment comparing aflplusplus_optimal, aflplusplus_shmem, and ankou (with a buggy integration): https://www.fuzzbench.com/reports/2020-05-28/index.html

jonathanmetzman avatar May 31 '20 20:05 jonathanmetzman

https://www.fuzzbench.com/reports/2020-06-12/index.html is an experiment with libfuzzer_nocmp, aflcc, manul and afl++ (and its variants) combined with 2020-05-24.

jonathanmetzman avatar Jun 16 '20 05:06 jonathanmetzman

The results of the new experiment https://www.fuzzbench.com/reports/2020-07-17/index.html currently look very different from previous runs. What has changed? Is it using the new coverage? Or have the benchmark targets changed?

was there already an assessment which of the two coverage methods is better or what the advantages/disadvantages are? thanks!

vanhauser-thc avatar Jul 16 '20 20:07 vanhauser-thc

What did you notice that's different in 2020-07-17? I stopped that experiment to run 2020-07-13 (the AFL++ experiment I was supposed to run for you a few days ago, but accidentally didn't because of a bug in the service code; 2020-07-13 is running right now).

We're only partway into 2020-07-17, but I ran it to make sure that the results are the same as usual. 2020-07-17 was run using https://github.com/google/fuzzbench/pull/509 which will allow non-FuzzBench maintainers to easily contribute benchmarks from OSS-Fuzz. In theory the results should be the same even though the builds are different. But any differences you notice are very helpful for me to know about.

jonathanmetzman avatar Jul 16 '20 21:07 jonathanmetzman

I moved 2020-07-17 here since I don't consider it an official experiment.

was there already an assessment which of the two coverage methods is better or what the advantages/disadvantages are? thanks!

Note that 2020-07-17 used sancov, the current cov implementation. Only clang-cov-test used clang cov. So far it looks totally fine to replace sancov with clang cov, but we're still investigating.

jonathanmetzman avatar Jul 16 '20 21:07 jonathanmetzman

What did you notice that's different in 2020-07-17?

honggfuzz is not in the top list and instead fastcgs_lm is ... both unusual. Also aflplusplus_ctx_nozerosingle has no reason to be at place 3 and should rather be around ankou. Sure, it has only been running for 1/3 of the time, but usually there is no dramatic improvement or degradation over the whole benchmark after 6 hours.

vanhauser-thc avatar Jul 16 '20 22:07 vanhauser-thc

Also 2020-07-17 is not updated anymore since moved

vanhauser-thc avatar Jul 17 '20 07:07 vanhauser-thc

Also 2020-07-17 is not updated anymore since moved

I stopped that experiment to run yours.

jonathanmetzman avatar Jul 17 '20 19:07 jonathanmetzman

There is a very noticeable bump in edge coverage between all the reports before 07/25 and after 08/03.

https://www.fuzzbench.com/reports/2020-07-25/index.html https://www.fuzzbench.com/reports/2020-08-03/index.html

I am wondering what caused it?

alifahmed avatar Aug 15 '20 19:08 alifahmed

true! and systemd has much less now which is weird

vanhauser-thc avatar Aug 15 '20 19:08 vanhauser-thc

There is a very noticeable bump in edge coverage between all the reports before 07/25 and after 08/03.

https://www.fuzzbench.com/reports/2020-07-25/index.html https://www.fuzzbench.com/reports/2020-08-03/index.html

I am wondering what caused it?

We moved to using clang code coverage instead of sancov.

inferno-chromium avatar Aug 15 '20 21:08 inferno-chromium

what is measured now? edges? basic blocks? lines of code? instructions?

vanhauser-thc avatar Aug 15 '20 21:08 vanhauser-thc

what is measured now? edges? basic blocks? lines of code? instructions?

These are called regions in clang code coverage (https://llvm.org/docs/CoverageMappingFormat.html#id14). Regions have character-level precision, e.g. there can be multiple regions in a single line such as “return x || y && z”.
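To see why a single line can hold several regions, here is a small Python sketch. It is not clang and not how llvm-cov computes anything. It only emulates the short-circuit evaluation of `x || y && z`, which C parses as `x || (y && z)`, and records which operands actually run. As far as I understand it, sub-expressions that may or may not execute (here the right-hand sides of `||` and `&&`) are the kind of thing that can get their own coverage region. The helper names are made up for illustration.

```python
from typing import List, Tuple


def evaluate(x: bool, y: bool, z: bool) -> Tuple[bool, List[str]]:
    """Emulate `x || (y && z)` and record which operands were evaluated."""
    executed: List[str] = []

    def operand(name: str, value: bool) -> bool:
        # Log the operand so we can see whether short-circuiting skipped it.
        executed.append(name)
        return value

    # Python's `or`/`and` short-circuit the same way as C's `||`/`&&`.
    result = operand("x", x) or (operand("y", y) and operand("z", z))
    return result, executed


for inputs in [(True, False, False), (False, False, True), (False, True, True)]:
    result, executed = evaluate(*inputs)
    print(inputs, "->", result, "evaluated:", executed)
```

Line coverage would report that one line as covered in all three cases, whereas a finer-grained metric can tell apart the runs that never reached `y` or `z`.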

inferno-chromium avatar Aug 15 '20 21:08 inferno-chromium

Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Should it not be something like region/code coverage?

alifahmed avatar Aug 15 '20 22:08 alifahmed

Thanks! I think the report graphs should indicate this change to avoid confusion. Currently the graphs still say "Reached edge coverage". Should it not be something like region/code coverage?

I think we discussed it a little internally. Some people felt that just "region" can be confusing and that the community is more accustomed to edges for coverage. @lszekeres - can you revisit and fix this confusion sometime next week?

inferno-chromium avatar Aug 15 '20 23:08 inferno-chromium