winafl speed comparison (FAQ)

Hi,

I think an FAQ entry would be very helpful if it indicated which mode (Dynamorio, Intel PT, Syzygy) is faster than the others and by approximately how much (if all three approaches are possible). I think Syzygy has a 2-4x speed increase over Dynamorio, but is Intel PT in between the two, or slower than Dynamorio?

Jan 30 '20 10:01 vanhauser-thc

Hey, yeah that might be helpful, but I don't actually have real stats on this at this time (just random observations, see below). If someone volunteers to do actual measurements on real-world targets and do some stats, I would be happy to incorporate them in the FAQ. The other thing to note is that there are considerations other than speed. As it stands now:

Syzygy (static instrumentation) is the fastest, but it requires the source code of the target (to be able to build it with all necessary flags) and only works on 32-bit targets.
Dynamorio (dynamic instrumentation) is usually second fastest and still the best all-around option assuming you don't have the source code (which is going to be the case for most Windows software) and that Dynamorio works well with your target.
IntelPT (hardware tracing) - speed is heavily dependent on the trace buffer size, but in most cases it's going to be somewhat slower than Dynamorio. The main benefit of IntelPT is that it doesn't require any modification to the target itself, so it might work in cases where other options fail. It requires a new-ish Intel CPU and a target that doesn't use self-modifying code.
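
For anyone who does volunteer measurements, here is a minimal sketch of how the per-mode numbers could be reduced to "X times faster than Y" figures. It assumes you have collected several exec/s samples per mode on the same target and machine. The function name `relative_speed`, the labels `mode_a`/`mode_b`/`mode_c`, and all the numbers are placeholders for illustration, not real measurements of any winafl mode.

```python
from statistics import median


def relative_speed(execs_per_sec, baseline):
    """Return each mode's median exec/s divided by the baseline mode's median.

    execs_per_sec: dict mapping a mode label to a list of exec/s samples.
    baseline: the label whose median is used as the 1.0x reference.
    """
    medians = {mode: median(samples) for mode, samples in execs_per_sec.items()}
    base = medians[baseline]
    return {mode: value / base for mode, value in medians.items()}


# Placeholder inputs only: replace with exec/s values you actually measured.
samples = {
    "mode_a": [200.0, 210.0, 190.0],
    "mode_b": [100.0, 90.0, 110.0],
    "mode_c": [50.0, 55.0, 45.0],
}

ratios = relative_speed(samples, baseline="mode_b")

# Print the modes from fastest to slowest relative to the baseline.
for mode, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{mode}: {ratio:.2f}x")
```

The median is used rather than the mean so that a single unusually slow or fast run does not skew the ratio. Since IntelPT speed depends on the trace buffer size, it would probably make sense to record the buffer size alongside each IntelPT sample.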

Jan 31 '20 09:01 ifratric

If Syzygy requires source code, why not move to afl(++)'s llvm_mode (with winafl's way of map sharing etc.)? That should be way faster and better than what Syzygy does/has (e.g. laf-intel, whitelisting, instrim instrumentation, no problem with basic blocks of any size, map collision reduction, etc.). And llvm/clang is nicely supported with Visual Studio too (even with asan + libfuzzer), so that should be a safe solution (support-wise and future-proof-wise).

I am a bit surprised that Dynamorio is faster than IntelPT. Is the reason that with Dynamorio you basically have a persistent mode, whereas for IntelPT you run the whole binary again and again? As slow as IntelPT decoding is, it should be faster than Dynamorio, so combining both could be faster I guess (if it would be possible).

For the speed stuff I was asking for a customer who asked me and I said I will see what I can find out :) ... I am using Linux natively and hardly do any Windows fuzzing anymore, otherwise I would do measurements and send a PR.

Jan 31 '20 13:01 vanhauser-thc

Note that IntelPT mode does include persistence. As far as I can tell, the overhead mostly comes from writing data into the trace buffer. Also, as mentioned previously, using a bigger trace buffer results in more overhead, which I assume is due to impact on caching. There is also overhead from decoding the trace, though that has been vastly improved with the new custom decoder.

Jan 31 '20 14:01 ifratric
