fuzzilli
Investigate high fuzzer overhead
Since commit 1408aab353b3a7f54b5a4e1b4471e054d615adcf, Fuzzilli computes and displays the "fuzzer overhead", i.e. the fraction of time that is not spent executing JavaScript code in the target engine. Normal values seem to be roughly between 5% and 15%. However, in long fuzzing sessions and seemingly especially in multithreaded mode (e.g. --jobs=32), this number can become quite significant (approaching 50%). This should be investigated.
I also get high overhead after a day or so when using --jobs. Is there anything I can do to help debug this?
Fuzzer Overhead: 76.39%
You'd probably need to use some kind of profiler (e.g. perf on Linux) to figure out where the CPU time is spent, and if there's a bug there that we can fix.
Alternatively, you can use network synchronization (and maybe a low --jobs number on each node), which seems not to suffer from the problem as much (--jobs is still marked as "experimental" due to this issue).
I did some initial investigation a couple of weeks ago, using perf to trace a long-running session.
The largest win was switching from fork to vfork in libreprl. I'm not 100% sure my understanding of why is correct, but the kernel seemed to take more and more time to fork as memory usage increased when --jobs was high (64 in my case). This seemed to reduce the high overhead for long-running sessions with high job counts.
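For reference, the spawning pattern being discussed looks roughly like the sketch below. This is a minimal, self-contained illustration, not libreprl's actual code, and the helper name spawn_target is made up. The point is that vfork lets the child borrow the parent's address space until execve, so the kernel avoids duplicating (or marking copy-on-write) the page tables of a fuzzer process whose memory footprint has grown large.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch only: spawn the target engine with vfork + execve. The child must
// not touch shared memory or return from this function before execve/_exit.
static pid_t spawn_target(char** argv, char** envp) {
    pid_t pid = vfork();
    if (pid == 0) {
        // Child: exec the target; _exit only if execve fails.
        execve(argv[0], argv, envp);
        _exit(127);
    }
    return pid;   // Parent: child pid, or -1 on error.
}

int main(void) {
    char* argv[] = { "/usr/bin/true", NULL };
    char* envp[] = { NULL };
    pid_t pid = spawn_target(argv, envp);
    if (pid < 0) { perror("vfork"); return 1; }
    int status;
    waitpid(pid, &status, 0);
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
```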
Fuzzilli was also spending a significant amount of compute in the JavaScriptLifter, on inlining and on determining which variables should be let vs. const. I'm not sure whether removing these would reduce the overall effectiveness of Fuzzilli, however.
Oh wow, great find! Yeah, using vfork does indeed seem to give a considerable improvement in performance, and I guess it should be fine to use since the child process doesn't modify any global memory before calling execve (afaik, the only difference between the two on Linux is that page tables aren't modified/duplicated). My initial guess as to why this gives such a huge boost is that the kernel has to take some lock related to page tables when performing a fork, which then probably causes many of the other fuzzing threads to block on it. Then, once a fair number of JIT-related samples are in the corpus, the number of timeouts (e.g. due to infinite loops), and subsequently the number of child process restarts, becomes large enough for this to be an issue. But I'm just guessing here.
I'll put together a PR to switch to vfork on Linux. I think we can keep using fork everywhere else though for now, since it's probably not too important and e.g. the macOS man page for vfork states:
ERRORS
The vfork() system call will fail for any of the reasons described in the fork man page. In addition, it will fail if:
[EINVAL] A system call other than _exit() or execve() (or libc functions that make no system calls other than those) is called following calling a vfork() call.
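For illustration, the platform split described above could look roughly like the sketch below. This is a hypothetical example, not the actual PR, and spawn_engine is an invented name. Because the child only calls execve or _exit and never returns from the function in which vfork was called, the Linux vfork usage stays within the documented constraints.

```c
#include <stdlib.h>
#include <unistd.h>

// Sketch: use vfork on Linux, plain fork elsewhere (e.g. macOS, whose vfork
// man page quoted above only permits execve/_exit in the child).
static pid_t spawn_engine(char** argv, char** envp) {
#if defined(__linux__)
    pid_t pid = vfork();
#else
    pid_t pid = fork();
#endif
    if (pid == 0) {
        execve(argv[0], argv, envp);
        _exit(127);   // Only reached if execve fails.
    }
    return pid;       // Parent: child pid, or -1 on error.
}
```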