fuzz-introspector
Discrepancy between reachability and coverage for Python projects
https://oss-fuzz-introspector.storage.googleapis.com/index.html recently got a facelift, and this made some issues obvious. One is that code coverage is often a lot higher than reachability for Python projects. We should investigate why this is the case and come up with a solution.
An example: glom has a reachability of 13.6% but a code coverage of 73.0%.
One of the main reasons for this discrepancy is that the reachability analysis is much more focused than the code coverage analysis. For example, consider the following fuzzer:
```python
import sys

import atheris

# A lot of code will have runtime coverage in mod1, mod2, mod3 due to imports
import mod1
import mod2
import mod3


# Zero reachability: the entrypoint does nothing
def TestOneInput(data):
    return


# Code to trigger atheris
def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()


if __name__ == "__main__":
    main()
```
This fuzzer will have a lot of code coverage in the modules `mod1`, `mod2` and `mod3`. Coverage collection starts before the modules are imported, and the import statements cause a lot of code to execute in those modules (such as registering each `def` in the module). None of this is considered by the reachability analysis, which only considers code reachable from `TestOneInput`, and that is zero in this case. In this sense, much of the reported "code coverage" is false coverage from a fuzzing perspective: a lot of the code counted in the coverage report ran at import time and was never exercised by the fuzzer itself.
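To make the discrepancy concrete, here is a minimal sketch that uses Python's built-in `sys.settrace` to record which lines of a module run while the module is loaded versus while the entrypoint runs. It assumes a hypothetical `mod1` whose source is inlined as a string and executed with `exec` instead of a real `import`, plus a hypothetical `PhaseTracer` helper; it only illustrates the effect and is not how fuzz-introspector or Atheris collect coverage.

```python
import sys
from collections import defaultdict

# Hypothetical stand-in for mod1: its top level only defines functions.
MOD1_SOURCE = '''\
def parse(data):
    if data:
        return data[0]
    return None

def helper(x):
    return x * 2
'''
MOD1_FILENAME = "<mod1>"


class PhaseTracer:
    """Hypothetical helper: record executed lines of one file, grouped by phase."""

    def __init__(self, filename):
        self.filename = filename
        self.phase = "import"
        self.lines = defaultdict(set)

    def _trace(self, frame, event, arg):
        # Ignore frames whose code did not come from the file under study.
        if frame.f_code.co_filename != self.filename:
            return None
        if event == "line":
            self.lines[self.phase].add(frame.f_lineno)
        return self._trace

    def __enter__(self):
        sys.settrace(self._trace)
        return self

    def __exit__(self, *exc_info):
        sys.settrace(None)
        return False


def TestOneInput(data):
    # Same no-op entrypoint as the fuzzer above: zero reachability.
    return


with PhaseTracer(MOD1_FILENAME) as tracer:
    # "Import" phase: running the module body binds parse and helper.
    mod1 = {}
    exec(compile(MOD1_SOURCE, MOD1_FILENAME, "exec"), mod1)
    # "Fuzz" phase: only what the entrypoint itself executes.
    tracer.phase = "fuzz"
    TestOneInput(b"A")

print("mod1 lines run at import:", sorted(tracer.lines["import"]))
print("mod1 lines run by the fuzzer:", sorted(tracer.lines["fuzz"]))
```

The `def` lines of `mod1` are recorded during the import phase even though the no-op `TestOneInput` never calls `parse` or `helper`, so a line-based coverage report counts them while reachability from the entrypoint does not. The function bodies stay unexecuted in both phases.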
A couple of things that can be done:
- Enable code coverage to be started within the fuzzer entrypoint. The downside is that we will not achieve full code coverage unless the `import`s are also placed there, etc.
- Make some type of middle ground where it's clear what code coverage is achieved by the fuzzer versus, e.g., the code that executes before the fuzzer runs.
- Include reachability analysis based on the full fuzzer file, rather than only the fuzzer entrypoint.
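The last option could start from something like the following minimal sketch, which uses the standard `ast` module to compare the roots visible from `TestOneInput` alone with the roots visible from the whole fuzzer file (module-level imports and module-level calls). The helper names `entrypoint_roots`, `whole_file_roots` and `call_name` are hypothetical, and the sketch only collects roots: a real analysis would still need to follow calls transitively and treat each imported module's top-level code as reached.

```python
import ast

# The example fuzzer from this issue, analysed as plain source text.
FUZZER_SOURCE = '''\
import sys
import atheris
import mod1
import mod2
import mod3

def TestOneInput(data):
    return

def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
'''


def call_name(call):
    """Return a dotted name such as 'atheris.Setup' for a Call node."""
    func, parts = call.func, []
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if isinstance(func, ast.Name):
        parts.append(func.id)
    return ".".join(reversed(parts))


def entrypoint_roots(tree, entry="TestOneInput"):
    """Call targets found inside the fuzzer entrypoint only."""
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == entry:
            return {call_name(c) for c in ast.walk(node) if isinstance(c, ast.Call)}
    return set()


def whole_file_roots(tree):
    """Imported modules plus calls made by module-level statements."""
    roots = set()
    for node in tree.body:
        if isinstance(node, ast.Import):
            roots.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            roots.add(node.module)
        elif not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Covers e.g. the `if __name__ == "__main__": main()` guard.
            roots.update(call_name(c) for c in ast.walk(node) if isinstance(c, ast.Call))
    return roots


tree = ast.parse(FUZZER_SOURCE)
print("entrypoint roots:", sorted(entrypoint_roots(tree)))
print("whole-file roots:", sorted(whole_file_roots(tree)))
```

For this fuzzer the entrypoint yields no roots at all, while the whole-file view picks up `mod1`, `mod2` and `mod3` (whose import-time code is exactly what the coverage report counts), together with `sys`, `atheris` and `main`.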