Export detailed coverage data (functions & segments) as State files (json)

Open BharathMonash opened this issue 4 years ago • 0 comments

This PR follows up on #987, as discussed. The patch exports detailed coverage data as state files in each cycle for every |benchmark, fuzzer, trial| combination.

DESCRIPTION

The idea is to persist the segment and function coverage data. For segments, we overwrite the same JSON state file in every cycle; in the next cycle that file also serves as a blacklist of segments already discovered. For functions, we need to record the functions observed in each cycle regardless of whether they were discovered previously, so we simply export the function coverage data as a new state file in each cycle (a separate file per cycle, never overwritten).

DATA POINTS RECORDED

  • For segment coverage - Benchmark name, Fuzzer name, Trial, Timestamp, SourceFile, segment Line & segment Column

  • For Function coverage - Benchmark name, Fuzzer name, Trial, Timestamp, Function names and Hits

JSON STRUCTURE OF THE DATA BEING STORED

The coverage data being stored as JSON (state file) follows the structure below (pandas DataFrame exported as JSON in table orientation):

{
    "schema": {
        "fields": [
            {
                "name": "index",
                "type": "string"
            },
            {
                "name": "col 1",
                "type": "string"
            },
            {
                "name": "col 2",
                "type": "string"
            }
        ],
        "primaryKey": [
            "index"
        ],
        "pandas_version": "0.20.0"
    },
    "data": [
        {
            "index": "row 1",
            "col 1": "a",
            "col 2": "b"
        },
        {
            "index": "row 2",
            "col 1": "c",
            "col 2": "d"
        }
    ]
} 
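As a minimal sketch of how a file with this layout could be produced, the snippet below builds a small segment coverage DataFrame and exports it with `DataFrame.to_json(orient="table")`. The column names and the sample values are assumptions made for illustration; the PR only lists which data points are recorded, not how the columns are named.

```python
import json

import pandas as pd

# Hypothetical column names and made-up sample rows, for illustration only.
segments = pd.DataFrame(
    {
        "benchmark": ["example_benchmark", "example_benchmark"],
        "fuzzer": ["example_fuzzer", "example_fuzzer"],
        "trial": [1, 1],
        "time": [900, 900],
        "file": ["src/parser.c", "src/parser.c"],
        "line": [10, 42],
        "column": [5, 9],
    }
)

# orient="table" emits a "schema" object describing the fields,
# followed by a "data" array holding one object per row.
table_json = segments.to_json(orient="table")

parsed = json.loads(table_json)
field_names = [field["name"] for field in parsed["schema"]["fields"]]
print(field_names)
print(parsed["data"][0])
```

The default integer index is exported as the `index` field and listed as the primary key, which matches the shape shown above.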

WORKFLOW

The basic workflow of this PR is as follows:

  1. load the previous state file for segment coverage
  2. extract segments and function coverage data for the current cycle
  3. add the newly discovered segments to the previous state
  4. set current state for functions (new file) and segments (overwrite previous file)
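The four steps above can be sketched roughly as follows. This is not the code from the PR: `run_cycle`, the file names, and the use of plain `json` instead of a pandas DataFrame are all assumptions chosen to keep the example short, and timestamps are omitted. Segments are represented as `(file, line, column)` tuples.

```python
import json
import tempfile
from pathlib import Path


def run_cycle(state_dir, cycle, observed_segments, observed_functions):
    """Hypothetical helper: update the state files for one measurement cycle."""
    state_dir = Path(state_dir)
    segment_file = state_dir / "segments.json"

    # 1. Load the previous segment state (empty in the first cycle).
    previous = []
    if segment_file.exists():
        previous = [tuple(seg) for seg in json.loads(segment_file.read_text())]
    known = set(previous)

    # 2. The coverage extracted for the current cycle is passed in by the caller.
    # 3. Keep only segments that were not seen in any earlier cycle.
    new_segments = []
    for seg in map(tuple, observed_segments):
        if seg not in known:
            known.add(seg)
            new_segments.append(seg)

    # 4. Overwrite the single segment file; write a fresh file for functions.
    segment_file.write_text(json.dumps(previous + new_segments))
    function_file = state_dir / f"functions-{cycle:04d}.json"
    function_file.write_text(json.dumps(observed_functions))
    return new_segments


with tempfile.TemporaryDirectory() as tmp:
    first = run_cycle(tmp, 1, [("a.c", 1, 1), ("a.c", 2, 3)], {"main": 4})
    second = run_cycle(tmp, 2, [("a.c", 2, 3), ("b.c", 7, 1)], {"main": 9, "parse": 1})
    files = sorted(p.name for p in Path(tmp).iterdir())
    stored = json.loads((Path(tmp) / "segments.json").read_text())
```

After two cycles the directory holds one segment file containing all three distinct segments and two function files, one per cycle.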

CHANGES TO EXISTING CODE

A simple parameter, cycle_dependent, has been added to the get_previous() and set_current() methods to determine whether to save the state as a new state file or to overwrite an existing state file.
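One way such a flag could behave is sketched below. `SketchStateFile` is a hypothetical stand-in, not the class touched by this PR; the file naming scheme and the default value of `cycle_dependent` are assumptions as well.

```python
import json
import tempfile
from pathlib import Path


class SketchStateFile:
    """Hypothetical stand-in for a per-trial state file helper."""

    def __init__(self, state_dir, name, cycle):
        self.state_dir = Path(state_dir)
        self.name = name
        self.cycle = cycle

    def _path(self, cycle, cycle_dependent):
        # Cycle-dependent state gets one file per cycle;
        # cycle-independent state always maps to the same file.
        if cycle_dependent:
            return self.state_dir / f"{self.name}-{cycle:04d}.json"
        return self.state_dir / f"{self.name}.json"

    def get_previous(self, cycle_dependent=True):
        path = self._path(self.cycle - 1, cycle_dependent)
        if not path.exists():
            return None
        return json.loads(path.read_text())

    def set_current(self, state, cycle_dependent=True):
        path = self._path(self.cycle, cycle_dependent)
        path.write_text(json.dumps(state))


with tempfile.TemporaryDirectory() as tmp:
    for cycle in (1, 2):
        # Functions: a new file in every cycle.
        SketchStateFile(tmp, "functions", cycle).set_current({"main": cycle})
        # Segments: read and overwrite the same file.
        segments = SketchStateFile(tmp, "segments", cycle)
        previous = segments.get_previous(cycle_dependent=False) or []
        segments.set_current(previous + [cycle], cycle_dependent=False)
    written = sorted(p.name for p in Path(tmp).iterdir())
    final_segments = json.loads((Path(tmp) / "segments.json").read_text())
```

Keeping the choice in a single path helper means callers only pass the flag, and both methods stay consistent about which file they touch.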

It is hard to estimate the size of the generated files at the moment, but I should be able to get back to you on this once the GCS API is exposed. Please feel free to suggest any possible improvements or other JSON formats if that makes things easier :)

BharathMonash, Apr 05 '21 16:04