
reconsider priority assignment for valid inputs

Open dvyukov opened this issue 10 years ago • 8 comments

Currently valid inputs (with result=1) receive a 2x priority boost. This is good. However, if we have 2 valid inputs and 200 invalid ones (that exercise various error paths), we actually want to mutate the valid inputs 2x more frequently than all the invalid ones combined. Valid inputs have a much higher probability of uncovering new source code statements, while mutated invalid inputs will most likely bail out through the same error paths.

dvyukov avatar Jul 28 '15 18:07 dvyukov

From the description it looks like an easy one to fix. The simplest solution is to have two (maybe more in the future) separate sets of inputs (invalid, valid, ...). Choosing an input to mutate will then need two random values (the set, then an item from that set). Special care is needed if some sets are empty (e.g. just after start).

Does that seem OK to you?

Komosa avatar Mar 23 '16 21:03 Komosa

Hi @Komosa,

Thanks for looking into this. Go-fuzz already has a scoring system and chooses inputs according to score (see the updateScores function in hub.go). So I think we just need to update the scores differently. Splitting inputs into separate sets does not look scalable to me. We have several orthogonal properties of inputs; some of them are gradual (e.g. input size or execution time), and some are more important than others. I don't see how that can work based on input sets.

dvyukov avatar Mar 24 '16 07:03 dvyukov

Thank you for the response.

Is it OK to have a constant ratio of valid (and, as a consequence, invalid) inputs chosen for mutation? If so, what should the value of this ratio be? 2/3 for valid ones is the obvious choice, but I think a greater one might help focus more on interesting inputs.

Komosa avatar Apr 01 '16 21:04 Komosa

@Komosa This is a hard question. The only more-or-less objective way to tune such things is to measure the impact on real fuzzing targets (how does coverage progress? at what coverage level does it saturate after an hour? how many bugs does it discover?). This is time-consuming.

Please measure (1) the number of correct/incorrect inputs and (2) the total priority of correct/incorrect inputs, before and after the change, on several fuzzers from the examples dir. This should give us at least some sense of the impact of the change.

Thinking about this more, maybe we should not be so radical as to run valid inputs twice as frequently as invalid ones. Consider that we uncover some large, previously unexamined piece of code with a new input, but in the end this input turns out to be invalid (e.g. the beginning of the input is correct, but the ending is incorrect). This is still a very interesting input that we should mutate frequently.

Let's look at the numbers and then decide.

dvyukov avatar Apr 02 '16 07:04 dvyukov

I will run such tests. Is an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz PC modern enough?

I guess that not only the coverage after an hour is interesting, but also the shape of the coverage(time) curve. Of course I plan to automate such tests.

One more question: the default of 8 processes (= NUMCPUS on my machine) seems OK to me, but various examples use higher numbers (such as 10 or even 500, without specifying what environment is being used). Should I pick 8 or more?

Komosa avatar Apr 10 '16 10:04 Komosa

The particular machine/environment does not matter as long as you compare before/after the change under the same conditions.

> Of course I plan to automate such tests.

It would be great if you could produce a reusable solution. There are lots of tuning knobs (e.g. something as simple as how frequently to invoke the versifier or sonar). The current defaults were chosen with a fair dice roll. If there were a simple way to measure, we could choose better values for all these knobs and increase efficiency.

dvyukov avatar Apr 10 '16 10:04 dvyukov

Just to say I haven't forgotten about this thread.

A single one-hour run seems to be too sensitive to a lucky (initial) mutation (at least for examples/lzw). I need more time to think about this and experiment with it. Probably each test needs to be executed multiple times.

Is the corpus field just the number of different test cases/paths (in other words, is linear growth linearly better)?

Komosa avatar Apr 25 '16 20:04 Komosa

Yes, corpus is the number of different, interesting inputs. But each new input is exponentially harder to obtain, so linear growth is exponentially better. I think the cover field is a better metric: it shows total code coverage for the whole corpus. Say in one run we cover 1000 basic blocks with 100 inputs. If in another run we cover the same 1000 basic blocks with just 50 inputs, that is equally good.
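The point above can be illustrated with a toy computation: total cover is the union of basic blocks reached by the whole corpus, so corpora of different sizes can be equally good. The block IDs here are made up for the example.

```go
package main

import "fmt"

// totalCover returns the set of distinct basic blocks covered by a corpus,
// where each input is represented by the list of block IDs it reaches.
func totalCover(corpus [][]int) map[int]bool {
	cover := make(map[int]bool)
	for _, blocks := range corpus {
		for _, b := range blocks {
			cover[b] = true
		}
	}
	return cover
}

func main() {
	// Three inputs covering blocks {1, 2, 3, 4} in total...
	big := [][]int{{1, 2}, {2, 3}, {3, 4}}
	// ...and two inputs covering the same four blocks: equally good.
	small := [][]int{{1, 2, 3}, {3, 4}}
	fmt.Println(len(totalCover(big)), len(totalCover(small))) // prints: 4 4
}
```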

dvyukov avatar Apr 26 '16 08:04 dvyukov