go-fuzz
                        reconsider priority assignment for valid inputs
Currently valid inputs (with result=1) receive a 2x priority boost, which is good. However, if we have 2 valid inputs and 200 invalid ones (which exercise various error paths), we actually want to mutate the valid inputs 2x more frequently than all the invalid ones combined. Valid inputs have a much higher probability of uncovering new source code statements, while mutated invalid inputs will most likely bail out through the same error path.
From the description this looks like an easy one to fix. The simplest solution is to have two (maybe more in the future) separate sets of inputs (invalid, valid, ...). Choosing an input to mutate would then need two random values (the set, and an item from that set). Special care is needed when some sets are empty (e.g. just after start).
Does that seem OK to you?
Hi @Komosa,
Thanks for looking into this. Go-fuzz already has a scoring system and chooses inputs according to score (see the updateScores function in hub.go). So I think we just need to update the scores differently. Splitting inputs into separate sets does not look scalable to me. We have several orthogonal properties of inputs; some of them are gradual (e.g. input size or execution time), and some are more important than others. I don't see how that can work based on input sets.
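A minimal sketch of how validity could enter a multiplicative score alongside other gradual properties. This is an assumption for illustration only; the real logic lives in updateScores in hub.go, and the factor names here are invented:

```go
package main

import "fmt"

// score sketches a multiplicative scoring scheme: several orthogonal,
// gradual properties (size, execution time, coverage depth, ...) each
// contribute a factor, and validity is just one more factor among them.
func score(valid bool, sizeFactor, timeFactor, coverFactor float64) float64 {
	s := sizeFactor * timeFactor * coverFactor
	if valid {
		s *= 2 // the current 2x boost for result==1 inputs
	}
	return s
}

func main() {
	// A small, fast valid input vs. a larger, slower invalid one
	// that covers more code:
	fmt.Println(score(true, 1.0, 1.0, 1.0))  // prints 2
	fmt.Println(score(false, 0.5, 0.8, 2.0)) // prints 0.8
}
```

The advantage over separate sets is that a very interesting invalid input (high coverFactor) can still outscore a mediocre valid one, rather than being capped by a fixed valid/invalid ratio.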
Thank you for the response.
Is it OK to have a constant ratio of valid (and, as a consequence, invalid) inputs chosen for mutation? If so, what should the value of this ratio be? 2/3 for valid ones is the obvious choice, but I think a larger value might help to focus more on interesting inputs.
@Komosa These are hard questions. The only more-or-less objective way to tune such things is to measure the impact on real fuzzing targets (how does coverage progress? at what coverage level does it saturate after an hour? how many bugs does it discover?). This is time consuming.
Please measure (1) the number of correct/incorrect inputs and (2) the total priority of correct/incorrect inputs, before and after the change, on several fuzzers from the examples dir. This should give us at least some sense of the impact of the change.
Thinking about this more, maybe we should not be so radical as to run valid inputs twice as frequently as all invalid ones. Consider that we uncover some large, previously unexamined piece of code with a new input, but in the end this input turns out to be invalid (e.g. the beginning of the input is correct, but the ending is incorrect). This is still a very interesting input that we should mutate frequently.
Let's look at the numbers and then decide.
I will run such tests.
Is an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz PC modern enough?
I guess that not only the coverage after an hour is interesting, but also the shape of the coverage(time) function. Of course I plan to automate such tests.
One more question: the default value of 8 processes (= NUMCPUS on my machine) seems OK to me, but various examples use higher numbers (like 10 or even 500, without specifying what environment is being used). Should I pick 8 or more?
The particular machine/env does not matter as long as you compare before/after the change under the same conditions.
Of course I plan to automate such tests.
It would be great if you could produce a reusable solution. There are lots of tuning knobs (e.g. as simple as how frequently to invoke the versifier or sonar). The current defaults were chosen with a fair dice roll. If there were a simple way to measure, we could choose better values for all these knobs and increase efficiency.
Just to say I haven't forgotten about this thread.
A single one-hour run seems to be too sensitive to (initial) lucky mutations (at least for examples/lzw). I need more time to think about and experiment with this. Probably each test needs to be executed multiple times.
Is the corpus field just the number of different test cases/paths (in other words, is linear growth linearly better)?
Yes, corpus is the number of different, interesting inputs. But each new input is exponentially harder to obtain, so linear growth is exponentially better. I think the cover field is a better characteristic; it shows the total code coverage for the whole corpus. Say in one run we cover 1000 basic blocks with 100 inputs. If in another run we cover the same 1000 basic blocks with just 50 inputs, it is equally good.
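The cover metric as described here is a set union over the corpus: the number of distinct basic blocks hit by any input, independent of corpus size. A tiny sketch (the coverUnion helper and the block-ID representation are invented for illustration):

```go
package main

import "fmt"

// coverUnion counts the distinct basic blocks covered by the whole
// corpus, where each input is represented by the IDs of the blocks
// it executes. Overlapping blocks are counted only once, so a corpus
// of 50 inputs covering the same blocks as one of 100 scores equally.
func coverUnion(corpus [][]int) int {
	seen := map[int]bool{}
	for _, blocks := range corpus {
		for _, b := range blocks {
			seen[b] = true
		}
	}
	return len(seen)
}

func main() {
	// Two inputs with overlapping coverage: blocks 2 and 3 are shared.
	corpus := [][]int{{1, 2, 3}, {2, 3, 4}}
	fmt.Println(coverUnion(corpus)) // prints 4
}
```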