go-fuzz
go-fuzz copied to clipboard
Better minimization
Currently go-fuzz has 3 minimization strategies:
- strip input suffix (O(N))
- strip all possible subranges (O(N^2))
- replace all individual bytes with '0' (O(N))
Here was suggested another strategy: https://github.com/gogo/protobuf/issues/91#issuecomment-140676334 Namely:
- extract all possible subranges (this is also O(N^2) so should be no worse than range removal)
Also what I hit several times when inputs are programs:
- remove pairing brackets
sweet :)
http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=28043
Not sure if that algorithm is applicable since we don't always have a pristine seed file.
It appears the standard algorithm that applies here is https://www.st.cs.uni-saarland.de/papers/tse2002/tse2002.pdf
My favourite part: Simplification is not equivalent to isolation. Isolation shows the smallest difference between a failing and a passing case. Simplification shows the smallest failing case.
I implemented the ddmin minimization algorithm: https://github.com/dgryski/go-ddmin ; would be interesting to compare the results of this to what go-fuzz's techniques reduce it to.
@dgryski I am pretty sure we can improve the current minimization algorithm, so feel free to send a pull request if/when you do the comparison. What is time complexity of the algorithm? Current O(N^2) is somewhat unpleasant. Current tail stripping is quite efficient for real inputs, but if ddmin has good time complexity, then probably we can just do the generic thing. Also, there can be some useful minimization techniques that mutate the data itself (like current replacement with 0). For example, there can be an identifier in the input, but you need to alter (shorten) all occurrences of it in the input. Or, there can be a length-value pair, if you shorten the value, you also need to change length accordingly.
ddmin is also O(n^2) time complexity. It also doesn't have any heuristics; it's just an systematic brute force approach to removing pieces of input. I would use it instead of the current first two steps.
I'm not sure when I'll have time to look at this though.
Another ddmin implementation, seemingly written around the same time: https://godoc.org/github.com/dominikh/go-quickcheck
The ddmin algorithm there is mine, just generalized away from []byte.
Current minimization code is https://github.com/dvyukov/go-fuzz/blob/34dea346b8c14573ea065f2a3654a7d85e1a8b0c/go-fuzz/worker.go#L322