go-fuzz icon indicating copy to clipboard operation
go-fuzz copied to clipboard

Better minimization

Open dvyukov opened this issue 10 years ago • 10 comments

Currently go-fuzz has 3 minimization strategies:

  • strip input suffix (O(N))
  • strip all possible subranges (O(N^2))
  • replace all individual bytes with '0' (O(N))

Here was suggested another strategy: https://github.com/gogo/protobuf/issues/91#issuecomment-140676334 Namely:

  • extract all possible subranges (this is also O(N^2) so should be no worse than range removal)

Also what I hit several times when inputs are programs:

  • remove pairing brackets

dvyukov avatar Sep 16 '15 09:09 dvyukov

sweet :)

awalterschulze avatar Sep 16 '15 09:09 awalterschulze

http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=28043

Not sure if that algorithm is applicable since we don't always have a pristine seed file.

dgryski avatar Sep 16 '15 11:09 dgryski

It appears the standard algorithm that applies here is https://www.st.cs.uni-saarland.de/papers/tse2002/tse2002.pdf

dgryski avatar Oct 21 '15 00:10 dgryski

My favourite part: Simplification is not equivalent to isolation. Isolation shows the smallest difference between a failing and a passing case. Simplification shows the smallest failing case.

awalterschulze avatar Oct 22 '15 18:10 awalterschulze

I implemented the ddmin minimization algorithm: https://github.com/dgryski/go-ddmin ; would be interesting to compare the results of this to what go-fuzz's techniques reduce it to.

dgryski avatar Jan 06 '16 22:01 dgryski

@dgryski I am pretty sure we can improve the current minimization algorithm, so feel free to send a pull request if/when you do the comparison. What is time complexity of the algorithm? Current O(N^2) is somewhat unpleasant. Current tail stripping is quite efficient for real inputs, but if ddmin has good time complexity, then probably we can just do the generic thing. Also, there can be some useful minimization techniques that mutate the data itself (like current replacement with 0). For example, there can be an identifier in the input, but you need to alter (shorten) all occurrences of it in the input. Or, there can be a length-value pair, if you shorten the value, you also need to change length accordingly.

dvyukov avatar Jan 07 '16 09:01 dvyukov

ddmin is also O(n^2) time complexity. It also doesn't have any heuristics; it's just an systematic brute force approach to removing pieces of input. I would use it instead of the current first two steps.

I'm not sure when I'll have time to look at this though.

dgryski avatar Jan 07 '16 09:01 dgryski

Another ddmin implementation, seemingly written around the same time: https://godoc.org/github.com/dominikh/go-quickcheck

tv42 avatar Aug 17 '16 00:08 tv42

The ddmin algorithm there is mine, just generalized away from []byte.

dgryski avatar Aug 17 '16 04:08 dgryski

Current minimization code is https://github.com/dvyukov/go-fuzz/blob/34dea346b8c14573ea065f2a3654a7d85e1a8b0c/go-fuzz/worker.go#L322

dgryski avatar Dec 26 '17 19:12 dgryski