
Integration of new test case reducers?

Open renatahodovan opened this issue 6 years ago • 3 comments

I've seen that the project uses separate minimizers for different input formats (like JS and HTML) and also supports a syntax-unaware reduction approach. My question is whether the community is interested in integrating a general solution for these tasks that can work on any kind of format.

In the last few years, we've developed and used two Python{2,3}-based tools for this purpose:

  1. picire is a syntax-unaware parallel minimizer that can work at line or character level. https://github.com/renatahodovan/picire
  2. picireny is a syntax-aware parallel minimizer (built upon the previous tool) that can reduce any input format that has an ANTLRv4 grammar. https://github.com/renatahodovan/picireny Available grammars: https://github.com/antlr/grammars-v4/

I'm happy to work/collaborate on their integration if there is any interest in them.

renatahodovan avatar Mar 29 '19 18:03 renatahodovan

@mbarbella-chromium wrote the minimizer code, what do you think? @renatahodovan - Marty's minimizer also uses delta debugging in a parallelized fashion, so what other advantages do you see? More formats? Will this be run as a black box, or is there a way to get progress in callbacks?

inferno-chromium avatar Mar 29 '19 20:03 inferno-chromium

Seems worth investigating, at least. We just use very simple tokenizers for specific formats, so something smarter could be a pretty big benefit.

I'll take a closer look some time next week to try to get an idea of what it would take to integrate. One thing which could cause trouble is that our minimizer expects to be able to stop a task and resume it on another bot later, and there might be other assumptions that make it difficult.

mbarbella-chromium avatar Mar 29 '19 20:03 mbarbella-chromium

The first tool, picire, is similar to your solution since it's a parallel implementation of Delta Debugging.
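For readers unfamiliar with the technique, here is a minimal sequential sketch of Delta Debugging (ddmin) over a list of lines or characters. It is not picire's or ClusterFuzz's actual code: the names `split`, `ddmin`, `is_interesting`, and `on_progress` are hypothetical and chosen only for illustration, and the real tools additionally run the tests in parallel.

```python
from typing import Callable, List, Optional, Sequence


def split(items: Sequence, n: int) -> List[list]:
    """Split items into n contiguous chunks of near-equal size."""
    chunks, start = [], 0
    for i in range(n):
        end = start + (len(items) - start) // (n - i)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def ddmin(
    items: Sequence,
    is_interesting: Callable[[list], bool],
    on_progress: Optional[Callable[[int], None]] = None,
) -> list:
    """Shrink items while is_interesting(candidate) keeps returning True.

    is_interesting is a hypothetical oracle, e.g. "the target still crashes".
    on_progress (also hypothetical) receives the current test case size.
    """
    items = list(items)
    n = 2  # current granularity: number of chunks
    while len(items) >= 2:
        chunks = split(items, n)
        reduced = False

        # Step 1: try to reduce to a single chunk.
        for chunk in chunks:
            if is_interesting(chunk):
                items, n, reduced = chunk, 2, True
                break

        # Step 2: try to remove a single chunk (keep its complement).
        # With n == 2 the complements are the chunks already tested above.
        if not reduced and n > 2:
            for i in range(len(chunks)):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if is_interesting(complement):
                    items, n, reduced = complement, max(n - 1, 2), True
                    break

        if reduced:
            if on_progress is not None:
                on_progress(len(items))
            continue

        # Step 3: nothing worked, so refine the granularity or stop.
        if n >= len(items):
            break
        n = min(len(items), n * 2)
    return items


# Usage: the "failure" needs both 3 and 6 to be present.
result = ddmin(range(1, 9), lambda c: 3 in c and 6 in c)
print(result)  # [3, 6]
```

A syntax-unaware reducer applies this loop to the lines or characters of the input, so many candidates are syntactically invalid and are rejected only after running the target.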

However, picireny is a Hierarchical Delta Debugging approach, which works on parse trees created using ANTLRv4 grammars. Its advantage is that you don't need to write various parsers by hand for different formats.

Since both tools can be used from the CLI and through an API, it should not be hard to integrate them. Getting progress information is also possible: we use them as part of our fuzzer framework, Fuzzinator, where they also report their progress (i.e., the current size of the test case under reduction).

As for the stop and resume feature, we didn't have such a use case until now, but we have some ideas for how that could be achieved.
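One way stop-and-resume could look in principle is sketched below. This is purely an illustrative assumption, not how picire, picireny, or the ClusterFuzz minimizer work: the reducer keeps all of its state in a plain dictionary that can be serialized (here with JSON) and handed to another worker. For brevity it uses a single greedy pass that removes one element at a time instead of full Delta Debugging; `reduce_with_budget`, `state`, and `budget` are hypothetical names.

```python
import json
from typing import Callable


def reduce_with_budget(
    state: dict, is_interesting: Callable[[list], bool], budget: int
) -> dict:
    """Run at most `budget` tests, then return a serializable checkpoint.

    state has the shape {"items": [...], "pos": int, "done": bool}, where
    "pos" is the index of the next element we will try to remove.
    """
    items, pos = list(state["items"]), state["pos"]
    while budget > 0 and pos < len(items):
        candidate = items[:pos] + items[pos + 1:]
        if is_interesting(candidate):
            items = candidate  # removal kept the failure; stay at same pos
        else:
            pos += 1  # element is needed; move on to the next one
        budget -= 1
    return {"items": items, "pos": pos, "done": pos >= len(items)}


def crashes(candidate: list) -> bool:
    """Hypothetical oracle: the failure needs both 3 and 6."""
    return 3 in candidate and 6 in candidate


# Usage: each iteration stands in for a separate bot picking up the task.
state = {"items": list(range(1, 9)), "pos": 0, "done": False}
while not state["done"]:
    checkpoint = json.dumps(reduce_with_budget(state, crashes, budget=3))
    state = json.loads(checkpoint)  # another bot would load this later
print(state["items"])  # [3, 6]
```

The same idea would extend to ddmin by also storing the current granularity and the index of the chunk being tested, at the cost of a larger checkpoint.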

renatahodovan avatar Mar 29 '19 21:03 renatahodovan