Optimizing weights through the MAB algorithm
Hello, I've noticed that in Fuzzilli, Mutators, CodeGenerators, and Templates all use fixed weights for selection, which I believe is not suitable. Here are some experiments I've done: the probability of each Mutator generating an interesting program changes over time.
It can be observed that the CombineMutator and SpliceMutator have the highest interesting rates, but these two mutators are not assigned higher weights. Therefore, I want to optimize the weights of the mutators through the MAB algorithm. I've implemented the related code locally. In short:
- Selecting a Mutator is considered as a pull. If the Mutator generates an interesting program, it is considered a profit.
- The Thompson sampling algorithm is used to estimate the Mutator with the highest interesting rate and select it more times.
- A MABList<Element> is created to replace the original WeightedList<Element>. The MABList<Element> will automatically select the appropriate elements based on feedback.
I would like to know if you would accept this PR if I were to propose it?
I'd claim that there is a strong correlation between the amount of work a mutator does and its chance to increase coverage and therefore to create an "interesting" result.
Therefore, it seems very understandable that the CombineMutator and the SpliceMutator end up having the highest "interesting rate" because they do the most complex transformations where a large amount of statements is inserted into the program vs. the InputMutator that just swaps a single input.
I'm not sure however, if that means that we should draw the conclusion that we should rank these more complex mutators higher because of this higher chance of increasing coverage. They also significantly increase the size of the test case that then needs to be minimized again, time on which we might be able to run quite a few input or operation mutations. So in theory what we'd want to compare is not the interesting rate but the "cost to run, evaluate and minimize the mutation result vs. the interesting rate"?
Another aspect: Taking e.g. the non-type-aware InputMutator: It performs modifications that are in many cases "wrong" based on Fuzzilli's type information. Therefore it will most likely after a few similar attempts not increase coverage and instead just trigger some error handling during decoding. However, it creates test scenarios that otherwise wouldn't generate otherwise as they are considered invalid, so if there is any handling missing for disallowing such a corner case early, this could lead to crashes that we'd otherwise not find. So the fixed weights make the fuzzer a bit "stubborn" and not caring that much about coverage when trying things out which might be inefficient but in some cases that might be a good thing as C++ coverage isn't actually a great metric for a JIT compiler?
The WeightedList has the other advantage of allowing to put an emphasis on any given mutator if desired.
I guess, the best option would be to have a command line flag to either use a fixed weight or use the MABList?
@carl-smith What are your thoughts?
Note that this reply is focused on the mutators. For the code generators I have a stronger opinion that I wouldn't want to focus too much on their chance of covering new edges when weighing them against each other.
I would like to add that the above chart represents the results of continuous fuzzing for 10 hours. The higher interesting rate of CombineMutator is likely due to the fact that the seeds in the initial corpus are relatively small. We can understand it this way: in the initial stage of fuzzing, merging more small programs into a larger one is more likely to increase coverage than modifying the input.
However, this does not mean that CombineMutator is always the best mutator throughout the fuzzing process: as fuzzing proceeds and the corpus gradually increases, merging two large corpora may likely lead to a timeout. In this case, InputMutator becomes more suitable.
Therefore, I believe that the weight of the mutator during the fuzzing process should be dynamically changing, which is the advantage of the MAB algorithm compared to fixed weights.