dapptools
`dapp mutate`
Description
This adds a simple mutation testing framework to dapptools based on universalmutator.
Mutation testing can be thought of as a kind of reverse fuzzing, where instead of generating random inputs to our test functions, we make random mutations (i.e. introduce bugs) to the source code of the system under test, and then run the test suite to see if the mutation is detected.
The idea is that mutations that are undetected should highlight behaviours that are not well specified by the test suite and hopefully provide some insight to drive improvements to the test suite (or to uncover areas that should be targeted for special attention during audit).
I have been using dss as a testbed while developing, and despite the generally high coverage of the test suite there, dapp mutate was able to uncover many bugs that would have slipped through the test suite (e.g. subtle changes to behaviour in math functions, flipped / altered conditions in important require statements).
The workflow currently looks like this:
- Run `dapp mutate gen`. This will iterate over every non-test file under `DAPP_SRC` and generate mutated versions. Solc is invoked as a part of this process to ensure that the mutated version compiles.
- Run `dapp mutate filter`. This will iterate over every generated mutant and check to see if applying the mutation would be detected by the test suite.
While dapp mutate filter is running, users can call dapp mutate status to get an overview of the current progress, or dapp mutate show-diffs to see the mutations that were not detected by the test suite.
Filter in particular is especially time consuming, and it's probably worth running it overnight or perhaps disabling some particularly time consuming tests. This seems acceptable since users will probably not want to run dapp mutate on every commit / change, but would rather run the analysis periodically to check the status of their test suite.
Still needs docs / changelogs and maybe some tests before merge.
Checklist
- [ ] tested locally
- [ ] added automated tests
- [ ] updated the docs
- [ ] updated the changelog
wowwww very cool
I'm wondering if it might be a good idea to break dapp mutate gen, dapp mutate show-diff and dapp mutate filter into separate files, and then make a general convenient dapp mutate --iterations=<number> command which does the generation followed by the filtering, with a flag showing the diff.
ah yes true that would probably be more convenient. We're a little constrained by the interface provided by universalmutator unfortunately (which doesn't e.g. allow for the generation of a specific number of mutants), but I'll have a play around and see if I can make it work...
> Filter in particular is especially time consuming, and it's probably worth running it overnight or perhaps disabling some particularly time consuming tests. This seems acceptable since users will probably not want to run dapp mutate on every commit / change, but would rather run the analysis periodically to check the status of their test suite.
@gakonst something something rust re-write candidate?
@d-xo very exciting work - thank you for spearheading this. @wminshew it's possible but I haven't done any research on that yet.
> which doesn't e.g. allow for the generation of a specific number of mutants
Not sure if you misunderstood me or not, but to clarify: with --iterations I was referring to the number of times the process of generating mutants and filtering them would be repeated, not the number of mutants created.
> @gakonst something something rust re-write candidate?
my guess is filtering is expensive purely because it goes over a large number of mutants and runs a bunch of tests, so the only way to seriously speed it up would be to make every test run faster -- which probably isn't a rust rewrite candidate?
(this is the author of universalmutator chiming in)
> which doesn't e.g. allow for the generation of a specific number of mutants
hmm. one thing to do, that might be better than total random selection, is to 1) generate mutants then 2) prioritize them and cut off at N? analyze_mutants doesn't take an N argument to run N mutants, but it can take a file, and prioritize takes an optional N and can return only the top N mutants, and dump those into a file
> > @gakonst something something rust re-write candidate?
>
> my guess is filtering is expensive purely because it goes over a large number of mutants and runs a bunch of tests, so the only way to seriously speed it up would be to make every test run faster -- which probably isn't a rust rewrite candidate?
> (this is the author of universalmutator chiming in)
I am admittedly out of my league here but my interpretation of this thread/demo was that tests already ~are running significantly faster in the [incomplete] rust version? happy to be corrected & have a better understanding of the interplay going forward !
In that case, yes, speeding up tests really really helps, since you basically get N mutants * (all (relevant) tests runtime)
If you break on test failure, of course, that's a maximum and gets better the sooner you detect failures!
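As a rough illustration of that bound, the sketch below compares the worst case (every mutant runs the whole suite) with a run that stops at the first failing test. The function names and the per-test timings are made up for the example; nothing here measures a real dapptools run.

```python
def worst_case(test_times, n_mutants):
    """Upper bound: every mutant runs the complete test suite."""
    return n_mutants * sum(test_times)


def filter_cost(test_times, first_failures):
    """Total runtime when each mutant's run stops at its first failing test.

    first_failures holds, per mutant, the index of the first test that
    detects it, or None if the mutant survives the whole suite.
    """
    full_suite = sum(test_times)
    total = 0.0
    for first_failure in first_failures:
        if first_failure is None:
            total += full_suite  # survivor: nothing stops the run early
        else:
            total += sum(test_times[: first_failure + 1])
    return total


test_times = [1.0, 2.0, 3.0]    # seconds per test (hypothetical)
first_failures = [0, 2, None]   # three mutants, one of which survives

print(worst_case(test_times, len(first_failures)))  # 18.0
print(filter_cost(test_times, first_failures))      # 13.0
```

Surviving mutants always cost a full suite run, so the saving from breaking early shrinks as the share of undetected mutants grows.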
The main issue for generating only N mutants in universalmutator is that right now they would be for the first <N lines only. I think I could add a mode to randomize the line processing order, if that would be useful?
Note that if you're willing to call the mutator many times, one undocumented (and not well tested!) trick is to call the mutator with --fuzz and you will get "back" (at most) one mutant, in fuzz.out. This was for other purposes, but is a quick and dirty way right now to get a random mutant on demand.
e.g., to get (maybe) one mutant of a dumb C program, hello.c:
`mutate hello.c --cmd "clang hello.c" --fuzz`
it'll make one mutant, and if that mutant is invalid, you won't get anything, but if it is valid, you'll get it in fuzz.out
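A hedged sketch of how that trick could be looped to collect a handful of random mutants. It only uses the command line quoted above. The helper name, the deduplication, the injectable `run` parameter and the assumption that `fuzz.out` is written to the current directory are all mine, not documented universalmutator behaviour.

```python
import os
import subprocess


def collect_random_mutants(source_file, compile_cmd, wanted, max_attempts,
                           run=subprocess.run, out_path="fuzz.out"):
    """Call `mutate ... --fuzz` repeatedly, gathering up to `wanted` distinct mutants."""
    mutants = []
    for _ in range(max_attempts):
        if len(mutants) >= wanted:
            break
        # Remove stale output so a failed attempt is not mistaken for a new mutant.
        if os.path.exists(out_path):
            os.remove(out_path)
        run(["mutate", source_file, "--cmd", compile_cmd, "--fuzz"], check=False)
        if not os.path.exists(out_path):
            continue  # invalid mutant: nothing was written (assumed behaviour)
        with open(out_path) as handle:
            text = handle.read()
        if text not in mutants:
            mutants.append(text)
    return mutants
```

For example, `collect_random_mutants("hello.c", "clang hello.c", wanted=5, max_attempts=50)` would try up to 50 times to gather 5 distinct mutants. The `run` parameter exists only so the loop can be exercised without the real `mutate` binary.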
Hi @agroce! Thanks so much for the input (and for building this very nice tool in the first place) :sparkling_heart:
> e.g., to get (maybe) one mutant of a dumb C program, hello.c:
> `mutate hello.c --cmd "clang hello.c" --fuzz`
> it'll make one mutant, and if that mutant is invalid, you won't get anything, but if it is valid, you'll get it in fuzz.out
This seems like exactly what we need, I'll play around with this and see if I can make it work :pray:
> hmm. one thing to do, that might be better than total random selection, is to 1) generate mutants then 2) prioritize them and cut off at N? analyze_mutants doesn't take an N argument to run N mutants, but it can take a file, and prioritize takes an optional N and can return only the top N mutants, and dump those into a file
I hadn't noticed the prioritize command before, what are the heuristics used to prioritize one mutant above another?
Complicated -- a mix of location, nature of the change, and the before/after code. It's basically, right now, an ad hoc mess! But it seems somewhat useful, and I have an active NSF grant with @clegoues to make it more principled.
> > which doesn't e.g. allow for the generation of a specific number of mutants
>
> Not sure if you misunderstood me or not, but to clarify: with `--iterations` I was referring to the number of times the process of generating mutants and filtering them would be repeated, not the number of mutants created.
@MrChico Not sure I quite understand what you mean here. What exactly would happen on each iteration?
I was thinking one iteration is one round of mutate, filter, and that one may do multiple of those. But maybe it's not more useful to do mutate, filter, mutate, filter... than to just create more mutations in the first place