Second- and higher-order mutants
There seems to be quite a lot of research which suggests that higher-order mutants (mutants containing more than one mutation) can be useful in several ways:
- They can reduce the overall number of mutants to test
- They can reduce the likelihood of mutants being equivalent mutants
Some of the suggested mutation-combination algorithms are very simple and are reported to be effective.
This suggests that Cosmic Ray could do less, but more useful, work.
Further reading:
Have a look at #197 (which links to at least one of those papers!) for some earlier discussion about this. I think it's well within our grasp; the architecture should support it pretty straightforwardly. In fact, the recent change from command-line options to configuration files directly addresses one of the design issues I raised.
I agree that higher-order mutation provides some benefits, but IMHO it has disadvantages as well: it will make the logic behind each mutation much more complex, since we would need some feedback mechanism for deciding whether one mutation is similar to another. Assuming the program is well divided and has a proper architecture, second- and higher-order mutation would not be very beneficial, but I am quite hopeful I am wrong.
Assuming the program is well divided and has a proper architecture, second- and higher-order mutation would not be very beneficial
This is interesting! Do you have some intuition that higher-order mutations are less useful on well-architected systems? I'm really curious to hear more about this.
Assume the system has 10 components. Each component can interact with the others through a well-defined API. Hence, no matter what happens in one component, the other parts will keep functioning as expected (hopefully). Mutations should only be applied to one of the components, because applying them to two at once has no extra implications: the protocol between them will not change. If I reduce this problem to the problem of a single program and create well-defined interfaces, then a mutation would touch only a single aspect of the component.
At least, that's my intuition: I treat each of the components as orthogonal components in space, so changing one will not affect the others; otherwise, we break the single-responsibility rule. Important note: assume all standard inputs and outputs are checked as part of the test suite. Does that make sense?
I think I understand what you're saying, but I don't fully agree with the conclusions. If the system I'm testing consists of multiple components communicating via some protocol (over the wire, over stdin/out, or whatever), I can't just assume that "the protocol between them will not change" under mutation. That's actually something I very definitely need to test, and using mutation testing to validate my tests of that protocol is critical.
With that said, if you want to test your components in isolation, I think CR has sufficient support for testing individual packages right now. So it might be that higher-order mutants and the component-oriented issues you're concerned with are largely orthogonal.
This is going to require a lot of thought...
it will make the logic behind each mutation much more complex
Won't the diffs tell us a lot about mutation similarity? My (perhaps naive) view of HOMs is that they're essentially multiple mutations happening at one time. As such, we can understand them at the code level in the same way we understand single mutations.
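To make that concrete, here's one purely illustrative way similarity between mutants could be measured at the code level using nothing but their diffs; `mutation_diff` and `diff_similarity` are hypothetical helpers, not anything in Cosmic Ray today:

```python
import difflib

def mutation_diff(original: str, mutated: str, path: str) -> list:
    """Unified diff between the original and mutated source of one file."""
    return list(difflib.unified_diff(
        original.splitlines(keepends=True),
        mutated.splitlines(keepends=True),
        fromfile=path,
        tofile=path + " (mutant)",
    ))

def diff_similarity(diff_a, diff_b) -> float:
    """Crude similarity score in [0, 1] between two mutant diffs."""
    return difflib.SequenceMatcher(None, "".join(diff_a), "".join(diff_b)).ratio()
```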
I agree with the critique about the well-defined protocol. Unfortunately, with software that is usually not the case. If you assume the protocol is well defined, then the components are orthogonal; if it is not well defined... we have a job :)
There are many ways to implement HOMs, but I think that even with the naive approach, in order to verify that one HOM is different from another, one must get feedback from local resolution in order to compare HOMs.
I’d like to put some work into this as I have time. I think an obvious first step is to update the WorkDB so that individual WorkItems can refer to multiple mutations; right now they’re limited to exactly one. This change in itself shouldn’t be too hard.
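As a sketch of the shape I have in mind (the field names here are illustrative, not a description of the current WorkDB schema):

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

@dataclass(frozen=True)
class MutationSpec:
    """One concrete mutation: which operator to apply, where."""
    module_path: Path
    operator_name: str
    occurrence: int

@dataclass
class WorkItem:
    """A unit of work: zero or more mutations applied together."""
    job_id: str
    mutations: List[MutationSpec] = field(default_factory=list)
```

A first-order mutant is then just the `len(mutations) == 1` case, so the existing behaviour falls out of the same structure.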
We’d then need to update the distributor protocol to use these new WorkItems. This won’t be hard for ‘local’, but the ‘http’ protocol will need updates to its serialization structures. Perhaps we can use something like fastapi to manage that serialization.
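For the 'http' side, something like this pydantic/FastAPI sketch is roughly what I mean; the endpoint name and payload shape are made up for illustration:

```python
from typing import List
from fastapi import FastAPI
from pydantic import BaseModel

class MutationSpecModel(BaseModel):
    module_path: str
    operator_name: str
    occurrence: int

class WorkItemModel(BaseModel):
    job_id: str
    mutations: List[MutationSpecModel]  # any number, including zero

app = FastAPI()

@app.post("/execute")
def execute(item: WorkItemModel):
    # Hand the work item off to the worker (omitted here); FastAPI
    # handles the (de)serialization of the nested mutation list for us.
    return {"job_id": item.job_id, "mutation_count": len(item.mutations)}
```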
Finally, we’ll need to update the distributors to apply multiple mutations. This should be straightforward. I don’t think we’ll need to do anything regarding return values and so forth...the existing ‘result’ protocol should be fine for higher-order mutants.
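The core of that change could be as small as stacking the existing single-mutation machinery, something like this sketch, where `apply_mutation` and `run_tests` stand in for whatever per-mutation and test-running machinery the distributor already uses:

```python
import contextlib

def run_work_item(work_item, apply_mutation, run_tests):
    """Apply every mutation in a WorkItem, run the tests, then restore.

    `apply_mutation(spec)` is assumed to be a context manager that
    rewrites the target file on entry and restores it on exit;
    `run_tests` is the existing test-command runner.
    """
    with contextlib.ExitStack() as stack:
        for spec in work_item.mutations:
            stack.enter_context(apply_mutation(spec))
        return run_tests()
```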
As an aside, the ability to apply zero mutations in a WorkItem might streamline the baseline feature. Currently it has to jump through some hoops to apply a NoOp mutation; a zero-mutation WorkItem would represent the same thing.
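Concretely, with the sketch above, the baseline run would just be:

```python
# Zero mutations: the distributor runs the unmodified code, which is
# exactly what the baseline needs; no NoOp operator required.
baseline = WorkItem(job_id="baseline", mutations=[])
```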
Once this foundational infrastructure is in place, we need to think about the harder, higher-level problems. How will we decide which higher-order mutations to do? We currently don’t have any notion of a configurable strategy for deciding which mutations to do. We simply apply all available mutations maximally, creating a WorkItem for each.
Filter programs actually do give us control over which mutations are run, so maybe they're the way to go. That is, after init discovers all of the available individual mutations, a HOM filter could go through the WorkDB, aggregating them into HOMs as it sees fit. Does the WorkDB currently contain enough information for such a filter to really work? Will it be too inefficient? We may find that we need to insert HOM discovery into the init phase. Likewise, it may be smarter to simply have Operators that generate HOMs; the ability for operators to produce multi-mutation WorkItems seems both useful and elegant.
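To make the filter idea concrete, here's a sketch of the simplest combination strategy from the literature (greedily pairing first-order mutants into second-order ones), reusing the illustrative WorkItem from above; a real filter would also need to read and rewrite the WorkDB, which is exactly where the open questions about efficiency come in:

```python
from typing import Iterable, Iterator, List

def pair_into_homs(work_items: Iterable[WorkItem]) -> Iterator[WorkItem]:
    """Greedily combine first-order WorkItems into second-order ones.

    This roughly halves the number of mutants to run, which is one of
    the main benefits reported for HOMs. Smarter strategies (e.g. only
    combining mutations within the same function) would slot in here.
    """
    pending: List[WorkItem] = []
    for item in work_items:
        pending.append(item)
        if len(pending) == 2:
            first, second = pending
            yield WorkItem(
                job_id=first.job_id + "+" + second.job_id,
                mutations=first.mutations + second.mutations,
            )
            pending.clear()
    if pending:  # an odd one out stays first-order
        yield pending[0]
```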