cargo-mutants Incremental runs

This is a work in progress PR to add incremental runs (#53). @jared-norris is working with me on this as a part of a hackathon at our company.

TODO

[ ] Add tests
[x] Write the logic in the new incremental module
[ ] Write user facing doc

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

We are adding a flag to run cargo mutants incrementally.

The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.
The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.
The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

Nov 30 '23 16:11 cl3joly

That sounds like a reasonable high level plan.

This seems like a feature where it would be good to write the user facing doc in book/src fairly early, starting from your description here, so we can see if it's easy to explain.

Dec 01 '23 00:12 sourcefrog

Actually, it makes more sense to just remove the incremental file once we have every mutant resulting in a positive outcome.

Dec 01 '23 16:12 cl3joly

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

This sounds just great.

I wonder if the "incremental" name is going to be confusing with the --in-diff feature of looking for mutants in code added in a PR.

I can't immediately think of a better name, although maybe it should just be more explicit, something like "--already-failing", though that's still not great.

We are adding a flag to run cargo mutants incrementally.

The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.

The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.

The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

Sounds good.

Dec 04 '23 15:12 sourcefrog

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

This sounds just great.

I wonder if the "incremental" name is going to be confusing with the --in-diff feature of looking for mutants in code added in a PR.

I can't immediately think of a better name, although maybe it should just be more explicit, something like "--already-failing", though that's still not great.

We are adding a flag to run cargo mutants incrementally.

The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.

The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.

The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

Sounds good.

We had a similar debate when choosing the name; I think it can be improved upon.

Maybe --uncaught-only?

Dec 06 '23 16:12 jared-norris

Maybe --iterate-uncaught to emphasize that it does this special behaviour of selecting from the previous output?

Dec 06 '23 18:12 sourcefrog

Hello, would this be possible on the txt files alone or would it require the jsons? Or would it be another mechanism to track these?

Dec 14 '23 22:12 ASuciuX

Hello, would this be possible on the txt files alone or would it require the jsons? Or would it be another mechanism to track these?

Can you say more about why you'd want that? I think jsons are probably a better thing for computers to read: we can add more fields in future, etc.

Dec 15 '23 10:12 sourcefrog

Let me know if/when you want a code review

Dec 16 '23 19:12 sourcefrog

Our existing workflow consists of two phases: Logging and Tracking.

Tracking Phase In the Tracking phase, the process unfolds as follows:

Upon the opening of a pull request or the introduction of new commits on the source branch, we execute cargo mutants on the changes between the latest commit and the current code.
This approach focuses on testing mutants exclusively on newly added or modified functions.
The CI workflow fails if there are any uncaught mutants, giving an output from the respective .txt files about which functions need to be changed or tested.

Logging Phase The Logging phase starts with an initial output, consisting solely of .txt files generated by running mutants on the latest commit in the source code. The subsequent steps are run on merges on certain branches, and they include:

Uploading the initial output files for each package from the codebase to a GitHub Actions cache, along with the hash of the commit.
Restoring the cache and executing cargo mutants solely on the differences between the head and the hash from the restored cache.
Extracting output from the mutants.out folder and appending it to the existing cache content using operations like grep, sed, or awk.
Re-uploading the refined output to both cache and artifacts.

This logging approach ensures that mutant outputs remain human-readable and easily accessible. Simultaneously, it optimizes cargo mutants runs for efficient utilization of CI capabilities.

References PR Tracking

https://github.com/stacks-network/stacks-core/issues/4176
https://github.com/stacks-network/actions/pull/4

Logging

https://github.com/stacks-network/stacks-core/issues/4177
https://github.com/stacks-network/actions/pull/5

The current struggle is related only to the stackslib package. Currently, we have been running mutants for 460 minutes on 2 VMs in a certain folder from stackslib package containing 1028 mutants - out of 11,448 from the whole package. There are 1634 minutes remaining with the -j 30 flag, and 1992 with -j 24. We are taking into consideration how much it would take to build and test small changes with 1-2 jobs on the CI on the differences - given the workflow doesn't fail to begin with. We use shell scripts for parsing diff files and appending lines to output .txt files. Thank you for taking the time to look through this, and if you have any suggestions, please let us know what could be improved.

Dec 17 '23 02:12 ASuciuX

Hey @sourcefrog , sorry for the long silence, we have been quite busy lately.

Some of the TODO points are still unimplemented, but feel free to share any feedback you may have on the current state.

Jan 03 '24 20:01 cl3joly

Ack, I'll come back and have a look at this

Jan 11 '24 05:01 sourcefrog

Hi @cl3joly, thanks for starting on this, it's very nice and encouraging to see the energy about Mutants and about this feature.

I think the high level description in your original post is just right: once people have some missed mutants, they should be able to repeatedly run with --iterate and it will re-test only the mutants that were not know to be caught on previous runs. (Eventually, when you think you've fixed everything, maybe in CI, you'll probably want to retest everything to make sure no new gaps were introduced.)

I realized that, in doing these iterations it's pretty likely that the user might interrupt one cargo-mutants run and it's important that, when that happens, we don't forget about any known positive results.

The other thing I thought of looking at this PR thread is that as you edit the code to fix gaps, it's fairly likely that some mutant line numbers will change, so we probably don't want to match on them in saying which mutants are already covered: probably just matching on the type/function name is enough.

I think, looking at the implementation, it is maybe getting a bit little complex than is necessary, although that is totally understandable for someone new to the codebase. I think @ASuciuX was on the right track in https://github.com/sourcefrog/cargo-mutants/pull/174#issuecomment-1856879607 to say maybe we can do this just with the existing text files in mutants.out.

(In fact, maybe a good initial stand-alone step would be to add an option that runs only mutants explicitly named in some file, or excludes mutants named in some other files, optionally disregarding line/col. That's basically #204. That would support the use case someone mentioned of wanting to do some external filtering and processing of which ones to run. I'm inclined to split that off as a standalone feature, as a kind of step beyond the regex based filters.)

One complication, maybe handled in your PR, I'm not sure, is that even if we're interrupted we shouldn't forget that some things were already caught or unviable...

Jan 20 '24 03:01 sourcefrog

@sourcefrog Thanks for the review!

The other thing I thought of looking at this PR thread is that as you edit the code to fix gaps, it's fairly likely that some mutant line numbers will change, so we probably don't want to match on them in saying which mutants are already covered: probably just matching on the type/function name is enough.

That’s a great point.

I think, looking at the implementation, it is maybe getting a bit little complex than is necessary, although that is totally understandable for someone new to the codebase. I think @ASuciuX was on the right track in https://github.com/sourcefrog/cargo-mutants/pull/174#issuecomment-1856879607 to say maybe we can do this just with the existing text files in mutants.out.

Right, that makes sense.

The above point requires quite an extensive rework of this PR. That’s a fair thing from you to ask, as a maintainer. However, to be transparent, I don’t think I’ll have time to work on it in the next few weeks. If anyone wants to pick this up, please feel free. If not, when I have some time, I’ll come back to this PR to improve it based on your feedback. I hope that works for you @sourcefrog

Jan 31 '24 14:01 cl3joly

Yep, understandable. If I have free time I may pick this up, based on the ideas developed in your work. If I start on it I'll comment here to avoid duplication, and you can do that too if you come back to it.

Jan 31 '24 14:01 sourcefrog

I'll close this and continue in #354. Thanks again for getting it started.

Jun 11 '24 14:06 sourcefrog