Introduce mutation testing
We should look into introducing mutation testing to our workflow by way of cargo-mutants. This somewhat relates to #339.
I installed it locally and found 269 missed mutants in the core payjoin directory only. Sample output:
❯ cargo mutants -f payjoin -j2
Found 731 mutants to test
ok Unmutated baseline in 91.9s build + 1.3s test
INFO Auto-set test timeout to 20s
MISSED payjoin/src/receive/v1/mod.rs:531:9: replace WantsInputs::receiver_min_input_amount -> Amount with Default::default() in 1.9s build + 1.9s test
MISSED payjoin/src/receive/v1/mod.rs:722:9: replace ProvisionalProposal::sender_input_indexes -> Vec<usize> with vec![] in 1.6s build + 1.0s test
MISSED payjoin/src/receive/v1/exclusive/error.rs:38:48: replace <impl From for super::ReplyableError>::from -> Self with Default::default() in 1.3s build + 1.0s test
MISSED payjoin/src/psbt/mod.rs:151:38: replace == with != in InternalInputPair<'_>::validate_utxo in 1.3s build + 1.0s test
MISSED payjoin/src/receive/v1/mod.rs:310:29: replace && with || in WantsOutputs::replace_receiver_outputs in 1.3s build + 1.0s test
MISSED payjoin/src/receive/v2/mod.rs:542:9: replace PayjoinProposal::process_res -> Result<(), Error> with Ok(()) in 1.3s build + 1.0s test
MISSED payjoin/src/receive/v1/mod.rs:639:50: replace + with - in ProvisionalProposal::apply_fee in 1.3s build + 1.0s test
MISSED payjoin/src/receive/v1/mod.rs:780:9: replace PayjoinProposal::is_output_substitution_disabled -> bool with false in 1.2s build + 1.0s
...
MISSED payjoin/src/send/mod.rs:262:53: replace == with != in PsbtContext::check_outputs in 1.3s build + 1.0s test
MISSED payjoin/src/receive/v1/mod.rs:100:39: replace < with > in UncheckedProposal::check_broadcast_suitability in 1.7s build + 3.7s test
MISSED payjoin/src/receive/v1/mod.rs:635:33: replace += with -= in ProvisionalProposal::apply_fee in 1.4s build + 3.0s test
MISSED payjoin/src/send/error.rs:335:9: replace <impl Display for ResponseError>::fmt -> fmt::Result with Ok(Default::default()) in 1.4s build + 0.9s test
MISSED payjoin/src/receive/v1/mod.rs:381:56: replace < with > in interleave_shuffle in 1.4s build + 0.9s test
MISSED payjoin/src/receive/v1/mod.rs:462:74: replace + with - in WantsInputs::avoid_uih in 1.4s build + 0.9s test
MISSED payjoin/src/send/error.rs:132:9: replace <impl Error for ValidationError>::source -> Option<&(dyn std::error::Error +'static)> with None in 1.3s build + 0.9s test
731 mutants tested in 11m 48s: 269 missed, 129 caught, 329 unviable, 4 timeouts
rust-bitcoin recently introduced this to their own workflow, so we can take inspiration from there. Ideally we should start with the smallest/most critical subset of the codebase, kill all mutants there, and slowly expand to cover the rest.
One drawback is that cargo-mutants doesn't cover declarative macros, so the critical ensure! checks in send/mod.rs are not covered currently. Maybe we could rewrite those as procedural macros or plain functions to ensure they are covered properly.
Love this idea but have no experience with mutation testing to offer
i was planning to implement mutant testing for the payjoin/src/send or payjoin/src/recieve components first but they both have around 735 mutants, so should i break it into smaller components and make PR ? or do you have any component in mind that needs this implementation first
cc @spacebear21
We should definitely narrow the scope down to a reasonable number to start, kill those mutants, then gradually expand from there. IMO a good place to start would be the sender basic_checks, check_inputs, check_outputs, and check_fees functions (but note the issue with ensure! I mentioned in the top comment), or the individual receiver typestates (e.g UncheckedProposal, MaybeInputsOwned...). I think this can be achieved by combining file filters and regex filters with the cargo mutants command https://mutants.rs/filter_mutants.html.
Looking into it !
Playing around with it I made a minimal check on my fork just checking send/v2 https://github.com/benalleng/rust-payjoin/blob/mutants-out/mutants.out/missed.txt. But ultimately I am wondering how we should organize the outputs for tracking these issues. It seems like there was similar questions on rust bitcoin on where to put these outputs for actionable tracking. Also a question does "unviable" like these here https://github.com/benalleng/rust-payjoin/blob/mutants-out/mutants.out/unviable.txt indicate functions we need to whitelist? The docs https://mutants.rs/using-results.html?highlight=unviable#mutant-outcomes were unclear to me.
IIUC "unviable" just means it fails to compile with the attempted mutation, meaning the Rust compiler effectively tests that behavior for us so these are safe to ignore.
How do we feel about including the flag test_workspace = true in our mutants workflow? This would include tests from all of our crates within the repo as laid out here https://mutants.rs/workspaces.html#selecting-tests-to-run. It would increase the time for our mutants to run but would substantially increase our coverage with the inclusion of tests outside of the payjoin crate alone.
Doing some manual testing I found that we do have tests caught by our payjoin-cli e2e tests quite often when a mutation occurs but these tests aren't included when running cargo mutants as it is configured now.
Based on my initial testing the time-sink for this inclusion is dramatic so there is a reasonable argument to not add it here and instead try to maximize our coverage just within the payjoin crate alone. I will include a updated baseline mutation run alongside a run with this flag included.
Baseline mutation test on all of payjoin/src
Found 533 mutants to test
ok Unmutated baseline in 26.3s build + 16.3s test
INFO Auto-set test timeout to 1m 22s
MISSED payjoin/src/send/mod.rs:360:23: replace match guard with false in 9.6s build + 26.8s test
MISSED payjoin/src/error_codes.rs:32:13: delete match arm in 10.7s build + 29.5s test
MISSED payjoin/src/send/v1.rs:93:47: replace == with != in SenderBuilder<'a>::build_recommended in 79.5s bu
ild + 19.4s test
...
MISSED payjoin/src/into_url.rs:72:32: replace <impl IntoUrlSealed for &str>::as_str -> &str with "xyzzy" in
3.0s build + 26.7s test
MISSED payjoin/src/into_url.rs:84:32: replace <impl IntoUrlSealed for String>::as_str -> &str with "xyzzy"
in 3.0s build + 29.0s test
MISSED payjoin/src/directory.rs:27:39: replace ShortId::as_slice -> &[u8] with Vec::leak(vec![1]) in 3.9s b
uild + 23.8s test
533 mutants tested in 23m 10s: 71 missed, 264 caught, 198 unviable
With test_workspace = true
Found 533 mutants to test
ok Unmutated baseline in 24.8s build + 15.7s test
INFO Auto-set test timeout to 1m 19s
533 mutants tested in 29m 44s: 335 caught, 198 unviable
Note the time difference is not as significant as I had expected. It's possible this is not worth the added time if we can just get coverage on these additional areas from within the payjoin crate. A good portion of these are SenderBuilder::build_recommended() which has 19 missed mutants alone! At the very least it is nice to see that we do have complete coverage when using tests from our entire workspace.
Wow, not a single missed mutant when including e2e tests?? That's almost hard to believe...
My hunch is that we should not rely on crates external to payjoin for mutation testing within payjoin. Do you have a sense of what areas of the code are caught by e2e tests, that aren't caught by unit/integration tests?
Per https://github.com/payjoin/rust-payjoin/pull/748, I think we can safely consider mutation testing "introduced" :p
Thanks @benalleng for your work on this!