rust-payjoin Introduce mutation testing

We should look into introducing mutation testing to our workflow by way of cargo-mutants. This somewhat relates to #339.

I installed it locally and found 269 missed mutants in the core payjoin directory only. Sample output:

❯ cargo mutants -f payjoin -j2
Found 731 mutants to test
ok       Unmutated baseline in 91.9s build + 1.3s test
 INFO Auto-set test timeout to 20s
MISSED   payjoin/src/receive/v1/mod.rs:531:9: replace WantsInputs::receiver_min_input_amount -> Amount with Default::default() in 1.9s build + 1.9s test
MISSED   payjoin/src/receive/v1/mod.rs:722:9: replace ProvisionalProposal::sender_input_indexes -> Vec<usize> with vec![] in 1.6s build + 1.0s test
MISSED   payjoin/src/receive/v1/exclusive/error.rs:38:48: replace <impl From for super::ReplyableError>::from -> Self with Default::default() in 1.3s build + 1.0s test
MISSED   payjoin/src/psbt/mod.rs:151:38: replace == with != in InternalInputPair<'_>::validate_utxo in 1.3s build + 1.0s test
MISSED   payjoin/src/receive/v1/mod.rs:310:29: replace && with || in WantsOutputs::replace_receiver_outputs in 1.3s build + 1.0s test
MISSED   payjoin/src/receive/v2/mod.rs:542:9: replace PayjoinProposal::process_res -> Result<(), Error> with Ok(()) in 1.3s build + 1.0s test
MISSED   payjoin/src/receive/v1/mod.rs:639:50: replace + with - in ProvisionalProposal::apply_fee in 1.3s build + 1.0s test
MISSED   payjoin/src/receive/v1/mod.rs:780:9: replace PayjoinProposal::is_output_substitution_disabled -> bool with false in 1.2s build + 1.0s
...
MISSED   payjoin/src/send/mod.rs:262:53: replace == with != in PsbtContext::check_outputs in 1.3s build + 1.0s test
MISSED   payjoin/src/receive/v1/mod.rs:100:39: replace < with > in UncheckedProposal::check_broadcast_suitability in 1.7s build + 3.7s test
MISSED   payjoin/src/receive/v1/mod.rs:635:33: replace += with -= in ProvisionalProposal::apply_fee in 1.4s build + 3.0s test
MISSED   payjoin/src/send/error.rs:335:9: replace <impl Display for ResponseError>::fmt -> fmt::Result with Ok(Default::default()) in 1.4s build + 0.9s test
MISSED   payjoin/src/receive/v1/mod.rs:381:56: replace < with > in interleave_shuffle in 1.4s build + 0.9s test
MISSED   payjoin/src/receive/v1/mod.rs:462:74: replace + with - in WantsInputs::avoid_uih in 1.4s build + 0.9s test
MISSED   payjoin/src/send/error.rs:132:9: replace <impl Error for ValidationError>::source -> Option<&(dyn std::error::Error +'static)> with None in 1.3s build + 0.9s test
731 mutants tested in 11m 48s: 269 missed, 129 caught, 329 unviable, 4 timeouts

rust-bitcoin recently introduced this to their own workflow, so we can take inspiration from there. Ideally we should start with the smallest/most critical subset of the codebase, kill all mutants there, and slowly expand to cover the rest.

One drawback is that cargo-mutants doesn't cover declarative macros, so the critical ensure! checks in send/mod.rs are not covered currently. Maybe we could rewrite those as procedural macros or plain functions to ensure they are covered properly.

Feb 14 '25 16:02 spacebear21

Love this idea but have no experience with mutation testing to offer

Feb 18 '25 21:02 DanGould

i was planning to implement mutant testing for the payjoin/src/send or payjoin/src/recieve components first but they both have around 735 mutants, so should i break it into smaller components and make PR ? or do you have any component in mind that needs this implementation first

cc @spacebear21

Feb 19 '25 19:02 Gmin2

We should definitely narrow the scope down to a reasonable number to start, kill those mutants, then gradually expand from there. IMO a good place to start would be the sender basic_checks, check_inputs, check_outputs, and check_fees functions (but note the issue with ensure! I mentioned in the top comment), or the individual receiver typestates (e.g UncheckedProposal, MaybeInputsOwned...). I think this can be achieved by combining file filters and regex filters with the cargo mutants command https://mutants.rs/filter_mutants.html.

Feb 19 '25 19:02 spacebear21

Looking into it !

Feb 20 '25 22:02 Gmin2

Playing around with it I made a minimal check on my fork just checking send/v2 https://github.com/benalleng/rust-payjoin/blob/mutants-out/mutants.out/missed.txt. But ultimately I am wondering how we should organize the outputs for tracking these issues. It seems like there was similar questions on rust bitcoin on where to put these outputs for actionable tracking. Also a question does "unviable" like these here https://github.com/benalleng/rust-payjoin/blob/mutants-out/mutants.out/unviable.txt indicate functions we need to whitelist? The docs https://mutants.rs/using-results.html?highlight=unviable#mutant-outcomes were unclear to me.

Feb 21 '25 20:02 benalleng

IIUC "unviable" just means it fails to compile with the attempted mutation, meaning the Rust compiler effectively tests that behavior for us so these are safe to ignore.

Feb 26 '25 17:02 spacebear21

How do we feel about including the flag test_workspace = true in our mutants workflow? This would include tests from all of our crates within the repo as laid out here https://mutants.rs/workspaces.html#selecting-tests-to-run. It would increase the time for our mutants to run but would substantially increase our coverage with the inclusion of tests outside of the payjoin crate alone.

Doing some manual testing I found that we do have tests caught by our payjoin-cli e2e tests quite often when a mutation occurs but these tests aren't included when running cargo mutants as it is configured now.

Based on my initial testing the time-sink for this inclusion is dramatic so there is a reasonable argument to not add it here and instead try to maximize our coverage just within the payjoin crate alone. I will include a updated baseline mutation run alongside a run with this flag included.

May 30 '25 14:05 benalleng

Baseline mutation test on all of payjoin/src

Found 533 mutants to test
ok       Unmutated baseline in 26.3s build + 16.3s test
 INFO Auto-set test timeout to 1m 22s
MISSED   payjoin/src/send/mod.rs:360:23: replace match guard with false in 9.6s build + 26.8s test
MISSED   payjoin/src/error_codes.rs:32:13: delete match arm in 10.7s build + 29.5s test
MISSED   payjoin/src/send/v1.rs:93:47: replace == with != in SenderBuilder<'a>::build_recommended in 79.5s bu
ild + 19.4s test
...
MISSED   payjoin/src/into_url.rs:72:32: replace <impl IntoUrlSealed for &str>::as_str -> &str with "xyzzy" in
 3.0s build + 26.7s test
MISSED   payjoin/src/into_url.rs:84:32: replace <impl IntoUrlSealed for String>::as_str -> &str with "xyzzy" 
in 3.0s build + 29.0s test
MISSED   payjoin/src/directory.rs:27:39: replace ShortId::as_slice -> &[u8] with Vec::leak(vec![1]) in 3.9s b
uild + 23.8s test
533 mutants tested in 23m 10s: 71 missed, 264 caught, 198 unviable

With test_workspace = true

Found 533 mutants to test
ok       Unmutated baseline in 24.8s build + 15.7s test
 INFO Auto-set test timeout to 1m 19s
533 mutants tested in 29m 44s: 335 caught, 198 unviable

Note the time difference is not as significant as I had expected. It's possible this is not worth the added time if we can just get coverage on these additional areas from within the payjoin crate. A good portion of these are SenderBuilder::build_recommended() which has 19 missed mutants alone! At the very least it is nice to see that we do have complete coverage when using tests from our entire workspace.

May 30 '25 15:05 benalleng

Wow, not a single missed mutant when including e2e tests?? That's almost hard to believe...

My hunch is that we should not rely on crates external to payjoin for mutation testing within payjoin. Do you have a sense of what areas of the code are caught by e2e tests, that aren't caught by unit/integration tests?

May 30 '25 17:05 spacebear21

Per https://github.com/payjoin/rust-payjoin/pull/748, I think we can safely consider mutation testing "introduced" :p

Thanks @benalleng for your work on this!

Jun 12 '25 20:06 spacebear21