Extend mull-runner to consult SQLite DB and skip mutations already squashed by previous tests
Description
One obvious performance optimization for mull-runner when it is writing mutation results to a shared SQLite DB with arguments like:
mull-runner-19 --coverage-info=<test-name>.profdata --reporters=SQLite --report-name=core \
--report-dir=${PWD} <test-exec>
would be to have it consult the SQLite DB and avoid running mutations that have already been squashed by previous tests that have written to the shared DB.
Currently, mull-runner will run any mutations defined in <test-exec> for covered lines specified in <test-name>.profdata, even if those mutations were already squashed by a previous test as recorded in the shared SQLite DB.
If mull-runner were extended to skip mutations already squashed, it would potentially improve performance and usability for several use cases:
- When iteratively debugging to squash surviving mutations, we want to avoid rerunning mutations that were already squashed in previous runs of a given test binary.
- When there is a lot of shared code between multiple test executables and we want to be able to run each test executable independently with mull-runner (which is the purpose of aggregating mutation results in an SQLite DB and summarizing them with mull-reporter).
In the former use case, suppose we are working to eliminate the surviving mutations for a test executable with 500 mutations that mull-runner runs, based on the instrumented mutations in the binary and the coverage data. When we run the existing test, 490 of those 500 mutations are squashed, so we only need to work on squashing the remaining 10. Let's say this is an expensive test executable that takes 30s to run, and we have a machine with only 25 cores. On that machine, it would take 500/25*30s = 10m to run all 500 mutations. But we only need to run the 10 surviving mutations, which fit in a single parallel batch and so should take just 30s. This would reduce the iteration cycle from 10m to 30s, which would be a massive productivity boost.
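The arithmetic above can be written as a small shell sketch. The function name is hypothetical, and it assumes every mutant run costs the full test runtime and that runs proceed in batches of one mutant per core:

```shell
# Estimate wall-clock seconds for one mull-runner pass.
# Usage: estimate_seconds <num_mutants> <cores> <seconds_per_test_run>
estimate_seconds() {
  n=$1; cores=$2; t=$3
  batches=$(( (n + cores - 1) / cores ))   # ceiling of n / cores
  echo $(( batches * t ))
}

estimate_seconds 500 25 30   # all 500 mutants: 600 seconds (10m)
estimate_seconds 10 25 30    # only the 10 survivors: 30 seconds
```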
For the latter case, for large, complex codebases, running mutations that have already been squashed can result in many duplicate mutation runs across many tests that could be avoided. With such a large, complex project with a large, complex test suite, you have the potential to get as many duplicate mutation runs as there are test executables. For a project like Trilinos, there are thousands of unit test driver executables, so you could have thousands of duplicate mutation runs for some shared upstream code!
Because Trilinos builds as a single large CMake project with many packages, Mull testing would be extremely inefficient, as some mull-runner runs for downstream packages could try to run hundreds of thousands of mutations duplicated across many other tests from upstream packages.
Having mull-runner skip already-squashed mutations could reduce the cost of running Mull by 10x or more for large projects like Trilinos. (I'm just guessing, because I can't yet build some object files, even for upstream packages, with the Mull compiler plugin because it hangs on some of the *.o builds. But that is an issue I need to report separately.) For Trilinos, tests for upstream packages would tend to run first and squash mutations in that upstream code. Then downstream packages that contain the same mutations would not rerun many of them, because upstream tests would have already squashed them.
With an extension like this, the ideal way to run mutation testing with Mull to minimize cost and maximize developer productivity would be to have many small unit test executables and run upstream tests before downstream tests, squashing mutations along the way. As mutations in upstream code are squashed, downstream tests that contain those same mutations would not rerun them. That is, in fact, what Trilinos already does in the way it orders its packages and tests. (And Mull tests are run with CTest with RUN_SERIAL TRUE, since they have their own parallelism and avoid concurrent DB writes that would fail. This results in a natural progression of testing upstream code before downstream code.)
NOTE: Even with an optimization like this, it would still likely not be feasible to run Mull on a large codebase like Trilinos all at once, but it would perhaps let you run Mull for all code and on O(100) tests in a single Trilinos package. Without a performance optimization like this, it is likely not possible to run Mull for even a single Trilinos package (only mutating code unique to that package), and therefore only narrower uses of Mull would be possible.
Proposed solution
One proposed solution is to add a new argument, --skip-squashed, to mull-runner, used like:
mull-runner-19 --coverage-info=<test-name>.profdata --reporters=SQLite --report-name=core \
--skip-squashed \
--report-dir=${PWD} <test-exec>
which will cause mull-runner to read from the core.sqlite DB file and filter out any mutations that are shown as squashed there. Reading and filtering mutations from the SQLite DB should be much faster than running the test program for each mutation that is already squashed.
It should be straightforward to query the SQLite DB, look up squashed mutations that match files in the coverage data, and then filter out the mutations that mull-runner would otherwise run. This should be less complex to implement than the existing filtering based on coverage data.
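The filtering step itself amounts to a set difference. Here is a minimal shell sketch of that idea, assuming the squashed mutants have already been exported from the DB as one identifier per line. The function name and the two-file interface are hypothetical and say nothing about how mull-runner would implement this internally:

```shell
# Print every candidate mutant that is NOT listed as already squashed.
# Usage: mutants_to_run <squashed-ids-file> <candidate-ids-file>
# Both files hold one mutant identifier per line (hypothetical format).
mutants_to_run() {
  awk -v squashed="$1" '
    BEGIN {
      # Load the squashed identifiers; a missing or empty file loads nothing,
      # which matches the first run against a DB that does not exist yet.
      while ((getline line < squashed) > 0) seen[line] = 1
    }
    !($0 in seen)   # keep only candidates that were never squashed
  ' "$2"
}
```

Loading the squashed set once and doing a hash lookup per candidate keeps the cost proportional to the number of identifiers, which should be negligible next to running the test executable even once.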
Demonstration and discussion
To demonstrate the issues involved, I have a little project that has the C++ file getNumDaysInMonth.cpp:
#include "getNumDaysInMonth.hpp"
//#include <vector>
#include <array>

int getNumDaysInMonth(int monthID, bool isLeapYear) {
  static const std::array<int,12> daysInMonthArray = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  int daysInMonth = daysInMonthArray[monthID];
  if (monthID == 1 && isLeapYear) { // Feb in leap year!
    daysInMonth += 1;
  }
  return daysInMonth;
}
I have three different Google Test unit test suites for this code that demonstrate different levels of testing and the issues involved:
- getNumDaysInMonth_UnitTests_low_cov_strong: Does not cover all the code but squashes all mutations in the covered code
- getNumDaysInMonth_UnitTests_full_cov: Covers all the code but has weak checks of the outputs and does not squash any mutations
- getNumDaysInMonth_UnitTests_full_cov_strong: Covers all the code, has strong checks of outputs, and squashes all mutations
I have set up my build system to produce a <test-name>.profdata file for each of these test executables, each containing only the coverage produced by that test <test-name>.
Below, I am using:
$ which mull-runner
alias mull-runner='/usr/bin/mull-runner-19'
/usr/bin/mull-runner-19
$ mull-runner --version
Mull: Practical mutation testing and fault injection for C and C++
Home: https://github.com/mull-project/mull
Docs: https://mull.readthedocs.io
Support: https://mull.readthedocs.io/en/latest/Support.html
Version: 0.27.0
LLVM: 19.1.7
Starting with a missing core.sqlite DB, we run the first test getNumDaysInMonth_UnitTests_low_cov_strong with mull-runner:
$ mull-runner --debug-coverage --reporters=SQLite --report-name core --report-dir=.. --coverage-info=getNumDaysInMonth_UnitTests_low_cov_strong.profdata ./getNumDaysInMonth_UnitTests_low_cov_strong
[info] Using config /mounted_from_host/numerical_coverage_examples/BUILDS/clang-19-mull/mull.yml
[warning] Could not find dynamic library: libstdc++.so.6
[warning] Could not find dynamic library: libm.so.6
[warning] Could not find dynamic library: libgcc_s.so.1
[warning] Could not find dynamic library: libc.so.6
[info] Warm up run (threads: 1)
[################################] 1/1. Finished in 87msmull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:23:8:33
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:34:8:35
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:35:10:4
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:23:8:33
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:34:8:35
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:35:10:4
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_low_cov_strong.cpp:4:43:4:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_low_cov_strong.cpp:4:43:4:43
[info] Filter mutants (threads: 1)
[################################] 1/1. Finished in 0ms
[info] Baseline run (threads: 1)
[################################] 1/1. Finished in 79ms
[info] Running mutants (threads: 2)
[################################] 2/2. Finished in 101ms
[info] Results can be found at '../core.sqlite'
[info] Total execution time: 312ms
That ran two mutations and squashed both. You can see those mutations in the SQLite DB using a helper script (note the =1 status for squashed mutations):
$ ../../../dump_mull_sqlite_db.sh ../core.sqlite
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=1 (95ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=1 (92ms)
Now, run the test getNumDaysInMonth_UnitTests_full_cov with mull-runner:
$ mull-runner --debug-coverage --reporters=SQLite --report-name core --report-dir=.. --coverage-info=getNumDaysInMonth_UnitTests_full_cov.profdata ./getNumDaysInMonth_UnitTests_full_cov
[info] Using config /mounted_from_host/numerical_coverage_examples/BUILDS/clang-19-mull/mull.yml
[warning] Could not find dynamic library: libstdc++.so.6
[warning] Could not find dynamic library: libm.so.6
[warning] Could not find dynamic library: libgcc_s.so.1
[warning] Could not find dynamic library: libc.so.6
[info] Warm up run (threads: 1)
[################################] 1/1. Finished in 88msmull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov.cpp:4:43:4:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov.cpp:4:43:4:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov.cpp:5:43:5:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov.cpp:5:43:5:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov.cpp:6:46:6:46
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov.cpp:6:46:6:46
[info] Filter mutants (threads: 1)
[################################] 1/1. Finished in 0ms
[info] Baseline run (threads: 1)
[################################] 1/1. Finished in 79ms
[info] Running mutants (threads: 3)
[################################] 3/3. Finished in 91ms
[info] Results can be found at '../core.sqlite'
[info] Total execution time: 265ms
[info] Surviving mutants: 3
As shown above, that ran all three mutations and all of them survived (because this test has weak checks). You can see the new mutations added with:
$ ../../../dump_mull_sqlite_db.sh ../core.sqlite
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=1 (95ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=1 (92ms)
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=2 (79ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=2 (87ms)
cxx_add_assign_to_sub_assign:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:9:17:9:19=2 (80ms)
And if you sort them, you see the duplicates:
$ ../../../dump_mull_sqlite_db.sh ../core.sqlite | sort
cxx_add_assign_to_sub_assign:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:9:17:9:19=2 (80ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=1 (92ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=2 (87ms)
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=1 (95ms)
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=2 (79ms)
If mull-runner had an option --skip-squashed, then this call would not have run the following surviving mutations:
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=2 (87ms)
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=2 (79ms)
because they were already squashed in the previous test.
If the goal is to squash the maximum number of mutations at minimum cost, then we want mull-runner to skip mutations that have previously been squashed and recorded as such.
Then, if you run the next test getNumDaysInMonth_UnitTests_full_cov_strong with mull-runner, you run all of these mutations again (even the ones already squashed) with:
$ mull-runner --debug-coverage --reporters=SQLite --report-name core --report-dir=.. --coverage-info=getNumDaysInMonth_UnitTests_full_cov_strong.profdata ./getNumDaysInMonth_UnitTests_full_cov_strong
[info] Using config /mounted_from_host/numerical_coverage_examples/BUILDS/clang-19-mull/mull.yml
[warning] Could not find dynamic library: libstdc++.so.6
[warning] Could not find dynamic library: libm.so.6
[warning] Could not find dynamic library: libgcc_s.so.1
[warning] Could not find dynamic library: libc.so.6
[info] Warm up run (threads: 1)
[################################] 1/1. Finished in 85msmull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov_strong.cpp:4:43:4:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov_strong.cpp:4:43:4:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov_strong.cpp:5:43:5:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov_strong.cpp:5:43:5:43
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov_strong.cpp:6:46:6:46
mull-coverage: /mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth_UnitTests_full_cov_strong.cpp:6:46:6:46
[info] Filter mutants (threads: 1)
[################################] 1/1. Finished in 0ms
[info] Baseline run (threads: 1)
[################################] 1/1. Finished in 84ms
[info] Running mutants (threads: 3)
[################################] 3/3. Finished in 91ms
[info] Results can be found at '../core.sqlite'
[info] Total execution time: 297ms
So all three of those mutations were squashed, but mull-runner only needed to run the last one (cxx_add_assign_to_sub_assign), because the other two had already been squashed by the first test. You can see the updated mutations with:
$ ../../../dump_mull_sqlite_db.sh ../core.sqlite
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=1 (95ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=1 (92ms)
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=2 (79ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=2 (87ms)
cxx_add_assign_to_sub_assign:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:9:17:9:19=2 (80ms)
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=1 (78ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=1 (81ms)
cxx_add_assign_to_sub_assign:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:9:17:9:19=1 (84ms)
And we can see the wasted duplication of squashed mutations with:
$ ../../../dump_mull_sqlite_db.sh ../core.sqlite | grep "=1" | sort
cxx_add_assign_to_sub_assign:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:9:17:9:19=1 (84ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=1 (81ms)
cxx_eq_to_ne:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:8:15:8:17=1 (92ms)
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=1 (78ms)
cxx_init_const:/mounted_from_host/numerical_coverage_examples/getNumDaysInMonth/getNumDaysInMonth.cpp:7:7:7:18=1 (95ms)
For small tests and small test suites like this, running duplicate mutations is not a big deal. But for larger projects like Trilinos (as described above), this would likely be a show-stopper.
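To put a number on the waste, here is a rough sketch that counts runs of a mutant that happened after the same mutant had already been squashed. It assumes the dump output format shown above, with =1 meaning squashed, that records are listed in the order they were written, and that paths contain no spaces. The function name is hypothetical:

```shell
# Count mutant runs recorded after that mutant was already squashed.
# Reads "<mutant-id>=<status> (<N>ms)" lines on stdin, oldest record first.
count_redundant_runs() {
  awk '
    {
      id = $1
      sub(/=[0-9]+$/, "", id)           # strip "=<status>" to get the mutant id
      status = $1
      sub(/^.*=/, "", status)           # keep only the status value
      if (id in squashed) redundant++   # an earlier run already squashed it
      if (status == "1") squashed[id] = 1
    }
    END { print redundant + 0 }
  '
}

# Intended use (needs the helper script and DB from this issue):
#   ../../../dump_mull_sqlite_db.sh ../core.sqlite | count_redundant_runs
```

Fed the eight records from the last full dump above, in that order, this prints 4: half of all mutant runs in this tiny example were spent on mutations that were already squashed.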
I will note, however, that for the first use case mentioned above, where you are trying to iteratively debug and squash surviving mutations with lots of build/test cycles, you would likely limit the mutations that mull produces at build time as per:
- #1132
But in cases where most of your work is updating the tests to add more checks and improve existing checks, and you do not have to rebuild the code in the upstream object files, having mull-runner skip mutations that have already been squashed could be a big time saver.
Likely the most efficient way to work around this issue would be to edit the coverage information that is passed in through mull-runner --coverage-info=<test-name>.profdata .... You would copy the <test-name>.profdata file to <test-name>-surviving-mutants.profdata and then modify the latter to leave only the coverage regions that have unsquashed mutations. But that would still run squashed mutations in those same regions.
To fully avoid all squashed mutations, you could write a filter program that is passed to mull-runner through the --test-program argument. It would read the mutation env var set by mull and check it against the DB of all mutations for that executable that have already been squashed. (You can produce that list of all mutations for that executable without actually running the executable with a helper script that I wrote.) The filter script would return 0 if the mutation was already squashed and otherwise would run the underlying test executable. That would be easy to implement but might be a bit slow if there were a lot of mutations. (To make this fast, you would need to start a process that loads the list of mutations and then does a fast search each time mull-runner calls the filter script.)
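A minimal sketch of the core check in such a filter script follows. It is built on assumptions that are not confirmed here: that the active mutant shows up as an environment variable whose name is the mutant identifier, and that the squashed mutants are available as a file with one identifier per line. The function name and the SQUASHED_IDS and REAL_TEST variables are hypothetical:

```shell
# Succeed (exit 0) if any environment entry read from stdin has a variable
# name that appears in the squashed-identifiers file given as $1.
# Stdin is expected to look like the output of `env` (NAME=VALUE per line).
active_mutant_is_squashed() {
  awk -v squashed="$1" '
    BEGIN { while ((getline line < squashed) > 0) seen[line] = 1 }
    {
      name = $0
      sub(/=.*$/, "", name)        # name is everything before the first "="
      if (name in seen) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Intended body of the wrapper (commented out so the sketch stays inert):
#   if env | active_mutant_is_squashed "$SQUASHED_IDS"; then exit 0; fi
#   exec "$REAL_TEST" "$@"
```

This is the simple, slower variant: it reloads the whole list on every call. One thing to check before relying on a shell wrapper is that some sh implementations may not pass through environment variables whose names are not valid shell identifiers, and mutant identifiers containing ':' and '/' are not, so the variable might never reach either the check or the real test executable.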