pytest-randomly Add support for minimization

Once you have tests that fail only when run in a particular order, it'd be incredibly useful to have sort of support for automatically determining what the minimum set of tests are required to trigger the failure.

Jun 02 '19 22:06 alex

Yes, this is a bit related to #17. Perhaps the set of tests could be cached in .pytest_cache somehow, and then this could be bisected on. The problem with bisection is that in the worst test isolation failures, cutting the list of tests in the middle fixes them, since it's a test in the first half leaking state into the second half.

Fixing #17 would at least allow manual bisection

Jun 05 '19 07:06 adamchainz

I remember an approach in the context of bisecting for multiple errors: In the bisection step, you can reorganize the partitions to find a suitable split.

For example, assume we have a (full) list of tests F. A test f_x fails if f_p is executed before, f_x and f_p in F. Therefore, we want to identify a subset that contains f_x and f_p. We start with a bisection and create 2 lists with half of the elements: H1, H2 where H1 + H2 = F Now, there are 4 cases (assuming F fails):

H1 fails, H2 fails => this most likely means we have more issues. We can just continue with H1 and ignore H2 (can be investigated later)
H1 fails, H2 is ok => continue with H1
H1 is ok, H2 fails => continue with H2
H1 is ok, H2 is ok => continue with a new partition

So, both H1 and H2 are ok and f_x is in one list and f_p in the other list. Therefore, we create new partitions. We split up H1 and H2 in half (and get quarters): Q1, Q2, Q3, Q4 where Q1 + Q2 = H1 and Q3 + Q4 = H2

We create all possible pairs of these 4 lists and use these pairs as input to the previous step:

(Q1+Q3, Q2+Q4)
(Q1+Q4, Q2+Q3)
We can skip (Q1+Q2, Q3+Q4), because we tested it already.

Now, there must be at least one partition where f_x and f_p are both in one half of the selected tests. Continue with the typical approach until we cannot reduce anymore.

It could happen that all these pairs are ok. Then, we would have at least one addition test f_p2 that is required to be executed before f_x. We could either further split up our partitions and rearrange them or at some point we might consider switching to a less efficient algorithm and take out subsets of certain size from F. Worst case: all tests in F before the last test f_l test must be executed to make f_l fail. We can verify this worst case only by testing all subsets that remove for each subset a single different test from F.

May 10 '20 17:05 thbde

That sounds like a great process @thbde . I think it's a challenge to implement since the test process would need restarting repeatedly for each arrangement of the partitions. We'd need to preserve a bunch of state in .pytest_cache and encourage the user to keep reinvoking pytest- --randomly-minimize-failure (or whatever the flag is), with extra output to describe where we're at. Or perhaps pytest could reinvoke itself at the end of the run, though that sounds a bit hacky.

May 11 '20 15:05 adamchainz