pytest-xdist icon indicating copy to clipboard operation
pytest-xdist copied to clipboard

loadscope incorrectly modifying test ordering, when ordering is relevant

Open Toad2186 opened this issue 1 year ago • 15 comments

I'm not sure if there are other use cases, but the relevant change I'm talking about is a byproduct of #778.

With #778, we are now ordering tests based on the number of tests available. The problem is that this changes the input test order, and in some cases, such as when running with pytest-django, the test ordering is relevant. In particular, Django (and also pytest-django) orders its tests by grouping them:

(django) TestCase
TransactionalTestCase, SimpleTestCase
All other tests

Tests within each group doesn't need to be ordered, but the group ordering is relevant because TransactionalTestCase may have undesirable side-effects.

For simplicity, assume we run with one worker with --dist loadscope.

So if I have tests like this, for example:

class TestTransactionalTestCase1(TransactionalTestCase):
    def test_tttc_1...

    def test_tttc_2...

class TestCase1(TestCase):
    def test_tc_3...

When running with 1 worker, we would get this ordering:

test_tttc_1
test_tttc_2
test_tc_3

Whereas when we run without pytest-xdist, we would run in this order (and this is the "correct" behavior based on Django's ordering):

test_tc_3
test_tttc_1
test_tttc_2

More broadly, I think re-ordering the input test ordering may conflict with other scenarios where ordering does matter. It's not a Django-specific issue, although I don't know of any other examples where this is relevant.

Toad2186 avatar May 14 '24 13:05 Toad2186

I guess a solution would be to include a new ini-option that disables this ordering for loadscope?

cc @cryvate @joekohlsdorf

nicoddemus avatar May 14 '24 15:05 nicoddemus

@nicoddemus I think that makes sense.

Actually thinking about it, I had misunderstood the change I made as I was not too aware of the internals of pytest-xdist.

I think it makes sense to do the following 2 things:

  • make the behaviour opt in (or opt out) using a new ini option and command line option
  • make the behaviour also supported in loadfile, load (I did not realize there was such a split).

This can be achieved in two PRs I reckon.

cryvate avatar May 14 '24 15:05 cryvate

The change which was done to loadscope in #778 does not reorder individual tests within a scope, it orders entire scopes by the number of tests within them. With xdist you can't expect deterministic ordering of scopes because it will distribute scopes arbitrarily among workers.

On a sidenote, I disagree that TransactionalTestCase has side effects, the database is still rolled back after each test. Other types of side effects are usually issues with your tests and I don't recommend trying to avoid flakiness by enforcing a test order.

joekohlsdorf avatar May 14 '24 15:05 joekohlsdorf

With xdist you can't expect deterministic ordering of scopes because it will distribute scopes arbitrarily among workers.

Good point, this invariant was not changed by #778, being there previously already.

nicoddemus avatar May 14 '24 15:05 nicoddemus

the database is still rolled back after each test.

The database is not rolled back after each test. The database is truncated after each test, which is a little different. One retains the state of the DB from the beginning of the test (with some exceptions, such as sequences which retain their incremented counts in most DB types); the other wipes the DB entirely.

It's not impossible to make TransactionTestCase work irrespective of the ordering of TestCase vs. TransactionTestCase, but it might be much less efficient. For example, if you preload a large amount of data into the DB, running a transaction test case would wipe the data, then, test case would fail. You can argue it's more correct that each test should preload its own data, and you wouldn't be entirely wrong and we can do that, but loading and deleting a bunch of data before and after each test can end up slowing down things, a lot, depending on the operations involved.

With xdist you can't expect deterministic ordering of scopes because it will distribute scopes arbitrarily among workers.

Good point, this invariant was not changed by https://github.com/pytest-dev/pytest-xdist/pull/778, being there previously already.

The issue isn't deterministic ordering -- the issue is relative ordering of groups of tests. Again, it's not entirely wrong per-se to require that tests run irrespective of ordering, but this grouping is something that's been in Django since 2.0 and you can look at that commit for the same reasoning I mentioned above.

The ordering between groups, while not explicitly supported, was the case prior to #778 and was changed by #778. Which also means that, since then, unfortunately it's not 100% compatible with Django tests in all cases for test suites that take advantage of Django's grouping guarantee. And I get that not everything revolves around Django, so perhaps it's not a problem that you'd even care to solve, but IMO it makes sense to try to play as nice as possible with other libraries.

Toad2186 avatar May 14 '24 16:05 Toad2186

The ordering between groups

IIUC, previously due to the implementation it would always guarantee the order of the groups, so tests for group A and group B would for sure end up either in the same work (in that order), or in separate workers depending on number of workers/tests. But #778 indeed changed that so group B and group A might execute in that order on the same worker, in case group B has more tests (please correct me if I'm wrong, I'm drawing from memory as I'm short on time to look up at the code/issues with more detail).

nicoddemus avatar May 14 '24 17:05 nicoddemus

IIUC, previously due to the implementation it would always guarantee the order of the groups, so tests for group A and group B would for sure end up either in the same work (in that order), or in separate workers depending on number of workers/tests. But https://github.com/pytest-dev/pytest-xdist/pull/778 indeed changed that so group B and group A might execute in that order on the same worker, in case group B has more tests (please correct me if I'm wrong, I'm drawing from memory as I'm short on time to look up at the code/issues with more detail).

Exactly. pytest-xdist does not guarantee group ordering either, so officially it's still correct, but different from what used to happen.

Toad2186 avatar May 14 '24 17:05 Toad2186

In that case, I think it is reasonable to add an option to make the behavior opt-out in case group order matters.

Why not the other way around? My reasoning is that we should always advise for tests not to be order-dependent, so guaranteeing group order by default goes against that. In addition, the performance benefits of ordering the groups by number of tests is significant so I think should be the default.

What do you folks think?

I would love to review a PR adding that option and accompanying tests. :+1:

nicoddemus avatar May 14 '24 17:05 nicoddemus

I can take a stab at it.

Toad2186 avatar May 15 '24 02:05 Toad2186

Here you go. https://github.com/pytest-dev/pytest-xdist/pull/1098

Toad2186 avatar Jun 12 '24 23:06 Toad2186

@nicoddemus Can you please take a look at #1098?

Toad2186 avatar Jul 25 '24 18:07 Toad2186

would it be feasible to give the user an option to load tests in a pre-determined order? Perhaps a non-exhaustive list of tests provided in configuration that define tests that should be run before the others (perhaps tests that are slow, or critical to passing, etc.)

zacharyburnett avatar Sep 20 '24 17:09 zacharyburnett

Can I get a review on my PR please?

Toad2186 avatar Oct 15 '24 16:10 Toad2186

Looking forward to having this merged. The assumption that the scope with the most tests is the "slow" one so should be scheduled first is, for our use case, false. pytest-order is used to put the long tests into the queue first thereby reducing the overall test execution time as the feature intended.

Hyzerputt avatar Oct 15 '24 16:10 Hyzerputt

Looking forward to having this merged. The assumption that the scope with the most tests is the "slow" one so should be scheduled first is, for our use case, false. pytest-order is used to put the long tests into the queue first thereby reducing the overall test execution time as the feature intended.

me too, @nicoddemus it would be amazing to get this PR merged and a new version of pytest-xdist released

albertino87 avatar Oct 15 '24 16:10 albertino87