pytest-xdist

Upgrading from 2.3.0 to 3.3.1 and using --dist=worksteal reorders the test cases

Open cwiede opened this issue 2 years ago • 11 comments

Hi,

we have a long-running test suite which consists of a single test function run on different input files (the input files are passed via @pytest.mark.parametrize). The function runs longer for larger input files, so we sort the input files by size such that the largest file comes first. This has been working well with the admittedly old version 2.3.0 and the default distribution scheduling.

Now I have read about the worksteal option and wanted to give it a try. Unfortunately, I'm realizing now that the order seems to be randomized. Because the durations of the tests can differ by orders of magnitude, it is very important for a short overall duration to execute the long-running tests first. Is there a way to do that with the current version?

cwiede avatar Jun 29 '23 12:06 cwiede

Hi @cwiede,

Not currently, the worksteal mode will see which workers are idle and then get work from workers which are busy, but there is no priority mechanism for that.

nicoddemus avatar Jun 29 '23 22:06 nicoddemus

Hi @nicoddemus

Thanks for the answer, this is a bit of a pity for our use case. Please allow me one more question: is the reordering of test cases only happening in worksteal mode, or is it part of a bigger change (I read something about fixture evaluation reduction in the Changelog)? In case of the latter, I suppose that our use case is not supported at all anymore (short of writing a custom scheduler).

Thank you!

EDIT: I just tested with 3.3.1 and it seems that almost all available schedulers are re-ordering parametrized test cases now, the only exception was --dist loadgroup but I guess that this was more by accident than as a feature which we can rely upon.

cwiede avatar Jun 30 '23 11:06 cwiede

I just tested with 3.3.1 and it seems that almost all available schedulers are re-ordering parametrized test cases now

Could you provide an example of what you're talking about? I.e., a list of test ids, the order you expect, and what actually happens.

amezin avatar Jul 05 '23 06:07 amezin

This is a simple test (the real number of test cases is a lot larger, and the spread between long and short tests is much wider: we are talking about an hour vs. 1 minute). However, this test case demonstrates the use case well enough.

import time
import pytest

@pytest.mark.parametrize("duration", [10.0, 9.0, 8.0, 7.0, 6.0, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8])
def test_dummy(duration):
    time.sleep(duration)

I'd like to have the long-runners executed at the beginning; otherwise the core utilization is low at the end, resulting in a long overall duration. When I execute this with --dist worksteal and pytest-xdist==3.3.1, I get the following order:

pytest -n 4 --dist worksteal ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py -v
# ...
scheduling tests via WorkStealingScheduling

..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[10.0]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.1]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[7.0]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.5]
[gw3] [  6%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.1]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.0]
[gw2] [ 13%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.5]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.4]
[gw3] [ 20%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.0]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[0.9]
[gw2] [ 26%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.4]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.3]
[gw3] [ 33%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[0.9]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[0.8]
[gw3] [ 40%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[0.8]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.7]
[gw2] [ 46%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.3]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.2]
[gw2] [ 53%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.2]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[8.0]
[gw3] [ 60%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.7]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.6]
[gw1] [ 66%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[7.0]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[6.0]
[gw3] [ 73%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[1.6]
[gw0] [ 80%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[10.0]
..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[9.0]
[gw1] [ 86%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[6.0]
[gw2] [ 93%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[8.0]
[gw0] [100%] PASSED ..\core\app-specific\python-ifm_o3r_ods\ifm_o3r_ods-pytest\test_dummy.py::test_dummy[9.0]

As you can see, the test with 9s duration is scheduled last, which is counterproductive (and similar things are also happening in the real-world use case). When I run this with pytest-xdist==2.3.0 and the default scheduling, the tests are ordered by descending duration (as given in pytest.mark.parametrize).

cwiede avatar Jul 05 '23 07:07 cwiede

as you can see the test with 9s duration is scheduled last

It's just the 2nd test on gw0. No tests with smaller duration were executed on gw0 ahead of it - so there's no reordering.

Both worksteal and load initially schedule in chunks of more than 1 test, so both tests with 10.0 and 9.0 duration are sent to gw0.

worksteal could fix things like this, in theory, but it can't move the next test (the one queued right after the currently executing test) to a different worker. You could actually get better results if you random.shuffle() the parameter values.

Old pytest-xdist versions did better because load was round-robin scheduling the initial batch, which was significantly worse for fixture reuse (imagine you have only two tests using a slow-to-set-up fixture in the initial batch: old xdist would set it up twice).

amezin avatar Jul 05 '23 08:07 amezin
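
The random.shuffle() suggestion above could be sketched as follows. One point worth noting: every xdist worker collects the tests on its own and the collected lists must match, so the shuffle has to be deterministic (here via a fixed seed). The helper name shuffled_params and the seed value are hypothetical, chosen only for illustration.

```python
import random

# Durations from the example test in this thread, sorted longest first.
DURATIONS = [10.0, 9.0, 8.0, 7.0, 6.0, 1.7, 1.6, 1.5,
             1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8]


def shuffled_params(values, seed=0):
    """Return a shuffled copy of `values` that is identical on every run.

    A private Random instance with a fixed seed keeps the order stable,
    so all workers collect the parametrized tests in the same order.
    """
    result = list(values)
    random.Random(seed).shuffle(result)
    return result


PARAMS = shuffled_params(DURATIONS)

# Usage in the test module (requires pytest):
#
#   @pytest.mark.parametrize("duration", PARAMS)
#   def test_dummy(duration):
#       time.sleep(duration)
```

This only spreads the long tests out on average; it does not guarantee that two long tests never land in the same initial chunk.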

Both worksteal and load initially schedule in chunks of more than 1 test, so both tests with 10.0 and 9.0 duration are sent to gw0.

I'm not sure if this is feasible, but probably my use case would be solved by changing the initial assignment of tests to workers from w0 = [t0, t1], w1 = [t2,t3], ... to w0 = [t0, t4], w1 = [t1, t5], ... assuming an initial scheduling size of 2 and 4 workers.

worksteal could fix things like this, in theory, but it can't move the next test (the one queued right after the currently executing test) to a different worker. You could actually get better results if you random.shuffle() the parameter values.

Note that the tests are sorted intentionally, since this avoids having slow tests executed at the end. A random.shuffle might help, but it's less optimal than sorting.

Old pytest-xdist versions did better because load was round-robin scheduling the initial batch, which was significantly worse for fixture reuse (imagine you have only two tests using a slow-to-set-up fixture in the initial batch: old xdist would set it up twice).

Is it planned to have the old round-robin behaviour available as a scheduler, something like --dist=roundrobin?

cwiede avatar Jul 05 '23 09:07 cwiede
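
To make the two initial assignments discussed above concrete (w0 = [t0, t1], w1 = [t2, t3], ... versus w0 = [t0, t4], w1 = [t1, t5], ...), here is a small simulation. It is a deliberately simplified model and not pytest-xdist's actual algorithm: it assumes an initial batch of 2 tests per worker, hands every remaining test to whichever worker becomes idle first, and does no stealing. All function names are hypothetical.

```python
import heapq


def contiguous_chunks(tests, workers, chunk):
    """Initial batch as consecutive chunks: w0 = [t0, t1], w1 = [t2, t3], ..."""
    return [tests[w * chunk:(w + 1) * chunk] for w in range(workers)]


def round_robin(tests, workers, chunk):
    """Initial batch dealt out in turn: w0 = [t0, t4], w1 = [t1, t5], ..."""
    initial = tests[:workers * chunk]
    return [initial[w::workers] for w in range(workers)]


def makespan(durations, workers, chunk, assign):
    """Wall-clock time of the whole run under the simplified model."""
    initial = assign(durations, workers, chunk)
    # Heap of (time at which the worker becomes idle, worker index).
    idle_at = [(sum(batch), w) for w, batch in enumerate(initial)]
    heapq.heapify(idle_at)
    # Tests outside the initial batch go to the first idle worker.
    for d in durations[workers * chunk:]:
        t, w = heapq.heappop(idle_at)
        heapq.heappush(idle_at, (t + d, w))
    return max(t for t, _ in idle_at)


DURATIONS = [10.0, 9.0, 8.0, 7.0, 6.0, 1.7, 1.6, 1.5,
             1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8]

if __name__ == "__main__":
    for assign in (contiguous_chunks, round_robin):
        print(assign.__name__, makespan(DURATIONS, 4, 2, assign))
```

Under these assumptions the consecutive-chunk assignment is dominated by the worker that receives both the 10.0 and the 9.0 second test, while the interleaved assignment pairs the 10.0 second test with the 6.0 second one.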

Is it planned to have the old round-robin behaviour available as a scheduler, something like --dist=roundrobin?

I do not plan it, because it's only useful for some edge cases, and it worked this way only for the first n/4 tests, then switched to consecutive chunks. If your long tests were in the second half of the test suite, they would still have a high chance of being scheduled to the same worker with the old load scheduler.

The closest you could get to round-robin scheduling is --dist load + --maxschedchunk 1. However, it still won't affect the first two tests for every worker.

If only pytest_runtest_protocol accepted, for example, a callable as nextitem... (because it isn't actually necessary until teardown)

@nicoddemus @RonnyPfannschmidt What do you think, is it possible to change pytest so that it does not require nextitem until teardown? Currently, all edge cases where worksteal performs badly are caused by this limitation.

amezin avatar Jul 05 '23 10:07 amezin
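
For reference, the --dist load + --maxschedchunk 1 combination mentioned above can be passed on the command line or, as sketched here, through pytest.main(). The test file name and the worker count of 4 are placeholders, not values taken from a real project.

```python
# Placeholder path; substitute your own test module.
TEST_FILE = "test_dummy.py"

# Arguments equivalent to:
#   pytest -n 4 --dist load --maxschedchunk 1 -v test_dummy.py
XDIST_ARGS = ["-n", "4", "--dist", "load", "--maxschedchunk", "1", "-v", TEST_FILE]


def run_tests():
    """Run pytest in-process with the arguments above.

    pytest is imported lazily so that this module can be loaded even
    where pytest and pytest-xdist are not installed.
    """
    import pytest

    return pytest.main(XDIST_ARGS)
```

As noted in the comment above, this limits the chunk size for later scheduling steps but does not change the first two tests each worker receives.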

I wish nextitem were easy to annihilate. It was introduced as a kludge when scope-based reordering was introduced, to enable teardown.

We may be able to replace it with a wrapper that allows deferring the decision, but this may break various downstreams.

Personally, I'd like to get rid of the runtestprotocol hook altogether.

RonnyPfannschmidt avatar Jul 05 '23 11:07 RonnyPfannschmidt

Are there any relevant discussions/issues in pytest core repo?

amezin avatar Jul 06 '23 17:07 amezin

No discussion, only a placeholder issue to eventually remove runtestprotocol.

RonnyPfannschmidt avatar Jul 06 '23 17:07 RonnyPfannschmidt

We may be able to replace it with a wrapper that allows deferring the decision, but this may break various downstreams.

Maybe https://github.com/ionelmc/python-lazy-object-proxy ?

amezin avatar Jul 08 '23 18:07 amezin
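
The idea of a wrapper that defers the nextitem decision can be illustrated with a few lines of plain Python. This is only a sketch of the concept: Deferred and FakeItem are hypothetical names, nothing here is pytest's or pytest-xdist's API, and a proxy library such as the one linked above would make the wrapper far more transparent than this minimal version.

```python
class Deferred:
    """Stand-in for a value that is looked up only when first needed."""

    _UNSET = object()

    def __init__(self, factory):
        self._factory = factory
        self._value = self._UNSET

    def get(self):
        # Resolve once, on first use, then cache the result.
        if self._value is self._UNSET:
            self._value = self._factory()
        return self._value

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself,
        # so accesses like `.nodeid` are forwarded to the real object.
        return getattr(self.get(), name)


class FakeItem:
    """Hypothetical stand-in for a collected test item."""

    def __init__(self, nodeid):
        self.nodeid = nodeid


# The worker's pending queue while some earlier test is still running.
pending = [FakeItem("test_b"), FakeItem("test_c")]

# The "next item" is not decided yet, only how to find it later.
nextitem = Deferred(lambda: pending.pop(0))

# Meanwhile another worker steals the test at the head of the queue.
stolen = pending.pop(0)

# Only now, at "teardown", is the next item actually resolved.
resolved_id = nextitem.nodeid
```

Because resolution happens at first access, the stolen test never becomes the next item of the original worker.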