pytest-xdist
Support group/scope scheduling in `worksteal`
The documentation says that a test that is not in an xdist_group defaults to the load behaviour.
Would it be possible to choose the default behaviour so it could be something like worksteal?
> In the documentation it says that if a test is not in an xdist_group, that it defaults to the load behaviour.
I don't think the documentation is accurate. As far as I understand the code, xdist doesn't switch schedulers; it just happens that loadgroup, with every ungrouped test being its own small "group", behaves very similarly to load (but not exactly the same: LoadScheduling and LoadScopeScheduling do not share any code).
> Would it be possible to choose the default behaviour so it could be something like worksteal?
Only if someone implemented the concept of "groups" in the worksteal scheduler. I do not need this feature myself, so I am unlikely to work on it.
BTW, why do you use loadgroup instead of simple load?
We have a test suite where some tests are regular Python tests and others test a Python API with a locally hosted server. So we want everything parallelised but the API tests to run sequentially, which they will if they are on the same worker.
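For readers unfamiliar with this setup, here is a minimal sketch of how such a suite might be marked. The group name and the test names are hypothetical, and the test bodies are placeholders; only the `xdist_group` mark itself comes from pytest-xdist.

```python
import pytest

# Hypothetical group name; any string works as long as the API tests share it.
API_GROUP = "local_api_server"


@pytest.mark.xdist_group(name=API_GROUP)
def test_api_create_item():
    # Placeholder body: a real test would call the locally hosted server here.
    assert True


@pytest.mark.xdist_group(name=API_GROUP)
def test_api_delete_item():
    # Same group as above, so loadgroup keeps both tests on one worker.
    assert True


def test_pure_python_logic():
    # No xdist_group mark: loadgroup distributes this test independently.
    assert sum([1, 2, 3]) == 6
```

Running this with `pytest -n auto --dist loadgroup` should send both API tests to the same worker, where they run one after another, while the unmarked test can go to any worker.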
In our case we have a huge dataset created per module (like fixtures). It is therefore much faster to use loadgroup, because tests from the same class/module run on the same worker and the initial setup only needs to be done once.
Very often we see that some workers finish their jobs early while another worker is still working hard to finish its chunk... it would be great to have a stealing approach for when a worker is idle in such a case...
@sshishov Is the current worksteal not good enough in your case? It already schedules tests in large contiguous chunks (at least initially), so tests from the same module should get sent to the same worker.
We have a similar case to @HMellor and @sshishov: we have expensive setup that we want shared across a set of tests within a group. However, once setup is complete these tests run very quickly. As a result, loadgroup scheduling loads workers up with a lot of tests, but those workers are often idle towards the end while other long-running tests (that are not in the same group) occupy the other workers.
We end up with scheduling that looks like:
gw0 - test_one@somegroup... (8 tests, 20 minutes runtime)
gw1 - test_two@someothergroup... (2 tests, 15 minutes runtime)
gw2 - test_three, test_four, test_five (20m runtime)
gw3 - test_really_long, test_six, test_seven (45m runtime)
Another option I considered was to make the --maxschedchunk option work with loadgroup.
Hi @amezin, I did not know that worksteal schedules tests from the same module to the same worker... From the documentation I inferred that it works the same as load, meaning a completely random order per test. Am I missing something, or is the info you provided simply omitted from the documentation?
worksteal doesn't do anything specific to schedule tests from the same module to the same worker. However, it should be a lot less "randomized" than load. Initially, worksteal takes the first n_tests/n_workers tests and sends them to the 1st worker, then sends the 2nd group of the same size to the 2nd worker, and so on. So unless you reorder tests intentionally or a lot of rebalancing is required (which worksteal tries to avoid: it starts moving tests between workers only when some worker completely runs out of work), tests from the same module will likely be run by the same worker. At least, most of them.
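To make that description concrete, here is a rough sketch in plain Python. It is not the xdist implementation: the function names are made up, and both the way the remainder is spread over workers and the "take half from the tail of the busiest worker" rule are assumptions for illustration only.

```python
from typing import Dict, List


def initial_chunks(tests: List[str], n_workers: int) -> Dict[str, List[str]]:
    """Split tests, in collection order, into contiguous near-equal chunks."""
    base, extra = divmod(len(tests), n_workers)
    chunks: Dict[str, List[str]] = {}
    start = 0
    for i in range(n_workers):
        # Assumption: the first `extra` workers each get one additional test.
        size = base + (1 if i < extra else 0)
        chunks[f"gw{i}"] = tests[start:start + size]
        start += size
    return chunks


def steal_for(idle: str, pending: Dict[str, List[str]]) -> None:
    """Move tests to an idle worker once it has completely run out of work."""
    victim = max(pending, key=lambda w: len(pending[w]))
    # Assumption: steal half of the victim's pending tests, taken from the tail.
    n = len(pending[victim]) // 2
    if n == 0:
        return
    pending[idle].extend(pending[victim][-n:])
    del pending[victim][-n:]


tests = [f"test_mod_a.py::test_{i}" for i in range(4)]
tests += [f"test_mod_b.py::test_{i}" for i in range(4)]

pending = initial_chunks(tests, 2)  # gw0 gets all of mod_a, gw1 all of mod_b
pending["gw0"].clear()              # pretend gw0 finished its whole chunk
steal_for("gw0", pending)           # gw0 takes the tail of gw1's chunk
```

With collection order intact, each module lands on a single worker and only the stolen tail crosses workers. If the test list is shuffled first, the contiguous chunks no longer line up with modules.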
@amezin the question is: if we are using pytest-randomly, which randomizes the seed as well as the order of tests, will it affect the scheduling?
Unfortunately, yes. That's what I meant by "unless you reorder tests intentionally"...
Although, if I'm not mistaken, there was a random reorder plugin that was able to reorder tests inside of scopes, without breaking the scopes themselves. I'm not sure whether it was pytest-randomly or some other plugin.
We tested it and found that loadgroup scheduling together with pytest-randomly works as expected, meaning tests from the same module are scheduled to the same worker... they are just scheduled in a random order (if that makes sense)... What we wanted was to add the "stealing" functionality to this scheduling, so that a worker that has finished its work can "steal" work from another worker...
But to be honest, I should look deeper into how everything works inside pytest-randomly.
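As a rough illustration of the "stealing on top of groups" idea requested in this thread, here is a small simulation. It is a hypothetical sketch, not an existing xdist scheduler: each group is treated as one indivisible unit, and an idle worker steals one not-yet-started unit from the tail of the worker with the most pending minutes. The per-worker totals match the schedule posted earlier (20, 15, 20, 45 minutes); the split into individual test durations is assumed.

```python
import heapq
from typing import List, Tuple

Unit = Tuple[str, int]  # (group or test name, runtime in minutes)


def makespan(queues: List[List[Unit]], steal: bool) -> int:
    """Simulate workers draining their queues; return total wall time in minutes."""
    queues = [list(q) for q in queues]  # copy so the caller's data is untouched
    free_at = [(0, w) for w in range(len(queues))]  # (time worker is free, worker)
    heapq.heapify(free_at)
    finished = 0
    while free_at:
        now, w = heapq.heappop(free_at)
        finished = max(finished, now)
        if not queues[w] and steal:
            # Steal one whole unit from the worker with the most pending minutes.
            victim = max(range(len(queues)),
                         key=lambda v: sum(m for _, m in queues[v]))
            if queues[victim]:
                queues[w].append(queues[victim].pop())
        if queues[w]:
            _, minutes = queues[w].pop(0)
            heapq.heappush(free_at, (now + minutes, w))
        # A worker with nothing to run and nothing to steal simply stops.
    return finished


schedule = [
    [("somegroup", 20)],                                             # gw0
    [("someothergroup", 15)],                                        # gw1
    [("test_three", 7), ("test_four", 7), ("test_five", 6)],         # gw2
    [("test_really_long", 25), ("test_six", 10), ("test_seven", 10)],  # gw3
]

static_minutes = makespan(schedule, steal=False)
stealing_minutes = makespan(schedule, steal=True)
```

Under these assumed durations the static assignment is bounded by gw3's 45 minutes, while the stealing variant lets gw0 and gw1 pick up test_six and test_seven once they go idle. Groups are never split, so shared setup still runs once per group.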