pytest-xdist
Ability to execute certain tests sequentially while using -n (xdist)
Hi,
I am looking for a way to run a certain group of tests sequentially while the other tests continue to run in parallel using -n (xdist).
I have a small group of tests that cannot be executed in parallel. I do not want to create another job that runs without -n just for this small set. I searched but did not find an actual solution anywhere.
The versions I am using are: pytest-xdist 1.20.1, pytest 3.2.5, Python 2.7.13.
Thanks in advance.
A @pytest.mark would be great for that purpose!
Currently there isn't enough metadata being transferred between the workers and the scheduler to facilitate this.
Indeed we don't have a solution for that currently.
@cauvery what we do at work is to mark some tests with a known mark (@pytest.mark.serial) and then execute pytest twice: once with xdist excluding the mark, and then without xdist but including the mark:
$ pytest -n auto -m "not serial"
...
$ pytest -m "serial"
This way the serial-marked tests will execute sequentially in their own process. Just leaving this here in case you did not find that solution elsewhere. 👍
Thank you @nicoddemus, I've seen this solution, but it requires me to create two separate Jenkins jobs: one for parallel execution and one for sequential execution, which I'd rather avoid in my project (having another job just for a small subset).
Just FYI, I can put the two command lines in a single job, but then the reports and results will only reflect the last command.
-Cauvery.
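One possible workaround for the single-job reporting problem (a sketch, assuming your CI's JUnit publisher can aggregate multiple XML files, e.g. via a reports/*.xml glob; the reports/ paths are illustrative) is to give each pytest invocation its own --junitxml output file:

```shell
# One CI job, two pytest runs, each writing its own JUnit report
# so the result publisher can aggregate both files.
pytest -n auto -m "not serial" --junitxml=reports/parallel.xml
pytest -m "serial" --junitxml=reports/serial.xml
```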
Does anyone else have an alternate solution or workaround for this?
@cauvery
What kind of scheduling in xdist have you opted for?
Are the test cases which you want to execute sequentially spread across multiple Python test modules/packages?
I have a similar need to @cauvery's.
In my case I have some integration tests (through @pytest.mark.parametrize) which make modifications to a shared object, and a fixture which always resets that shared object to a known, initial state.
Unfortunately, when these test cases end up running in different workers, they can "step on each other's toes".
It would be nice to have a marker in xdist which would allow us to force some tests to run in the same worker, i.e. sequentially.
@akr8986 to answer your questions:
Q: What kind of scheduling in xdist have you opted for?
A: I am using -n 3 on the command line. Does that answer the question, or did I misunderstand it?
Q: Are the test cases which you want to execute sequentially spread across multiple Python test modules/packages?
A: Yes, they are spread across modules.
Thanks, Cauvery.
@cauvery xdist supports four scheduling algorithms: "each", "load", "loadscope", and "loadfile". -n is a shortcut for load scheduling, meaning the scheduler will load-balance the tests across the workers.
With loadfile scheduling, all test cases in a test file are executed sequentially by the same worker.
From your requirement, you need something similar to loadfile, but scheduled on a marker rather than on the file. I would suggest writing a new scheduler altogether: mark the tests with a group name, distribute the tests having the same marker to a single worker, and distribute the others to the rest of the workers.
Now, the scheduler itself is not a plugin that can be installed in an environment; perhaps @nicoddemus can comment on that.
Over on my team we are trying to figure out whether we could use pytest-xdist, but we have similar blockers, and a home-grown tool was born of those: https://github.com/ansible/pytest-mp . It does some nice things, such as allowing us to group tests that may run together but not with other tests, or tests that need to run isolated and in serial, but it does not handle other things as well, such as testing multiple targets at once or spreading the test execution across multiple nodes.
It would be nice if we could benefit from the larger community that pytest-xdist enjoys; we will think about whether there is a way to get some of the functionality we rely on into xdist.
If you're still considering this feature request, my company is looking for something where we can run some tests in parallel, but others sequentially due to resource sharing. This would be a very useful feature for pytest-xdist.
Isn't this enough?
py.test -m sequential
py.test -m "not sequential" -n8
@atzorvas - I hoped it would be, but when I tried this, I ran into two severe problems with pytest-xdist, or how pytest works:
- Fixtures do not show log lines unless there's a failure, and they are set to warning: https://github.com/pytest-dev/pytest-xdist/issues/402
- Session scoped fixtures are run for every process: https://github.com/pytest-dev/pytest-xdist/issues/271
The latter is causing me a lot of problems, because it's creating collisions when I attempt to run tests on an AWS resource. If I have five processes, and they all configure the lambda I'm using at the same time, it fails. I have a session fixture to do this once, at the start of the test session. It must not be run more than once.
I'm pretty much stuck for finding a way to run tests in parallel. I'm likely going to have to execute jenkins runs in parallel, each with a subset of the tests. Not ideal. If anybody has a solution, I would love to investigate it.
For the session fixture / run-once problem, you can write a lock file and have one process run the run-once code after it creates the lock file; the other processes poll for the lock file to be deleted before they carry on. We do exactly this.
Thanks @symonk . That sounds like a pretty decent solution, but probably won't work for my problem. Since I am updating AWS Lambda environmental variables as a session fixture, each process would execute the same fixture, while other processes are finished, causing intermittent issues.
The way I've designed my tests doesn't seem compatible with multi-processor testing.
Perhaps you can write the environment variables to the file protected by the lock, something like (untested):
import json
import os

import pytest
from filelock import FileLock


@pytest.fixture(scope="session")
def my_session_fix(tmp_path_factory, worker_id):
    # worker_id is a fixture provided by pytest-xdist;
    # it is "master" when not running under xdist
    if worker_id == "master":
        # not executing with multiple workers
        env_vars = create_env_vars()
    else:
        # get the temp directory shared by all workers
        root_tmp_dir = tmp_path_factory.getbasetemp().parent
        f = root_tmp_dir / "envs.json"
        with FileLock(str(f) + ".lock"):
            if not f.is_file():
                env_vars = create_env_vars()
                f.write_text(json.dumps(env_vars))
            else:
                env_vars = json.loads(f.read_text())
    os.environ.update(env_vars)
(This uses filelock)
Only one process at a time will be able to get to the envs.json file: the first process creates the file with the environment variables encoded as JSON; the subsequent processes will only read from the file.
(EDIT: my initial take did not really work, have fixed it now using the same approach as in https://github.com/pytest-dev/pytest-xdist/pull/483).
EDIT: there's a documented version in the docs now: https://pytest-xdist.readthedocs.io/en/latest/how-to.html#making-session-scoped-fixtures-execute-only-once
I've opened https://github.com/pytest-dev/pytest-xdist/pull/483 adding an example to the README, would appreciate reviews. 👍
Thanks @nicoddemus. After looking at your code and reviewing @symonk's last post again, this could be a viable solution for me. I probably only have to put this file-lock wrapper around the session and module fixtures, perhaps with just a decorator.
Cool. (I've updated the example above again after realizing it was mixing code from #483.)
I also need this feature eagerly! The problem is that if I run pytest twice, I get two results.
$ pytest -n auto -m "not serial"
$ pytest -m "serial"
This could be very cool indeed. I run some functional tests against a running system. Some of the tests put a very heavy load on the system, so I'd need to run those tests in isolation. A marker could be perfect :)
import pytest
from filelock import FileLock


@pytest.fixture(autouse=True)
def be_sequential(request):
    if request.node.get_closest_marker("sequential"):
        with FileLock("semaphore.lock"):
            yield
    else:
        yield
I'm using something like this. Tests marked as sequential will block each other across workers. Still, it would be nice to send all sequential tests to one worker to get the shortest test execution time.
This is an old work account. Please reference @brandonchinn178 for all future communication.
Just wrote this patch; I think it works. It would be nice to have this provided out of the box. (It doesn't actually solve my particular use-case, as I need these serial tests to run without any other tests running in parallel, but it solves the problem in this ticket)
import pytest

from xdist import is_xdist_controller
from xdist.scheduling import LoadScopeScheduling


def pytest_configure(config):
    config.pluginmanager.register(XDistSerialPlugin())


class XDistSerialPlugin:
    def __init__(self):
        self._nodes = None

    @pytest.hookimpl(tryfirst=True)
    def pytest_collection(self, session):
        if is_xdist_controller(session):
            self._nodes = {
                item.nodeid: item
                for item in session.perform_collect(None)
            }
            return True

    def pytest_xdist_make_scheduler(self, config, log):
        return SerialScheduling(config, log, nodes=self._nodes)


class SerialScheduling(LoadScopeScheduling):
    def __init__(self, config, log, *, nodes):
        super().__init__(config, log)
        self._nodes = nodes

    def _split_scope(self, nodeid):
        node = self._nodes[nodeid]
        if node.get_closest_marker("serial"):
            # put all `@pytest.mark.serial` tests in the same scope, to
            # ensure they're all run in the same worker
            return "__serial__"
        # otherwise, each test is in its own scope
        return nodeid
I need these serial tests to run without any other tests running in parallel
We have the same requirement at work, but we solved it differently:
- We mark the tests that need to run serially with a @pytest.mark.serial mark.
- Execute pytest in parallel, excluding the tests with the mark:
$ pytest -n auto -m "not serial"
- Execute pytest again serially, selecting only the marked tests:
$ pytest -m "serial"
Just to complement that there's this alternative.
EDIT: just to note that you can execute the two commands in the same job; you don't need separate jobs.
@brandon-leapyear, thank you very much for this patch, it helped me a lot.
The only thing is that the imports have changed a little: from xdist.scheduling -> from xdist.scheduler.
Used pytest==7.1.2 and Python 3.7.8.
@vamotest Sorry, but how do you apply the patch? I tried running the patch command in the xdist folder, but it doesn't seem to work.
@ngyingkai
You need to put the above piece of code in conftest.py, and mark the tests themselves:

@pytest.mark.serial
def test_my_awesome_serial():
    pass


def test_my_parallel():
    pass
And run:
PYTHONHASHSEED=0 python3 -m pytest -n 4 --alluredir=allure_data
Then all serial tests will be run on one worker, and the rest in parallel on the others.
I'm fine with integration tests, but for unit tests with the same scheme I get an error: Unexpectedly no active workers available.
From the point of view of the test launch everything seems fine, but the job itself fails with worker_internal_error.
From what I managed to dig out of the traceback (full traceback):
tests_1 | INTERNALERROR> E return self._hookexec(self, self.get_hookimpls(), kwargs)
tests_1 | INTERNALERROR> E File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
tests_1 | INTERNALERROR> E return self._inner_hookexec(hook, methods, kwargs)
tests_1 | INTERNALERROR> E File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
tests_1 | INTERNALERROR> E firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
tests_1 | INTERNALERROR> E File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/callers.py", line 208, in _multicall
tests_1 | INTERNALERROR> E return outcome.get_result()
...
tests_1 | =========== 3459 passed, 85 skipped, 7 warnings in 310.54s (0:05:10) ===========
tests_1 | RESULT_CODE=1
Perhaps it comes down to the firstresult of the hook and loop_once.
I found a similar issue, but it seems to have been resolved in another issue back in 2019. At the same time, I have the latest versions of pytest/pytest-xdist.
Perhaps you can help, please, @brandon-leapyear or @nicoddemus? I don't want to run the serial tests first and then the parallel ones, or create a separate stage for each of them.
Badly need this too!
Use --dist=loadgroup (introduced in 2.5.0). This allows you to run tests in parallel by default, and run specifically marked tests serially.
From https://pytest-xdist.readthedocs.io/en/latest/distribution.html:
[...] guarantees that all tests with same xdist_group name run in the same worker. Tests without the xdist_group mark are distributed normally as in the --dist=load mode.
The example below runs test_banana and test_apple in the same worker. The other tests are run as usual, i.e. they are distributed across workers.
import pytest


@pytest.mark.xdist_group(name="fruit")
def test_banana():
    print('banana')


@pytest.mark.xdist_group(name="fruit")
def test_apple():
    print('apple')


def test_broccoli():
    print('broccoli')


def test_carrot():
    print('carrot')


def test_mushroom():
    print('mushroom')


def test_fungus():
    print('fungus')
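If you already use a single serial mark (as suggested earlier in this thread), a small conftest.py hook can route every such test into one shared xdist_group instead of hand-assigning group names. This is a sketch, assuming you run with --dist=loadgroup; pytest_collection_modifyitems is a standard pytest hook, while the "serial" mark name is your own convention:

```python
import pytest


def pytest_collection_modifyitems(config, items):
    # Route every test marked `serial` into a single xdist_group so that
    # `--dist=loadgroup` schedules all of them on the same worker, where
    # they then run one after another. Unmarked tests are left untouched
    # and distributed normally.
    for item in items:
        if item.get_closest_marker("serial"):
            item.add_marker(pytest.mark.xdist_group(name="serial"))
```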
@WilliamDEdwards
Use --dist=loadgroup (introduced in 2.5.0). This allows you to run tests in parallel by default, and run specifically marked tests serially.
But how is this supposed to control the order of execution? As far as I can see, this only controls the place/worker of execution.