
Ability to execute certain tests sequentially while using -n (xdist)

Open cauvery opened this issue 6 years ago • 33 comments

Hi,

I am looking for a way to run a certain group of tests sequentially, while the other tests continue to run in parallel using -n (xdist).

I have a group of tests that cannot be executed in parallel (only a small set out of the whole suite). I do not want to create another job that runs without -n just for this small set. I searched but did not find an actual solution anywhere.

The versions I am using are: pytest-xdist 1.20.1, pytest 3.2.5, Python 2.7.13.

Thanks in advance.

cauvery avatar Dec 05 '18 07:12 cauvery

A @pytest.mark would be great for that purpose!

Horstage avatar Dec 05 '18 14:12 Horstage

currently there isn't enough metadata being transferred between the worker and the scheduler to facilitate this

RonnyPfannschmidt avatar Dec 05 '18 15:12 RonnyPfannschmidt

Indeed we don't have a solution for that currently.

@cauvery what we do at work is to mark some tests with a known mark (@pytest.mark.serial) and then execute pytest twice: once with xdist excluding the mark, and then without xdist but including the mark:

$ pytest -n auto -m "not serial"
...
$ pytest -m "serial"

This way the serial-marked tests will execute sequentially in their own process. Just leaving this here in case you did not find that solution elsewhere. 👍

nicoddemus avatar Dec 05 '18 16:12 nicoddemus

Thank you @nicoddemus, I have seen this solution, but it requires creating two separate jobs in Jenkins: one for parallel execution and one for sequential execution, which I'd prefer to avoid in my project (having another job just for a small subset).

Just FYI, I can specify the two command lines in a single job, but the report and results will then be for the last command only.

-Cauvery.

cauvery avatar Dec 05 '18 16:12 cauvery

Does anyone else have any alternative solution or workaround for this?

cauvery avatar Dec 05 '18 16:12 cauvery

@cauvery

Which kind of scheduling in xdist have you opted for?

Are the test cases which you want to execute sequentially spread across multiple Python test modules/packages?

akr8986 avatar Dec 07 '18 07:12 akr8986

I have a similar need to @cauvery

In my case I have some integration tests (through @pytest.mark.parametrize) which are making modifications to a shared object and a fixture which always sets the state of that shared object to a known, initial state.

Unfortunately, when these test cases end up running in different workers, they can "step on each other's toes".

It would be nice to have a marker in xdist that would allow us to force some tests to run in the same worker, i.e. sequentially.

zoltan-fedor avatar Dec 11 '18 01:12 zoltan-fedor

@akr8986 to answer your questions:

Q: Which kind of scheduling in xdist have you opted for?
A: I am using -n 3 on the command line. Does that answer the question, or did I get it wrong?

Q: Are the test cases which you want to execute sequentially spread across multiple Python test modules/packages?
A: Yes, they are spread across modules.

Thanks, Cauvery.

cauvery avatar Dec 11 '18 20:12 cauvery

@cauvery, xdist supports 4 scheduling algorithms: "each", "load", "loadscope", and "loadfile". -n is a shortcut for load scheduling, meaning the scheduler will load-balance the tests across the workers.

If you take the case of loadfile scheduling, all test cases in a test file will be executed sequentially by the same worker.

From your requirement, you need something similar to loadfile, but with scheduling based not on the file but on some marker. I would suggest writing a new scheduler altogether: mark all the tests with a group name, distribute the tests having the same marker to a single worker, and the rest to the other workers.

Now, the scheduler itself is not a plugin that can be installed in an environment; perhaps @nicoddemus can comment on that.

akr8986 avatar Dec 12 '18 08:12 akr8986

Over on my team we are trying to figure out whether we could use pytest-xdist, but we have similar blockers, and a home-grown tool called https://github.com/ansible/pytest-mp was born out of those. It does some nice things, such as allowing us to group tests that may be run together but not with other tests, or tests that need to run isolated and in serial, but it does not do other things as well, such as testing multiple targets at once or spreading the test execution across multiple nodes.

It would be nice if we could benefit from the larger community that pytest-xdist enjoys; we will think about whether there is a way to get some of the functionality we rely on into xdist.

:thinking:

kdelee avatar Mar 22 '19 15:03 kdelee

If you're still considering this feature request, my company is looking for something where we can run some tests in parallel, but others sequentially due to resource sharing. This would be a very useful feature for pytest-xdist.

neXussT avatar Oct 23 '19 20:10 neXussT

Isn't this enough?

py.test -m sequential
py.test -m "not sequential" -n8

atzorvas avatar Oct 26 '19 20:10 atzorvas

@atzorvas - I hoped it would be, but when I tried this, I ran into two severe problems with pytest-xdist, or with how pytest works:

  • Fixtures do not show log lines unless there's a failure, and they are set to warning: https://github.com/pytest-dev/pytest-xdist/issues/402
  • Session scoped fixtures are run for every process: https://github.com/pytest-dev/pytest-xdist/issues/271

The latter is causing me a lot of problems, because it's creating collisions when I attempt to run tests on an AWS resource. If I have five processes, and they all configure the lambda I'm using at the same time, it fails. I have a session fixture to do this once, at the start of the test session. It must not be run more than once.

I'm pretty much stuck finding a way to run tests in parallel. I'm likely going to have to execute Jenkins runs in parallel, each with a subset of the tests. Not ideal. If anybody has a solution, I would love to investigate it.

neXussT avatar Oct 26 '19 20:10 neXussT

For the session fixture / run-once problem, you can write a lock file and have one process run the once-only code after it creates the lock file; the other processes poll for the lock file to be deleted before they carry on. We do exactly this.
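A minimal sketch of that pattern using only the standard library (the function, lock-file, and done-file names are illustrative, not part of pytest-xdist; this variant has the winning process write a "done" file that the others poll for, rather than deleting the lock):

```python
import json
import os
import time

def run_once(setup, lock_path="setup.lock", done_path="setup.done"):
    """Let exactly one process run setup(); the others wait and reuse its result."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one process succeeds.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Another process won the race; poll until its result file appears.
        while not os.path.exists(done_path):
            time.sleep(0.1)
        with open(done_path) as f:
            return json.load(f)
    else:
        try:
            result = setup()
            # Publish the result for the waiting processes.
            with open(done_path, "w") as f:
                json.dump(result, f)
            return result
        finally:
            os.close(fd)
```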

symonk avatar Oct 31 '19 12:10 symonk

Thanks @symonk. That sounds like a pretty decent solution, but it probably won't work for my problem. Since I am updating AWS Lambda environment variables in a session fixture, each process would execute the same fixture while other processes have already finished, causing intermittent issues.

The way I've designed my tests doesn't seem compatible with multi-processor testing.

neXussT avatar Nov 01 '19 12:11 neXussT

That sounds like a pretty decent solution, but probably won't work for my problem. Since I am updating AWS Lambda environmental variables as a session fixture, each process would execute the same fixture, while other processes are finished, causing intermittent issues.

Perhaps you can write the environment variables to the file protected by the lock, something like (untested):

import json
import os

import pytest
from filelock import FileLock

@pytest.fixture(scope="session")
def my_session_fix(tmp_path_factory, worker_id):
    if worker_id == "master":
        # not executing with multiple workers: run the setup directly
        env_vars = create_env_vars()
    else:
        # get the temp directory shared by all workers
        root_tmp_dir = tmp_path_factory.getbasetemp().parent
        f = root_tmp_dir / 'envs.json'
        with FileLock(str(f) + '.lock'):
            if not f.is_file():
                env_vars = create_env_vars()
                f.write_text(json.dumps(env_vars))
            else:
                env_vars = json.loads(f.read_text())

    os.environ.update(env_vars)

(This uses filelock)

Only one process at a time will be able to get to the envs.json file: the first process creates the file with the environment variables encoded as JSON; the subsequent processes will only read from the file.

(EDIT: my initial take did not really work, have fixed it now using the same approach as in https://github.com/pytest-dev/pytest-xdist/pull/483).

EDIT: there's a documented version in the docs now: https://pytest-xdist.readthedocs.io/en/latest/how-to.html#making-session-scoped-fixtures-execute-only-once

nicoddemus avatar Nov 02 '19 21:11 nicoddemus

I've opened https://github.com/pytest-dev/pytest-xdist/pull/483 adding an example to the README, would appreciate reviews. 👍

nicoddemus avatar Nov 02 '19 22:11 nicoddemus

Thanks @nicoddemus. After looking at your code and reviewing @symonk's last post again, this could be a viable solution for me. I probably only have to put this file-lock wrapper around the session and module fixtures, perhaps with just a decorator.

neXussT avatar Nov 03 '19 14:11 neXussT

Cool (I've updated the example above again after realizing it was mixing code from #483)

nicoddemus avatar Nov 03 '19 15:11 nicoddemus

I also eagerly need this feature! The problem is that if I run pytest twice, I get two separate results.

$ pytest -n auto -m "not serial"
$ pytest -m "serial"

qwordy avatar Aug 05 '20 07:08 qwordy

This could be very cool indeed. I run some functional tests against a running system. Some of the tests have a very heavy impact on the system, so I'd need to run those tests in isolation. A marker would be perfect :)

aorestr avatar Nov 06 '20 18:11 aorestr

import pytest
from filelock import FileLock

@pytest.fixture(autouse=True)
def be_sequential(request):
    if request.node.get_closest_marker("sequential"):
        with FileLock("semaphore.lock"):
            yield
    else:
        yield

I'm using something like this. If you mark a test as sequential, it will block across workers. Still, it would be nice to send all sequential tests to one worker to get the shortest test execution time.

MikaDariusz avatar Jan 19 '21 08:01 MikaDariusz

:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:


Just wrote this patch; I think it works. It would be nice to have this provided out of the box. (It doesn't actually solve my particular use-case, as I need these serial tests to run without any other tests running in parallel, but it solves the problem in this ticket)

import pytest

from xdist import is_xdist_controller
from xdist.scheduling import LoadScopeScheduling

def pytest_configure(config):
    config.pluginmanager.register(XDistSerialPlugin())

class XDistSerialPlugin:
    def __init__(self):
        self._nodes = None

    @pytest.hookimpl(tryfirst=True)
    def pytest_collection(self, session):
        if is_xdist_controller(session):
            self._nodes = {
                item.nodeid: item
                for item in session.perform_collect(None)
            }
            return True

    def pytest_xdist_make_scheduler(self, config, log):
        return SerialScheduling(config, log, nodes=self._nodes)


class SerialScheduling(LoadScopeScheduling):
    def __init__(self, config, log, *, nodes):
        super().__init__(config, log)
        self._nodes = nodes

    def _split_scope(self, nodeid):
        node = self._nodes[nodeid]
        if node.get_closest_marker("serial"):
            # put all `@pytest.mark.serial` tests in same scope, to
            # ensure they're all run in the same worker
            return "__serial__"

        # otherwise, each test is in its own scope
        return nodeid

brandon-leapyear avatar Nov 06 '21 00:11 brandon-leapyear

I need these serial tests to run without any other tests running in parallel

We have the same requirement at work, but we solved it differently:

  1. We mark the tests that need to run serially with a @pytest.mark.serial mark.

  2. Execute pytest in parallel, excluding the tests with the mark:

    $ pytest -n auto -m "not serial"
    
  3. Execute pytest again serially, selecting only the marked tests:

    $ pytest -m "serial"
    

Just adding that this alternative exists.

EDIT: just to note that you can execute the two commands in the same job; you don't need separate jobs.

nicoddemus avatar Nov 06 '21 16:11 nicoddemus

Just wrote this patch; I think it works. It would be nice to have this provided out of the box. (It doesn't actually solve my particular use-case, as I need these serial tests to run without any other tests running in parallel, but it solves the problem in this ticket)

[...]

@brandon-leapyear, thank you very much for this patch, it helped me a lot. The only thing is that the import has changed slightly: from xdist.scheduling is now from xdist.scheduler.

Used pytest==7.1.2 and Python 3.7.8

vamotest avatar Jun 10 '22 11:06 vamotest

Just wrote this patch; I think it works. It would be nice to have this provided out of the box. (It doesn't actually solve my particular use-case, as I need these serial tests to run without any other tests running in parallel, but it solves the problem in this ticket)

[...]

@brandon-leapyear, thank you very much for this patch, it helped me a lot. The only thing is that the imports have changed a little from xdist.scheduling -> from xdist.scheduler

Used pytest==7.1.2 and Python 3.7.8

@vamotest Sorry, but how do you apply the patch? I tried running the patch command in the xdist folder, but it doesn't seem to work.

ngyingkai avatar Jul 07 '22 02:07 ngyingkai

@ngyingkai

It is necessary to put the above piece of code in conftest.py

And mark the tests themselves

@pytest.mark.serial
def test_my_awesome_serial():
    pass

def test_my_parallel():
    pass

And run

PYTHONHASHSEED=0 python3 -m pytest -n 4 --alluredir=allure_data

Then all serial tests will be run on one worker, and the rest in parallel on others.

Integration tests work fine for me, but for unit tests with the same scheme I get the error "Unexpectedly no active workers available". From the point of view of the test run everything seems fine, but the job fails with worker_internal_error.

From what I managed to dig out of the traceback (full traceback):

tests_1        | INTERNALERROR> E                 return self._hookexec(self, self.get_hookimpls(), kwargs)
tests_1        | INTERNALERROR> E               File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
tests_1        | INTERNALERROR> E                 return self._inner_hookexec(hook, methods, kwargs)
tests_1        | INTERNALERROR> E               File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
tests_1        | INTERNALERROR> E                 firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
tests_1        | INTERNALERROR> E               File "/usr/local/airflow/.local/lib/python3.7/site-packages/pluggy/callers.py", line 208, in _multicall
tests_1        | INTERNALERROR> E                 return outcome.get_result()
...
tests_1        | =========== 3459 passed, 85 skipped, 7 warnings in 310.54s (0:05:10) ===========
tests_1        | RESULT_CODE=1

Perhaps it comes down to the firstresult handling of the hook and loop_once.

I found a similar issue, but it seems to have been resolved in another issue back in 2019. At the same time, I am on the latest versions of pytest/pytest-xdist.

Perhaps you can help, please, @brandon-leapyear or @nicoddemus? I don't want to run serial tests first and then parallel ones, or create a separate stage for each of them.

vamotest avatar Jul 07 '22 06:07 vamotest

Badly need this too!

felixmeziere avatar Aug 09 '22 17:08 felixmeziere

Use --dist=loadgroup (introduced in 2.5.0).

This allows you to run tests in parallel by default, and run specifically marked tests serially.

From https://pytest-xdist.readthedocs.io/en/latest/distribution.html:

[...] guarantees that all tests with same xdist_group name run in the same worker. Tests without the xdist_group mark are distributed normally as in the --dist=load mode.

The example below runs test_banana and test_apple in the same worker. The other tests are run as usual, i.e. they are distributed across workers.

import pytest

@pytest.mark.xdist_group(name="fruit")
def test_banana():
	print('banana')

@pytest.mark.xdist_group(name="fruit")
def test_apple():
	print('apple')

def test_broccoli():
	print('broccoli')

def test_carrot():
	print('carrot')

def test_mushroom():
	print('mushroom')

def test_fungus():
	print('fungus')
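As a sketch of how this can be combined with the @pytest.mark.serial convention from earlier in the thread, a conftest.py hook could add the xdist_group mark to every serial-marked test automatically (an untested sketch; it assumes the suite is run with --dist=loadgroup):

```python
import pytest

def pytest_collection_modifyitems(config, items):
    # Put every @pytest.mark.serial test into one xdist_group so that
    # --dist=loadgroup schedules them all onto the same worker.
    for item in items:
        if item.get_closest_marker("serial"):
            item.add_marker(pytest.mark.xdist_group(name="serial"))
```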

WilliamDEdwards avatar Nov 06 '22 19:11 WilliamDEdwards

@WilliamDEdwards

Use --dist=loadgroup (introduced in 2.5.0). This allows you to run tests in parallel by default, and run specifically marked tests serially.

But how would this control the order of execution? As far as I can see, this only controls the place/worker of execution.

themperek avatar Nov 09 '22 16:11 themperek