pytest-xdist
Session-Scoped Fixtures are not Session-Scoped with Pytest-Xdist
I am fairly new to this project. I very recently migrated a software project's test suite from nose to pytest, mainly because of the xdist benefits I had heard of.
The problem is that my tests depend on a big fixture setup (table creation + heavy loading of data) that I would like to share across all my tests.
The current xdist behaviour is as follows:
- Collect tests
- Split tests amongst the user-defined number of processes
- Launch a pytest session for each worker
Obviously, if each test depends on a heavy fixture, then multiplying the number of fixture creations by the number of workers is not going to help.
Additionally, it simply breaks the expected pytest behaviour for 'session' scoped fixtures.
I think it should be fairly simple to address this problem, although I didn't take a really deep look into it. If you need help solving it, I am more than willing to contribute if you feel this suggestion for improvement is relevant.
Greetings.
Hi @elijahbal,
The problem is that each worker lives in a different process, so it would be hard to share a single session scoped fixture between the workers in a general manner.
For one, you have the problem of serializing any object returned by a fixture so it can be sent to another process on demand. pickle might seem like the obvious solution to this problem, but a lot of objects are not picklable.
Another problem, much harder IMHO, is how do you keep the object returned by the fixture synchronized among the worker processes? After all a session fixture might return a mutable object, so its state must be synchronized somehow as well.
Another problem is that one must also deal with parallelism, given that tests are now running in parallel in separate processes. The resources probably will need to be synchronized using a lock.
Those are the reasons, off the top of my head, why it would be hard to implement the proper scope rules in a way that session-scoped fixtures are only instantiated once per session, as opposed to once per worker like it is today.
Happy to have such a fast answer. Yes, the problem of each process living in a separate address space is a tough one, and parallelizing the computation comes at the cost of greater complexity for handling the processes, as far as I understand.
First of all, concerning the ability to serialize and send large objects between processes, I agree with you: in my experience, it is hard to serialize complex objects with pickle.
As for the third difficulty you mention, yes, indeed, the most direct and obvious approach to having a global session object is to use a system lock at the operating system level, regardless of the programming language.
For your points 1 and 2, I want to address part of the problem, and I think it will not look so difficult after examining the use case more closely.
The typical use case for spawning large fixture objects is usually not to share big collections of references (i.e. very big objects), but rather to prepare the underlying system to be in a specific state. Examples include:
- spawning many containerized applications for a large micro-services system
- setting up a database in a pristine state with a clean table structure
- gathering a big set of data onto the local storage drive from a collection of network connections
- etc.
So here the problem is not so much to share a big object between processes but to ensure that the fixture preparation process is run only once, from a single process. The synchronization of such objects between processes is not really important, because these are precisely objects that are meant to be accessed in a concurrent fashion (webserver, database, etc.).
From this point of view, I think (but maybe I am wrong) that the xdist initialization step (collection of tests and preparation of the test sessions, with distribution of tests amongst runners) is especially well suited to perform such a task.
So maybe a good compromise would be to have a special xdist marker for such fixtures that would take care of the initialization process. As for the return value of such fixtures, it would probably be best to enforce that these special fixtures return, on first call, an immutable primitive object (e.g. a string or at most a tuple of strings) that would be kept in memory by xdist.
Does this seem like a good idea to you? And does it seem doable in practice given the current xdist codebase?
I'm having a similar issue with session scoped fixtures when n > 1. My project does some of the things @elijahbal points out above.
The teardown section in the session fixture is being executed more than once, and worst of it, out of order: before other tests have ended.
I'm looking for a way to signal "there are no more pending tests" to work around this. In such a case, a worker could check if it's the last one and either execute that section or do nothing. Is there a way? It would be very helpful! (i.e. a lock, as @nicoddemus commented)
No idea actually how this could be done. The quickest workaround I could come up with is to use a separate resource for each one of the xdist processes.
Hi folks, sorry for the late response, this issue slipped through the cracks.
@elijahbal
Does this seem like a good idea to you? And does it seem doable in practice given the current xdist codebase?
Not really, and not just because of pytest-xdist but because of how pytest itself is structured: the session-scoped fixture instantiated in some worker will be destroyed once that worker runs out of tests, regardless of whether other workers still need that resource, for example.
@elijahbal and @jrodriguezntt I can't find a way to do what you need with fixtures, but perhaps plain hooks executed only on the master process could be responsible for creating the shared resource and destroying it (pytest_sessionstart/pytest_sessionfinish come to mind)? This of course has the problem that you might end up initializing a resource that won't even be used (for example if -k is used to select some tests), but it might be worth giving this a try.
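Roughly, something along these lines in conftest.py (a sketch only: create_shared_resource and destroy_shared_resource are hypothetical helpers, and the workerinput attribute is how recent pytest-xdist versions mark worker configs, so only the master runs the setup/teardown):

```python
# conftest.py -- minimal sketch, not a drop-in implementation.

def pytest_sessionstart(session):
    # Worker processes have a "workerinput" attribute on config; the master does not.
    if not hasattr(session.config, "workerinput"):
        # create_shared_resource() stands in for the expensive one-time setup.
        session.config._shared_resource = create_shared_resource()

def pytest_sessionfinish(session, exitstatus):
    if not hasattr(session.config, "workerinput"):
        # destroy_shared_resource() stands in for the matching cleanup.
        destroy_shared_resource(session.config._shared_resource)
```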
I've managed to work around this by doing nothing in the teardown section (after the yield). A cleanup bash script is executed after py.test, but it's a pity, because this (IMHO) breaks the logic of the scope='session' option (once per session). I understand the technical problem, nonetheless.
One approach could be to use semaphores or any other shared resource if the workers are running on the same machine. Another one is to use a shared resource like memcached, but this might be a bit too complicated (not sure if it's worth it).
The hook on master is also a good idea, provided it's the last one to execute (once the workers have finished). But I have no experience using them. Any suggestions will be welcome! :)
each process is its own session; while it's a nice-to-have feature to have a scope that spans processes, it's also important to note that there is no good, sane and simple way to implement it - the proposed approaches are all no-gos on technical merit
Hi,
I'm experiencing this same issue: I have tests that I want to run in parallel, but a setup that has to be done only once before all the tests run.
According to what I see, the setup is run once per worker that xdist creates.
Is it possible to make a mutex between the xdist workers? Do we have access to anything like this?
While the following workarounds are probably obvious to people discussing here, they may not be obvious to xdist noobs like me who googled and landed here while skipping the documentation. So mentioning them briefly:
if ("PYTEST_XDIST_WORKER" not in os.environ or
os.environ["PYTEST_XDIST_WORKER"] == "gw0"):
<initialize something shared in common>
<create some empty file INIT_DONE_BY_THREAD_ZERO>
else:
<wait for INIT_DONE_BY_THREAD_ZERO file to exist>
if ("PYTEST_XDIST_WORKER_COUNT" not in os.environ or
os.environ["PYTEST_XDIST_WORKER_COUNT"] == 1):
....
Another problem, much harder IMHO, is how do you keep the object returned by the fixture synchronized among the worker processes? After all a session fixture might return a mutable object, so its state must be synchronized somehow as well.
In my case and I guess many others, the part of the fixture that is time-consuming to download is read-only. Modern languages encourage read-only structures for obvious... concurrency reasons https://doc.rust-lang.org/book/second-edition/ch03-01-variables-and-mutability.html
Note in my case the "better" fix would probably be to move the downloads outside Python and to "make test" and "make clean" targets - but that is more work.
how about simply putting an HTTP download cache into .pytest-cache?
How do you synchronize access to .pytest-cache (or request.config.cache.makedir)?
Does pytest have something builtin for this?
I would be ok with a solution where the first worker that arrives does the computation and stores it in cache, and those that come after use it. But multiplatform locking is annoying...
I use Postgres, and found that create database from template is much faster than creating a database fixture from scratch. So I have each process launched by xdist create its database from a template database. If the template does not exist, I use a system-wide lock to have the xdist process that gains the lock first create the template (from JSON files), while blocking the others from proceeding. When the template is ready, the lock is released, all the other processes find that the template already exists, and they create their database instances directly from the template. I went from maybe 10 minutes to a couple of seconds getting the database fixtures set up.
import posix_ipc
import pytest

@pytest.yield_fixture(scope='session')
def db_setup(request):
    db_set_unique_db_name(request)
    # System-wide named semaphore: only one xdist process builds the template,
    # and database creation from the template is serialized.
    with posix_ipc.Semaphore(
        '/{}'.format(__name__), flags=posix_ipc.O_CREAT, initial_value=1
    ):
        if not db_template_exists():
            db_template_create_blank()
            db_template_populate_by_json()
            db_template_migrate()
        db_create_from_template()
    yield
    db_drop()
Notes about the lock: posix_ipc is at https://pypi.org/project/posix_ipc/ (no affiliation).
The regular multiprocessing.Lock() context manager did not work for me here. Also tried creating the lock at module scope, and also directly calling acquire() and release(). It's probably related to how the worker processes relate to each other when launched by pytest-xdist as compared to what the multiprocessing module expects.
Or use a file lock? Only when you finish initializing your environment would the other processes go on?
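For what it's worth, a rough sketch of that file-lock idea, assuming the third-party filelock package and pytest-xdist's worker_id fixture; build_environment() is a hypothetical stand-in for the expensive one-time setup:

```python
import json

import pytest
from filelock import FileLock  # third-party "filelock" package


@pytest.fixture(scope="session")
def shared_env(tmp_path_factory, worker_id):
    if worker_id == "master":
        # Not running under xdist: just do the setup in-process.
        return build_environment()  # hypothetical expensive setup

    # Under xdist, all workers share the parent of the per-worker temp dirs.
    root = tmp_path_factory.getbasetemp().parent
    marker = root / "env.json"
    with FileLock(str(marker) + ".lock"):
        if marker.is_file():
            data = json.loads(marker.read_text())
        else:
            data = build_environment()  # hypothetical expensive setup
            marker.write_text(json.dumps(data))
    return data
```

Whichever worker acquires the lock first does the setup and writes the result; the workers that arrive later just read it back.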
Don't you already have IPC/an event system between Manager and Workers?
Couldn't you use the existing event system to handle fixtures xdist_group_lock, xdist_global_lock, xdist_group and xdist_global, where:
- xdist_xxx_lock are locks that can be acquired by workers
- xdist_xxx are dicts restricted to hashable, serializable data, where small size is recommended
..?
Then you could easily do:
def some_test(xdist_group_lock, xdist_group):
    xdist_group_lock.acquire()  # blocks
    if not xdist_group['initialized']:
        initialize_things()
        xdist_group['initialized'] = True
    xdist_group_lock.release()
..and those primitives aren't too hard to do over a network -- particularly one you already have an event system over. Yes, it would slow things down a tad for some people that use it, particularly if they abuse it, but there's still a net benefit in most cases.
@eode it is probably possible. I believe this can even be implemented outside xdist as a proof of concept first.
I've done this kind of control with other pytest plugins by using multiprocessing.Lock() set up in conftest.py, but those plugins used the multiprocessing module, and apparently loaded conftest.py before forking / creating other processes.
However, I don't know the method that xdist uses to distribute work. Whichever way xdist does it, making a lock in conftest.py doesn't work. My suspicion is that xdist either doesn't use the multiprocessing module, or it forks before conftest.py is imported.
So, proof-of-concept works -- however, some kind of distributed Lock (and probably a distributed dict) would be needed, and it needs to be via a means of communication that is just as guaranteed as your normal IPC -- so, using whatever IPC you normally use.
As to the general principle -- it's been tested, and works.
xdist currently uses execnet and is incompatible with multiprocessing mechanisms; #302 is supposed to alleviate that
I believe this can even be implemented outside xdist as a proof of concept first.
I should have been clearer, but I meant that providing such a fixture, which uses a file lock behind the scenes, is possible.
There are other use cases that a file lock won't cover -- like when doing distributed testing, and using the same database back-end, or doing some other kind of setup that applies collectively. Locks can be done with channels, with the lock hosted server-side.
@eode Network distributed testing should be covered by another type of lock, since implicit locking between machines probably would be unwanted as often as not. The new type of lock should probably use the database. I think testing based on other network shared resources is esoteric enough that it should be handled on a case by case basis by the user.
Not that a random user matters much, but FYI this is the reason I chose and use a less-developed framework for parallel testing.
I've implemented a lock, and a dict, with execnet before, it's definitely doable -- and if the client isn't able to communicate with the server via channels, then all is pretty much lost there anyways, as far as execnet is concerned.
What do you mean about implicitly locking? That seems like a bad idea. I was more thinking of a 'sessionlock', 'modulelock', and 'classlock' fixture that would need to be explicitly called and used by a test.
The raw fact of the matter is that sometimes, synchronization is needed between clients that aren't on the same physical system with access to the same data. Communication is also sometimes needed.
With implicit locking, I was thinking about session scoped fixtures automatically being session scoped across multiple machines on the network.
Support for various more specific scopes, like a network wide session scope, would be nice. Might be better than exposing locks directly?
Hey, that's not a bad idea. An 'xdist-session' scope that executes before even distributing tests would solve a lot of use cases that would otherwise require synchronization. It would even remove the need for locking in many cases.
Speaking of scopes, I discovered in pytest that the session scope executes after tests have been imported and organized. The intuitive place to mock globally seemed to be the session scope, but not so. If mocking somelib.bar from the session scope, you need to track down every occurrence of from somelib import bar and patch it individually.
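For illustration only (somelib and mymodule are hypothetical names, not from the original discussion), this is the standard unittest.mock behaviour being described:

```python
from unittest import mock

import mymodule  # hypothetical module that does: from somelib import bar

# Patching the name in somelib does NOT affect the copy mymodule already imported:
with mock.patch("somelib.bar"):
    mymodule.use_bar()  # still calls the real bar

# The name bound inside the importing module has to be patched instead:
with mock.patch("mymodule.bar"):
    mymodule.use_bar()  # now calls the mock
```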
That makes mocking of some python libraries a pain, since there's no scope that supports post-conftest, pre-code-import. I think I ended up piggy-backing on the config plugin hook or similar. That worked, but coming from outside the project, it wasn't easy figuring out what to do, and was very counterintuitive and not clearly documented -- which really stands out in pytest, because pytest is mostly intuitive and well-documented. :-)
Also to pytest's credit, the fix once found was nice and succinct.
Anyways, to the point - Additional scopes sound like a good idea to me.
I'm not familiar with execnet, but the implementation overview mentions a pytest_xdist_make_scheduler hook. Maybe that's a good starting point for adding the scopes?
Unfortunately, I probably won't have time to help with the implementation. The workaround I outlined previously in this ticket was all that was needed in my particular case.
I am also having this issue with n>1. My session fixtures are executed multiple times. I am testing with AWS resources, and keep hitting "Resource Conflict", due to multiple fixtures trying to write environmental variables to the same lambda at the same time. This could be solved, as stated above with the session fixtures running once.
a key problem with that is that python has no exchange mechanism that allows doing this safely across multiple processes
i believe something like pytest_configure_resources, which sets up the global resources, would help as a hook; design work on that one is needed
Hi, after a while I am still trying to deal with this.
If I understand correctly, there is still no "official" solution for the issue of having to perform a series of actions only once before the tests start to run in parallel.
Is the best workaround still to "split" the run in two: one "fake" run which makes all the preparations, and then the actual parallel run?
yup, please note that with the current design of pytest-xdist it's just the wrong thing to do in many cases
@RonnyPfannschmidt Could you elaborate?
@rogerdahl in situations where xdist is used to distribute tests across networks, it's very unclear where to set up fixtures
as such, setting up fixtures only once is simply a no-go, so the fixture life-cycles are bound to the process
pytest doesn't yet have a concept for resources obtained for a specific test run