Refactor fixture finalization
Currently, for each fixture which depends on another fixture, a "finalizer" is added to the finalizer list of the fixture it depends on.
For example:
```python
def test(tmpdir):
    pass
```

`tmpdir`, as we know, ultimately depends on the session-scoped `tmp_path_factory` to create the temporary directory. Each `tmpdir` invocation ends up adding its finalization to the list of finalizers of `tmp_path_factory`. This is the mechanism used to finalize fixtures in the correct order (although we still have bugs in this area, as #1895 shows, for example), and it ensures that every `tmpdir` will be destroyed before the requested `tmp_path_factory` fixture.
This means that every high-scoped fixture might end up with dozens, hundreds, or even thousands of "finalizers" attached to it. Fixture finalizers can be called multiple times without problems, but this consumes memory: each finalizer keeps its `SubRequest` object alive, and with it a number of small variables:
https://github.com/pytest-dev/pytest/blob/ed68fcf6659e623162325e72187358ba2255c4e0/src/_pytest/fixtures.py#L341-L359
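To make the memory cost concrete, here is a toy sketch (hypothetical names, not pytest's actual code) of the pattern: each registered finalizer is a closure that keeps its per-test request object alive for as long as the long-lived finalizer list exists.

```python
# Toy illustration of the accumulation (not pytest's actual code).

class FakeSubRequest:
    """Stand-in for _pytest.fixtures.SubRequest: holds per-test state."""
    def __init__(self, test_name):
        self.test_name = test_name
        self.scope = "function"

session_finalizers = []  # lives as long as the session-scoped fixture

def run_one_test(test_name):
    request = FakeSubRequest(test_name)
    # The closure below keeps `request` alive until the list itself is cleared.
    session_finalizers.append(lambda: print(f"finalize {request.test_name}"))

for i in range(1000):
    run_one_test(f"test_{i}")

print(len(session_finalizers))  # 1000 closures, each pinning a request object
```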
This can easily be demonstrated by applying this patch:
```diff
diff --git a/src/_pytest/runner.py b/src/_pytest/runner.py
index 55dcd805..b3a94bc6 100644
--- a/src/_pytest/runner.py
+++ b/src/_pytest/runner.py
@@ -393,6 +393,8 @@ class SetupState(object):
         for col in self.stack:
             if hasattr(col, "_prepare_exc"):
                 six.reraise(*col._prepare_exc)
+        if self.stack:
+            print(len(self._finalizers.get(self.stack[0])))
         for col in needed_collectors[len(self.stack) :]:
             self.stack.append(col)
             try:
```
(this prints the number of finalizers attached to the "Session" node, which is where session-scoped fixtures attach their finalizers)
And running this test:
```python
import pytest

@pytest.mark.parametrize('i', range(10))
def test(i, tmpdir):
    pass
```
```
λ pytest .tmp\test-foo.py -qs
.1
.2
.3
.4
.5
.6
.7
.8
.9
.
10 passed in 0.16 seconds
```
I believe we can think of ways to refactor the fixture teardown mechanism to avoid this accumulation of finalizers on the objects. Ideally we should build a proper DAG of fixture dependencies, which would then be destroyed in the proper order. This would also make things more explicit and easier to follow, IMHO.
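For illustration, a minimal sketch of the DAG idea (my own sketch, not a proposed pytest API): record which fixtures each fixture depends on, and derive the teardown order as the reverse of a topological (setup) order.

```python
# Minimal sketch of DAG-driven setup/teardown ordering (not pytest code).
from graphlib import TopologicalSorter  # Python 3.9+

# Map each fixture to the fixtures it depends on.
deps = {
    "tmpdir": {"tmp_path"},
    "tmp_path": {"tmp_path_factory"},
    "tmp_path_factory": set(),
}

# static_order() yields dependencies before dependents: the setup order.
setup_order = list(TopologicalSorter(deps).static_order())
teardown_order = list(reversed(setup_order))

print(setup_order)     # ['tmp_path_factory', 'tmp_path', 'tmpdir']
print(teardown_order)  # ['tmpdir', 'tmp_path', 'tmp_path_factory']
```

With an explicit graph like this, no per-fixture finalizer lists are needed; the teardown order falls out of the graph itself.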
We should set up a full topology to manage setup/teardown.
I was reading the comments on #4055 where it seemed that no changes would be considered in the short term because of the risk of breaking tests in the wild.
Does opening this issue mean that this is now ripe enough for design and eventual implementation?
> Does opening this issue mean that this is now ripe enough for design and eventual implementation?
Yep, just yesterday @RonnyPfannschmidt and I were chatting about this. It will be a gradual change but will eventually make the entire fixture execution model much more explicit and easier to follow than what we have today.
We plan to tackle this when he gets back from his hiatus.
Awesome, looking forward to it!
I guess it might also help #5460 (#3551 I believe was also using a directed graph approach for fixture dependency/execution).
Hi all, I made PR #4055 for this issue over a year ago. One of the primary concerns raised about the PR was that it might break plugins. Most of the major external plugins already have their own test suites. Would integrating the test suites of external plugins into the pytest test suite alleviate the current impasse? It seems like a worthy task anyway, to give confidence in any change to core code.
> Would integrating the test suites of external plugins into the pytest test suite alleviate the current impasse?
We have thought about this, but it is kind of tricky: external test suites are outside of our control, so they might break for unrelated reasons, which would then break pytest. We could of course devise a way for those breakages to not block PRs and such, though.
https://github.com/pytest-dev/pytest/pull/7511 touches on one of the problems highlighted here.
Coming from #7511, but I believe I have a solution to this.
Currently, there are a few issues with regard to autouse and parameterized fixtures.
From the docs:
> autouse fixtures obey the scope= keyword-argument: if an autouse fixture has scope='session' it will only be run once, no matter where it is defined. scope='class' means it will be run once per class, etc.
Given these fixtures:
```python
@pytest.fixture(scope='session', autouse=True, params=[1, 2])
def s():
    pass

@pytest.fixture(scope='class', autouse=True)
def c():
    pass
```
One might expect `c` to happen once and only once per class because it's autouse. The problem comes in when parameterization is involved (at least for a higher-scoped fixture).
Given these test classes:
```python
class TestX:
    def test_it(self):
        pass

class TestY:
    def test_it_again(self):
        pass
```
Because pytest has to execute every test once for the first param set of `s` before it can do it again for the second param set, it has to set up `c` twice for each class: pytest tears `c` down after each class for the first param set, and those tests are still dependent on `c` because of autouse. In total, `c` would be executed 4 times.
Here's the setup plan (from pytest 6.1.2):

```
SETUP    S s[1]
      SETUP    C c
        test2.py::TestX::test_it[1] (fixtures used: c, s)
      TEARDOWN C c
        test2.py::TestY::test_it_again[1]
      SETUP    C c
        test2.py::TestY::test_it_again[1] (fixtures used: c, s)
      TEARDOWN C c
        test2.py::TestX::test_it[2]
TEARDOWN S s[1]
SETUP    S s[2]
      SETUP    C c
        test2.py::TestX::test_it[2] (fixtures used: c, s)
      TEARDOWN C c
        test2.py::TestY::test_it_again[2]
      SETUP    C c
        test2.py::TestY::test_it_again[2] (fixtures used: c, s)
      TEARDOWN C c
TEARDOWN S s[2]
```
Unfortunately, because of parameterization, there's no way to guarantee an autouse fixture only happens once per its defined scope.
However, I have an alternate proposal that I think might suit everyone's needs without breaking any existing test suites that properly define their dependencies.
In @bluetech's example from our discussions on #7511:
```python
import pytest

@pytest.fixture(scope="module")
def m():
    print('M SETUP')
    yield
    print('M TEARDOWN')

class TestIt:
    @classmethod
    @pytest.fixture(scope="class", autouse=True)
    def c(cls):
        print('C SETUP')
        yield
        print('C TEARDOWN')

    def test_f1(self):
        print('F1 CALL')

    def test_f2(self, m):
        print('F2 CALL')
```
The setup plan:

```
      SETUP    C c
        test3.py::TestIt::test_f1 (fixtures used: c)
    SETUP    M m
        test3.py::TestIt::test_f2 (fixtures used: c, m)
      TEARDOWN C c
    TEARDOWN M m
```
The `m` fixture, to me, happens out of place, because it doesn't respect the ordering of scopes until teardown (and the indentation throws me off).
Ideally, I would like this to be the setup plan for that example:
```
    SETUP    M m
      SETUP    C c
        test3.py::TestIt::test_f1 (fixtures used: c)
        test3.py::TestIt::test_f2 (fixtures used: c, m)
      TEARDOWN C c
    TEARDOWN M m
```
And this would be the plan if only `test_f1` is to be run:

```
      SETUP    C c
        test3.py::TestIt::test_f1 (fixtures used: c)
      TEARDOWN C c
```
Since the `m` fixture wasn't needed by anything, it could be cut out.
To me, this is much easier not just to reason about, but also to orchestrate.
This has pretty large implications, and it could be incredibly challenging to figure out how to do this not just effectively, but in a way that folks can leverage predictably for more advanced usage. But I have an idea, and I think you'll get quite a kick out of it.
Fixture Resolution Order (FRO)
It works almost exactly like Python's MRO (so it's a familiar concept), except that the "child" we're trying to compute the FRO of is the master FRO itself. After the first pass, it can be sorted by scope (and possibly other criteria), and the result can be used to ensure optimized, stack-like behavior.
Keep in mind that entries in the FRO aren't required to be explicit dependencies of one another; the point is to compute a deterministic, linearized order that allows fixtures to be treated optimally like a stack with regard to setup and teardown.
It can also look at the tests that will actually run (not just the ones that were collected) and consider their actual dependencies, to find out which scopes actually require a given fixture. A fixture can then be set up preemptively for those scopes, even if the first test in such a scope doesn't depend on it, while not running at all in scopes that don't need it.
We can also apply some tweaks to make things operate more efficiently. For example, autouse fixtures can be shifted left in the FRO, while parameterized fixtures can be shifted as far right as possible (so as to affect as few other fixtures as possible); autouse parameterized fixtures can be shifted as far right as possible among the autouse fixtures of their scope.
Nothing would ever get executed that didn't need to be, parameterized fixtures would have as small a footprint as they reasonably could, stack-like behavior would be achieved, reasoning about and planning setup plans would be easier (IMO), and it allows for a good number of optimizations in pytest's internals.
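To make the idea concrete, here is a minimal sketch of how such a linearization could be computed (my reading of the proposal; all names are hypothetical, and the autouse/parametrized nudging is only noted in a comment):

```python
# Hypothetical sketch of a "fixture resolution order": topological sort
# by dependency, then a stable sort by scope.
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

@dataclass
class Fix:
    name: str
    scope: int  # 0=session, 1=package, 2=module, 3=class, 4=function
    autouse: bool = False
    parametrized: bool = False
    deps: set = field(default_factory=set)

def fixture_resolution_order(fixtures):
    graph = {f.name: f.deps for f in fixtures}
    topo = list(TopologicalSorter(graph).static_order())
    rank = {name: i for i, name in enumerate(topo)}
    by_name = {f.name: f for f in fixtures}
    # A fixture may only depend on same-or-broader-scoped fixtures, so a
    # stable sort by scope can never move a fixture before one of its
    # dependencies; within a scope, topological rank is preserved.
    # (The autouse-left / parametrized-right shifting described above
    # would be an extra, dependency-respecting pass within each scope.)
    return sorted(topo, key=lambda n: (by_name[n].scope, rank[n]))

fro = fixture_resolution_order([
    Fix("s", scope=0, autouse=True, parametrized=True),
    Fix("c", scope=3, autouse=True),
    Fix("m", scope=2),
    Fix("f", scope=4, deps={"m"}),
])
print(fro)  # ['s', 'm', 'c', 'f']
```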
What do you think?
> Unfortunately, because of parameterization, there's no way to guarantee an autouse fixture only happens once per its defined scope.
I think that's fine TBH, we just need to make that explicit in the docs.
> Fixture Resolution Order (FRO)
>
> What do you think?
That's a pretty neat idea! My initial thought about building a DAG of fixtures and using it for setup/teardown is similar, but you go a step further and remove the lazy approach we have now (by "lazy" I mean figuring out which fixtures are needed during test execution). This approach would give the full fixture resolution order right after collection is complete.
I can even imagine a neat API like this (high-level code):

```python
plan = FixturePlan(items)
fixtures = plan.setup(items[0])
items[0].test_func(**fixtures)
plan.teardown(items[0])
```
I also think it would be possible to add this alongside the current implementation under a separate setting, so we can publish it as experimental and see how it fares in the wild.
@RonnyPfannschmidt @bluetech thoughts?
> Unfortunately, because of parameterization, there's no way to guarantee an autouse fixture only happens once per its defined scope.

> I think that's fine TBH, we just need to make that explicit in the docs.
Yeah, FRO wouldn't even guarantee it. But if it's sorted the way I mentioned, it can at least reduce the number of times it happens.
I remember someone even had an issue a while back where they wanted a mechanism that worked as the opposite of autouse, in that it would force fixtures as far "right" in the execution order as possible. They could probably exploit this for that purpose.
> I can even imagine a neat API like this (high-level code):
>
> ```python
> plan = FixturePlan(items)
> fixtures = plan.setup(items[0])
> items[0].test_func(**fixtures)
> plan.teardown(items[0])
> ```
Awesome! I'm wondering if it could even provide opportunities for new hooks to give users even more control. Maybe something like `pytest_collection_modifyitems`, but for the fixtures found for each test. I think that'll probably depend on the actual implementation, though.
🤔
Some thoughts on the original issue (the excessive finalizer registrations). This is coming from @jakkdl's PR #11833 which cleaned up some related stuff. And no I don't expect anyone to read this through :grinning:
We are talking about the following code in `FixtureDef.execute`:
https://github.com/pytest-dev/pytest/blob/2607fe8b4706fa701925db50f7892ddff6ed2928/src/_pytest/fixtures.py#L1037-L1044
What this does is: when we execute a fixture, e.g. F1, which requests some other fixtures, e.g. F2 and F3, it registers F1's `finish` as a finalizer of both F2 and F3. The idea is that, since F1 depends on F2 and F3, it must be torn down before F2 and F3.
My first question is:
Why is this code necessary anyway?
It is not clear at all why this code is needed, because every fixture registers its own finalizer with the relevant node right after it's executed:
https://github.com/pytest-dev/pytest/blob/2607fe8b4706fa701925db50f7892ddff6ed2928/src/_pytest/fixtures.py#L1064-L1070
We want the teardown order to be the reverse of the setup order (AKA "stack" order). If we assume that the node tree is torn down correctly in stack order, and each fixture pushes its finalizer onto the stack right after setup, then we get the desired stack order with no extra machinery necessary.
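To spell the argument out, here is a toy model (hypothetical, not pytest's `SetupState`): if every fixture pushes its finalizer onto the owning node's stack immediately after it is set up, popping that stack at node teardown already gives reverse-setup order, with no cross-registration between fixtures.

```python
# Toy model: finalizers pushed in setup order, popped in reverse.
node_finalizers = []  # finalizers registered on one node, in setup order

def setup_fixture(name, deps=()):
    for dep in deps:  # a fixture's dependencies are set up first...
        setup_fixture(dep)
    print(f"setup {name}")
    # ...so each fixture lands on the stack *after* its dependencies.
    node_finalizers.append(lambda: print(f"teardown {name}"))

setup_fixture("F1", deps=("F2", "F3"))
while node_finalizers:
    node_finalizers.pop()()  # F1 is torn down before F3 and F2
```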
So I just tried to comment out the "register finish with dependencies" code, run the pytest tests, and see what happens. The failure that comes back looks like this:
```python
import pytest

@pytest.fixture(scope="class", params=["a", "b"])
def clsparam(request):
    return request.param

class TestClass:
    @pytest.fixture(scope="class")
    def clsfix(self, clsparam):
        print(f"setup-{clsparam}")
        yield
        print(f"teardown-{clsparam}")

    def test_method(self, clsparam, clsfix):
        print(f"method-{clsparam}")
```
The expected output is:

```
setup-a
method-a
teardown-a
setup-b
method-b
teardown-b
```
but the actual output (after commenting) is:

```
setup-a
method-a
method-b
teardown-a
```
The `b` parametrization of `clsfix` is gone.
Why does it happen? The flow of execution is:

- `test_method[a]` requests `clsparam`
- `clsparam` with param `a` is set up
- `test_method[a]` requests `clsfix`
- `clsfix` gets `clsparam = a` and caches its result with `cache_key = 0`. The cache key is `0` because `clsfix` itself is not parametrized, so it just gets this dummy constant cache key.
- `test_method[b]` requests `clsparam`
- `clsparam` with param `b` is set up
- `test_method[b]` requests `clsfix`
- `clsfix` computes `cache_key = 0`, which matches the saved result, so it doesn't execute again (a condensed sketch of this cache check follows the list).
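Here is that condensed sketch of the cache check (hypothetical names; the real logic lives around `FixtureDef.execute` and its `cache_key`):

```python
# Condensed, hypothetical sketch of the caching behavior described above.
class FakeFixtureDef:
    def __init__(self, params=None):
        self.params = params
        self.cached_result = None  # (value, cache_key)

    def cache_key(self, request):
        # Parametrized fixtures key on the param; unparametrized ones get
        # a dummy constant key -- the "cache_key = 0" in the steps above.
        return request.param_index if self.params is not None else 0

    def execute(self, request, func):
        key = self.cache_key(request)
        if self.cached_result is not None and self.cached_result[1] == key:
            return self.cached_result[0]  # stale reuse: clsfix's bug
        value = func(request)
        self.cached_result = (value, key)
        return value

class Req:
    """Stand-in for SubRequest."""
    def __init__(self, param_index):
        self.param_index = param_index

clsfix = FakeFixtureDef(params=None)  # not parametrized -> key is always 0
print(clsfix.execute(Req(0), lambda r: "setup-a"))  # runs: prints setup-a
print(clsfix.execute(Req(1), lambda r: "setup-b"))  # reuses setup-a!
```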
So the problem here is that `clsfix` depends on a parametrized fixture `clsparam`, and `clsparam = a` is invalidated in the switch from `test_method[a]` to `test_method[b]`, but nothing invalidates `clsfix` in response.
Is this the only reason for the code?
I can't think of another reason, but it's possible I'm missing something. The two test failures after commenting out the code are both due to this, so if there is another reason, it is at least not covered by the tests.
How does the code fix it?
With the finalizer registration code, `clsfix` registers its `finish()` as a `clsparam` finalizer. So when `clsparam = a` is invalidated, `clsfix` also gets invalidated.
But this solution sucks (this issue), so...
Other possible ways to fix the problem?
In the flow of execution above, there were 2 "failure points":

- There was nothing that invalidated the `clsfix` fixture when the class-scoped param switched under it.
- The cache key matched, so `clsfix` didn't invalidate itself.
To me this conjures up two possible solutions:
Possible solution - parametrized higher-scoped collector nodes
The collection tree of our example is:

```
<Dir pytest>
  <Module yy.py>
    <Class TestClass>
      <Function test_method[a]>
      <Function test_method[b]>
```
But this is a bit funny - the `a`, `b` parametrization is class-scoped, not function-scoped. What if we instead made the class node parametrized in this case?
```
<Dir pytest>
  <Module yy.py>
    <Class TestClass[a]>
      <Function test_method[a]>
    <Class TestClass[b]>
      <Function test_method[b]>
```
With this, the `SetupState` will take care to tear down `clsfix` in the switch between `TestClass[a]` and `TestClass[b]`. This also completely removes the need for `cache_key` at all.
Can this work? Ignoring backward compat entirely, I think the major downside of this is that class fixtures which are not (transitively) dependent on a class-scoped param will no longer be shared between `test_method[a]` and `test_method[b]`. I think this kills the idea, but maybe there's some way to salvage it.
Possible solution - make the cache key not match
What if `clsfix` somehow knew that a param it depends on has changed between its cached execution and the current request, and then invalidated itself? This is basically equivalent to including `clsparam`'s cache key in `clsfix`'s cache key.
Can this work? I don't know. One thing that could have killed this idea is dynamic `getfixturevalue` calls, but it is not possible to dynamically request a parametrized fixture (there is no way to specify the param), so I think this is not actually a problem. I.e., it should be possible to statically know the relevant params. But it's probably not very easy.
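For illustration, a sketch of what folding dependency params into the cache key might look like (hypothetical code; it assumes the parametrized dependencies are statically known):

```python
# Hypothetical sketch: a fixture's cache key also reflects the cache keys
# of the fixtures it (transitively) depends on.
from dataclasses import dataclass

@dataclass
class Def:
    deps: frozenset = frozenset()  # names of fixtures this one depends on

def composite_cache_key(name, current_params, defs):
    """current_params maps fixture name -> its current param, if any."""
    own = current_params.get(name, 0)  # dummy 0 when not parametrized
    dep_keys = tuple(
        composite_cache_key(dep, current_params, defs)
        for dep in sorted(defs[name].deps)
    )
    return (own, dep_keys)

defs = {"clsparam": Def(), "clsfix": Def(deps=frozenset({"clsparam"}))}
key_a = composite_cache_key("clsfix", {"clsparam": "a"}, defs)
key_b = composite_cache_key("clsfix", {"clsparam": "b"}, defs)
print(key_a)           # (0, (('a', ()),))
print(key_a == key_b)  # False -> clsfix would re-execute for param 'b'
```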
In the meantime, anything we can do?
The bad example that @nicoddemus gave is of the function-scoped `tmp_path` registering hundreds of finalizers on the session-scoped `tmp_path_factory` (one for each time it's requested). I think this is the most common scenario where this issue is problematic. BUT, we know that in this specific case the entire thing is useless, for two reasons:

- `tmp_path_factory` is not parametrized, neither directly nor transitively, so the "parametrized dependency" problem doesn't exist here anyway. But tackling this angle is basically the "make the cache key not match" solution, so I'll leave it be for now.
- NOTE: The following is incorrect: ~~`tmp_path_factory` has a higher scope than `tmp_path`, and therefore it is (or should be...) guaranteed by `SetupState` that `tmp_path` is torn down before `tmp_path_factory`, regardless of any parametrization.~~

~~So, if we trust `SetupState` and my analysis is correct, then we should be able to change the code to only register a finalizer if the dependent fixture has the same scope (a lower scope is not allowed at all), that is:~~
```diff
 for argname in self.argnames:
     fixturedef = request._get_active_fixturedef(argname)
     if not isinstance(fixturedef, PseudoFixtureDef):
-        fixturedef.addfinalizer(finalizer)
+        if fixturedef.scope == request.scope:
+            fixturedef.addfinalizer(finalizer)
```
~~This is mostly patching over the issue, but it should largely alleviate the pain, so it's worth considering.~~