Refactor fixture finalization
Currently, for each fixture which depends on another fixture, a "finalizer" is added to the finalizer list of the fixture it depends on.
For example:
```python
def test(tmpdir):
    pass
```

`tmpdir`, as we know, ultimately depends on the session-scoped `tmp_path_factory` to create the temporary directory. Each `tmpdir` invocation ends up adding its finalization to the list of finalizers of `tmp_path_factory`. This is the mechanism used to finalize fixtures in the correct order (although we still have bugs in this area, as #1895 shows, for example), and it ensures that every `tmpdir` will be destroyed before the requested `tmp_path_factory` fixture.
This means that every high-scoped fixture might end up with dozens, hundreds, or even thousands of "finalizers" attached to it. Fixture finalizers can be called multiple times without problems, but this consumes memory: each finalizer keeps its `SubRequest` object alive, and with it a number of small variables:
https://github.com/pytest-dev/pytest/blob/ed68fcf6659e623162325e72187358ba2255c4e0/src/_pytest/fixtures.py#L341-L359
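To make the memory cost concrete, here is a toy sketch (hypothetical names, not pytest's actual code) of the pattern: each registered finalizer is a closure that keeps its per-test request object alive for as long as the long-lived finalizer list exists.

```python
# Toy illustration of the accumulation (not pytest's actual code).

class FakeSubRequest:
    """Stand-in for _pytest.fixtures.SubRequest: holds per-test state."""
    def __init__(self, test_name):
        self.test_name = test_name
        self.scope = "function"

session_finalizers = []  # lives as long as the session-scoped fixture

def run_one_test(test_name):
    request = FakeSubRequest(test_name)
    # The closure below keeps `request` alive until the list itself is cleared.
    session_finalizers.append(lambda: print(f"finalize {request.test_name}"))

for i in range(1000):
    run_one_test(f"test_{i}")

print(len(session_finalizers))  # 1000 closures, each pinning a request object
```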
This can easily be demonstrated by applying this patch:
```diff
diff --git a/src/_pytest/runner.py b/src/_pytest/runner.py
index 55dcd805..b3a94bc6 100644
--- a/src/_pytest/runner.py
+++ b/src/_pytest/runner.py
@@ -393,6 +393,8 @@ class SetupState(object):
         for col in self.stack:
             if hasattr(col, "_prepare_exc"):
                 six.reraise(*col._prepare_exc)
+        if self.stack:
+            print(len(self._finalizers.get(self.stack[0])))
         for col in needed_collectors[len(self.stack) :]:
             self.stack.append(col)
             try:
```
(this prints the number of finalizers attached to the "Session" node, which is where session-scoped fixtures attach their finalizers)
And running this test:
```python
import pytest

@pytest.mark.parametrize('i', range(10))
def test(i, tmpdir):
    pass
```
```
λ pytest .tmp\test-foo.py -qs
.1
.2
.3
.4
.5
.6
.7
.8
.9
.
10 passed in 0.16 seconds
```
I believe we can think of ways to refactor the fixture teardown mechanism to avoid this accumulation of finalizers on the objects. Ideally we should build a proper DAG of fixture dependencies, which would then be destroyed in the proper order. This would also make things more explicit and easier to follow, IMHO.
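For illustration, a minimal sketch of the DAG idea (my own sketch, not a proposed pytest API): record which fixtures each fixture depends on, and derive the teardown order as the reverse of a topological (setup) order.

```python
# Minimal sketch of DAG-driven setup/teardown ordering (not pytest code).
from graphlib import TopologicalSorter  # Python 3.9+

# Map each fixture to the fixtures it depends on.
deps = {
    "tmpdir": {"tmp_path"},
    "tmp_path": {"tmp_path_factory"},
    "tmp_path_factory": set(),
}

# static_order() yields dependencies before dependents: the setup order.
setup_order = list(TopologicalSorter(deps).static_order())
teardown_order = list(reversed(setup_order))

print(setup_order)     # ['tmp_path_factory', 'tmp_path', 'tmpdir']
print(teardown_order)  # ['tmpdir', 'tmp_path', 'tmp_path_factory']
```

With an explicit graph like this, no per-fixture finalizer lists are needed; the teardown order falls out of the graph itself.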
We should set up a full topology to manage setup/teardown.
I was reading the comments on #4055 where it seemed that no changes would be considered in the short term because of the risk of breaking tests in the wild.
Does opening this issue mean that this is now ripe enough for design and eventual implementation?
> Does opening this issue mean that this is now ripe enough for design and eventual implementation?
Yep, just yesterday @RonnyPfannschmidt and I were chatting about this. It will be a gradual change but will eventually make the entire fixture execution model much more explicit and easier to follow than what we have today.
We plan to tackle this when he gets back from his hiatus.
Awesome, looking forward to it!
I guess it might also help #5460 (#3551 I believe was also using a directed graph approach for fixture dependency/execution).
Hi all, I made PR #4055 for this issue over a year ago. One of the primary concerns raised about the PR was that it might break plugins. Most of the major external plugins already have their own test suites. Would integrating the test suites of external plugins into the pytest test suite alleviate the current impasse? It seems like a worthy task anyway, to give confidence in any change to core code.
> Would integrating the test suites of external plugins into the pytest test suite alleviate the current impasse?
We have thought about this, but it is kind of tricky: external test suites are outside of our control, so they might break for unrelated reasons, which would then break pytest. We could of course devise a way for those breakages to not block PRs and such, though.
https://github.com/pytest-dev/pytest/pull/7511 touches on one of the problems highlighted here.
Coming from #7511, but I believe I have a solution to this.
Currently, there are a few issues with regard to autouse and parameterized fixtures.
From the docs:
> autouse fixtures obey the scope= keyword-argument: if an autouse fixture has scope='session' it will only be run once, no matter where it is defined. scope='class' means it will be run once per class, etc.
Given these fixtures:
```python
@pytest.fixture(scope='session', autouse=True, params=[1, 2])
def s():
    pass

@pytest.fixture(scope='class', autouse=True)
def c():
    pass
```
One might expect `c` to happen once and only once per class because it's autouse. The problem comes in when parameterization is involved (at least for a higher-scoped fixture).
Given these test classes:
```python
class TestX:
    def test_it(self):
        pass

class TestY:
    def test_it_again(self):
        pass
```
Because pytest has to execute every test once for the first param set of `s` before it can do it again for the second param set, it has to set up `c` twice for each class: pytest tears `c` down after each class for the first param set, and those tests are still dependent on `c` because of autouse. In total, `c` would be executed 4 times.
Here's the setup plan (from pytest 6.1.2):

```
SETUP    S s[1]
      SETUP    C c
        test2.py::TestX::test_it[1] (fixtures used: c, s)
      TEARDOWN C c
        test2.py::TestY::test_it_again[1]
      SETUP    C c
        test2.py::TestY::test_it_again[1] (fixtures used: c, s)
      TEARDOWN C c
        test2.py::TestX::test_it[2]
TEARDOWN S s[1]
SETUP    S s[2]
      SETUP    C c
        test2.py::TestX::test_it[2] (fixtures used: c, s)
      TEARDOWN C c
        test2.py::TestY::test_it_again[2]
      SETUP    C c
        test2.py::TestY::test_it_again[2] (fixtures used: c, s)
      TEARDOWN C c
TEARDOWN S s[2]
```
Unfortunately, because of parameterization, there's no way to guarantee an autouse fixture only happens once per its defined scope.
However, I have an alternate proposal that I think might suit everyone's needs without breaking any existing test suites that properly define their dependencies.
In @bluetech's example from our discussions on #7511:
```python
import pytest

@pytest.fixture(scope="module")
def m():
    print('M SETUP')
    yield
    print('M TEARDOWN')

class TestIt:
    @classmethod
    @pytest.fixture(scope="class", autouse=True)
    def c(cls):
        print('C SETUP')
        yield
        print('C TEARDOWN')

    def test_f1(self):
        print('F1 CALL')

    def test_f2(self, m):
        print('F2 CALL')
```
The setup plan:

```
      SETUP    C c
        test3.py::TestIt::test_f1 (fixtures used: c)
    SETUP    M m
        test3.py::TestIt::test_f2 (fixtures used: c, m)
      TEARDOWN C c
    TEARDOWN M m
```
The `m` fixture, to me, happens out of place, because it doesn't respect the ordering of scopes until teardown (and the indentation throws me off).
Ideally, I would like this to be the setup plan for that example:
```
    SETUP    M m
      SETUP    C c
        test3.py::TestIt::test_f1 (fixtures used: c)
        test3.py::TestIt::test_f2 (fixtures used: c, m)
      TEARDOWN C c
    TEARDOWN M m
```
And this would be the plan if only `test_f1` is to be run:

```
      SETUP    C c
        test3.py::TestIt::test_f1 (fixtures used: c)
      TEARDOWN C c
```
Since the `m` fixture wasn't needed by anything, it could be cut out.
To me, this is much easier not just to reason about, but also to orchestrate.
This has pretty large implications, and it could be incredibly challenging to figure out how to do this not just effectively, but in a way that folks can leverage predictably for more advanced usage. But I have an idea, and I think you'll get quite a kick out of it.
Fixture Resolution Order (FRO)
It works almost exactly like Python's MRO (so it's a familiar concept), except that the "child" we're trying to compute the FRO of is the master FRO itself. After the first pass, it can be sorted by scope (and possibly other criteria), and the result can be used to ensure optimized, stack-like behavior.
Keep in mind that entries in the FRO aren't required to be explicit dependencies of one another; the point is to compute a deterministic, linearized order that allows fixtures to be treated optimally like a stack with regard to setup and teardown.
It can also look at the tests that will actually run (not just the ones that were collected) and consider their actual dependencies, to find out which scopes actually require a given fixture. A fixture can then be set up preemptively for those scopes, even if the first test in such a scope doesn't depend on it, while not running at all in scopes that don't need it.
We can also apply some tweaks to make things operate more efficiently. For example, autouse fixtures can be shifted left in the FRO, while parameterized fixtures can be shifted as far right as possible (so as to affect as few other fixtures as possible); autouse parameterized fixtures can be shifted as far right as possible among the autouse fixtures of their scope.
Nothing would ever get executed that didn't need to be, parameterized fixtures would have as small a footprint as they reasonably could, stack-like behavior would be achieved, reasoning about and planning setup plans would be easier (IMO), and it allows for a good number of optimizations in pytest's internals.
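To make the idea concrete, here is a minimal sketch of how such a linearization could be computed (my reading of the proposal; all names are hypothetical, and the autouse/parametrized nudging is only noted in a comment):

```python
# Hypothetical sketch of a "fixture resolution order": topological sort
# by dependency, then a stable sort by scope.
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

@dataclass
class Fix:
    name: str
    scope: int  # 0=session, 1=package, 2=module, 3=class, 4=function
    autouse: bool = False
    parametrized: bool = False
    deps: set = field(default_factory=set)

def fixture_resolution_order(fixtures):
    graph = {f.name: f.deps for f in fixtures}
    topo = list(TopologicalSorter(graph).static_order())
    rank = {name: i for i, name in enumerate(topo)}
    by_name = {f.name: f for f in fixtures}
    # A fixture may only depend on same-or-broader-scoped fixtures, so a
    # stable sort by scope can never move a fixture before one of its
    # dependencies; within a scope, topological rank is preserved.
    # (The autouse-left / parametrized-right shifting described above
    # would be an extra, dependency-respecting pass within each scope.)
    return sorted(topo, key=lambda n: (by_name[n].scope, rank[n]))

fro = fixture_resolution_order([
    Fix("s", scope=0, autouse=True, parametrized=True),
    Fix("c", scope=3, autouse=True),
    Fix("m", scope=2),
    Fix("f", scope=4, deps={"m"}),
])
print(fro)  # ['s', 'm', 'c', 'f']
```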
What do you think?
> Unfortunately, because of parameterization, there's no way to guarantee an autouse fixture only happens once per its defined scope.
I think that's fine TBH, we just need to make that explicit in the docs.
> Fixture Resolution Order (FRO)
>
> What do you think?
That's a pretty neat idea! My initial thought about building a DAG of fixtures and using it for setup/teardown is similar, but you go a step further and remove the lazy approach we have now (by "lazy" I mean figuring out which fixtures are needed during test execution). This approach would give the full fixture resolution order right after collection is complete.
I can even imagine a neat API like this (high-level code):

```python
plan = FixturePlan(items)
fixtures = plan.setup(items[0])
items[0].test_func(**fixtures)
plan.teardown(items[0])
```
I also think it would be possible to add this alongside the current implementation under a separate setting, so we can publish it as experimental and see how it fares in the wild.
@RonnyPfannschmidt @bluetech thoughts?
> Unfortunately, because of parameterization, there's no way to guarantee an autouse fixture only happens once per its defined scope.

> I think that's fine TBH, we just need to make that explicit in the docs.
Yeah, FRO wouldn't even guarantee it. But if it's sorted the way I mentioned, it can at least reduce the number of times it happens.
I remember someone even had an issue a while back where they wanted a mechanism that worked as the opposite of autouse, in that it would force fixtures as far "right" in the execution order as possible. They could probably exploit this for that purpose.
> I can even imagine a neat API like this (high-level code):
>
> ```python
> plan = FixturePlan(items)
> fixtures = plan.setup(items[0])
> items[0].test_func(**fixtures)
> plan.teardown(items[0])
> ```
Awesome! I'm wondering if it could even provide opportunities for new hooks to give users even more control. Maybe something like `pytest_collection_modifyitems`, but for the fixtures found for each test. I think that'll probably depend on the actual implementation, though.
🤔
Some thoughts on the original issue (the excessive finalizer registrations). This is coming from @jakkdl's PR #11833 which cleaned up some related stuff. And no I don't expect anyone to read this through :grinning:
We are talking about the following code in `FixtureDef.execute`:
https://github.com/pytest-dev/pytest/blob/2607fe8b4706fa701925db50f7892ddff6ed2928/src/_pytest/fixtures.py#L1037-L1044
What this does is: when we execute a fixture, e.g. F1, which requests some other fixtures, e.g. F2 and F3, it registers F1's `finish` as a finalizer of both F2 and F3. The idea is that, since F1 depends on F2 and F3, it must be torn down before F2 and F3.
My first question is:
Why is this code necessary anyway?
It is not clear at all why this code is needed, because every fixture registers its own finalizer with the relevant node right after it's executed:
https://github.com/pytest-dev/pytest/blob/2607fe8b4706fa701925db50f7892ddff6ed2928/src/_pytest/fixtures.py#L1064-L1070
We want the teardown order to be the reverse of the setup order (AKA "stack" order). If we assume that the node tree is torn down correctly in stack order, and each fixture pushes its finalizer onto the stack right after setup, then we get the desired stack order with no extra machinery necessary.
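To spell the argument out, here is a toy model (hypothetical, not pytest's `SetupState`): if every fixture pushes its finalizer onto the owning node's stack immediately after it is set up, popping that stack at node teardown already gives reverse-setup order, with no cross-registration between fixtures.

```python
# Toy model: finalizers pushed in setup order, popped in reverse.
node_finalizers = []  # finalizers registered on one node, in setup order

def setup_fixture(name, deps=()):
    for dep in deps:  # a fixture's dependencies are set up first...
        setup_fixture(dep)
    print(f"setup {name}")
    # ...so each fixture lands on the stack *after* its dependencies.
    node_finalizers.append(lambda: print(f"teardown {name}"))

setup_fixture("F1", deps=("F2", "F3"))
while node_finalizers:
    node_finalizers.pop()()  # F1 is torn down before F3 and F2
```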
So I just tried to comment out the "register finish with dependencies" code, run the pytest tests, and see what happens. The failure that comes back looks like this:
```python
import pytest

@pytest.fixture(scope="class", params=["a", "b"])
def clsparam(request):
    return request.param

class TestClass:
    @pytest.fixture(scope="class")
    def clsfix(self, clsparam):
        print(f"setup-{clsparam}")
        yield
        print(f"teardown-{clsparam}")

    def test_method(self, clsparam, clsfix):
        print(f"method-{clsparam}")
```
The expected output is:

```
setup-a
method-a
teardown-a
setup-b
method-b
teardown-b
```
but the actual output (after commenting) is:

```
setup-a
method-a
method-b
teardown-a
```
The `b` parametrization of `clsfix` is gone.
Why does it happen? The flow of execution is:

- `test_method[a]` requests `clsparam`
- `clsparam` with param `a` is set up
- `test_method[a]` requests `clsfix`
- `clsfix` gets `clsparam = a` and caches its result with `cache_key = 0`. The cache key is `0` because `clsfix` itself is not parametrized, so it just gets this dummy constant cache key.
- `test_method[b]` requests `clsparam`
- `clsparam` with param `b` is set up
- `test_method[b]` requests `clsfix`
- `clsfix` computes `cache_key = 0`, which matches the saved result, so it doesn't execute again (a condensed sketch of this cache check follows the list).
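Here is that condensed sketch of the cache check (hypothetical names; the real logic lives around `FixtureDef.execute` and its `cache_key`):

```python
# Condensed, hypothetical sketch of the caching behavior described above.
class FakeFixtureDef:
    def __init__(self, params=None):
        self.params = params
        self.cached_result = None  # (value, cache_key)

    def cache_key(self, request):
        # Parametrized fixtures key on the param; unparametrized ones get
        # a dummy constant key -- the "cache_key = 0" in the steps above.
        return request.param_index if self.params is not None else 0

    def execute(self, request, func):
        key = self.cache_key(request)
        if self.cached_result is not None and self.cached_result[1] == key:
            return self.cached_result[0]  # stale reuse: clsfix's bug
        value = func(request)
        self.cached_result = (value, key)
        return value

class Req:
    """Stand-in for SubRequest."""
    def __init__(self, param_index):
        self.param_index = param_index

clsfix = FakeFixtureDef(params=None)  # not parametrized -> key is always 0
print(clsfix.execute(Req(0), lambda r: "setup-a"))  # runs: prints setup-a
print(clsfix.execute(Req(1), lambda r: "setup-b"))  # reuses setup-a!
```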
So the problem here is that `clsfix` depends on a parametrized fixture `clsparam`, and `clsparam = a` is invalidated in the switch from `test_method[a]` to `test_method[b]`, but nothing invalidates `clsfix` in response.
Is this the only reason for the code?
I can't think of another reason, but it's possible I'm missing something. The two test failures after commenting out the code are both due to this, so if there is another reason, it is at least not covered by the tests.
How does the code fix it?
With the finalizer registration code, `clsfix` registers its `finish()` as a `clsparam` finalizer. So when `clsparam = a` is invalidated, `clsfix` also gets invalidated.
But this solution sucks (this issue), so...
Other possible ways to fix the problem?
In the flow of execution above, there were 2 "failure points":

- There was nothing that invalidated the `clsfix` fixture when the class-scoped param switched under it.
- The cache key matched, so `clsfix` didn't invalidate itself.
To me this conjures up two possible solutions:
Possible solution - parametrized higher-scoped collector nodes
The collection tree of our example is:

```
<Dir pytest>
  <Module yy.py>
    <Class TestClass>
      <Function test_method[a]>
      <Function test_method[b]>
```
But this is a bit funny - the `a`, `b` parametrization is class-scoped, not function-scoped. What if we instead made the class node parametrized in this case?
```
<Dir pytest>
  <Module yy.py>
    <Class TestClass[a]>
      <Function test_method[a]>
    <Class TestClass[b]>
      <Function test_method[b]>
```
With this, the `SetupState` will take care to tear down `clsfix` in the switch between `TestClass[a]` and `TestClass[b]`. This also completely removes the need for `cache_key` at all.
Can this work? Ignoring backward compat entirely, I think the major downside of this is that class fixtures which are not (transitively) dependent on a class-scoped param will no longer be shared between `test_method[a]` and `test_method[b]`. I think this kills the idea, but maybe there's some way to salvage it.
Possible solution - make the cache key not match
What if `clsfix` somehow knew that a param it depends on has changed between its cached execution and the current request, and then invalidated itself? This is basically equivalent to including `clsparam`'s cache key in `clsfix`'s cache key.
Can this work? I don't know. One thing that could have killed this idea is dynamic `getfixturevalue` calls, but it is not possible to dynamically request a parametrized fixture (there is no way to specify the param), so I think this is not actually a problem. I.e., it should be possible to statically know the relevant params. But it's probably not very easy.
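For illustration, a sketch of what folding dependency params into the cache key might look like (hypothetical code; it assumes the parametrized dependencies are statically known):

```python
# Hypothetical sketch: a fixture's cache key also reflects the cache keys
# of the fixtures it (transitively) depends on.
from dataclasses import dataclass

@dataclass
class Def:
    deps: frozenset = frozenset()  # names of fixtures this one depends on

def composite_cache_key(name, current_params, defs):
    """current_params maps fixture name -> its current param, if any."""
    own = current_params.get(name, 0)  # dummy 0 when not parametrized
    dep_keys = tuple(
        composite_cache_key(dep, current_params, defs)
        for dep in sorted(defs[name].deps)
    )
    return (own, dep_keys)

defs = {"clsparam": Def(), "clsfix": Def(deps=frozenset({"clsparam"}))}
key_a = composite_cache_key("clsfix", {"clsparam": "a"}, defs)
key_b = composite_cache_key("clsfix", {"clsparam": "b"}, defs)
print(key_a)           # (0, (('a', ()),))
print(key_a == key_b)  # False -> clsfix would re-execute for param 'b'
```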
In the meantime, anything we can do?
The bad example that @nicoddemus gave is of the function-scoped `tmp_path` registering hundreds of finalizers on the session-scoped `tmp_path_factory` (one for each time it's requested). I think this is the most common scenario where this issue is problematic. BUT, we know that in this specific case the entire thing is useless, for two reasons:

- `tmp_path_factory` is not parametrized, neither directly nor transitively, so the "parametrized dependency" problem doesn't exist here anyway. But tackling this angle is basically the "make the cache key not match" solution, so I'll leave it be for now.
- NOTE: The following is incorrect: ~~`tmp_path_factory` has a higher scope than `tmp_path`, and therefore it is (or should be...) guaranteed by `SetupState` that `tmp_path` is torn down before `tmp_path_factory`, regardless of any parametrization.~~

~~So, if we trust `SetupState` and my analysis is correct, then we should be able to change the code to only register a finalizer if the dependent fixture has the same scope (a lower scope is not allowed at all), that is:~~
```diff
 for argname in self.argnames:
     fixturedef = request._get_active_fixturedef(argname)
     if not isinstance(fixturedef, PseudoFixtureDef):
-        fixturedef.addfinalizer(finalizer)
+        if fixturedef.scope == request.scope:
+            fixturedef.addfinalizer(finalizer)
```
~~This is mostly patching over the issue, but it should largely alleviate the pain, so it's worth considering.~~