RFC. Support for resetting function-scoped fixtures
What's the problem this feature will solve?
Support for per-execution fixtures in executors that wrap each test function and run the function multiple times (for different examples) has been a long-standing issue, for example:
- https://github.com/pytest-dev/pytest/issues/916
- https://github.com/HypothesisWorks/hypothesis/issues/377
The motivating example is hypothesis, but in an effort to keep this issue more general I'll just refer to it as a "multi-executor" here. I'm not familiar with other multi-executors, but if there are any then the chosen solution should hopefully be usable by those as well.
Some of the reasons that this has been stalled are that the example executions are not up-front enumerable, and that the test item should still be failed/passed and reported as a unit — hence, not a good fit for subtests or other declarative-style approaches.
I propose a simpler solution, which is to expose a method that the multi-executor can call in-between examples, to explicitly signal the start of a new example and reset function-scoped fixtures.
Describe the solution you'd like
My proposed solution is, briefly:
- Add a new scope `item`, in-between `function` and `class`. The new scope is equivalent to `function` unless running with a multi-executor, but is scoped to the test item so that it is not reset between repeated executions of the function for the same test item. Few fixtures will have the `item` scope, but an example of a fixture having natural `item` scope is `request`.
This new scope is not required for the correctness of the proposed functionality, but since each test item can be executed hundreds of times, there are performance implications that make it highly desirable. (plus, it's a helpful concept for explaining multi-executors in documentation).
- Add a fixture-reset method to `FixtureRequest`, to be called by the multi-executor when function-scoped fixtures should be reset (see the sketch after this list).
I realize this is backwards control flow from the usual pytest hooks, but it's only an integration point for multi-executors, not anything that end users will have to care about.
It's also possible to encapsulate the backwards call itself in a pytest-supplied decorator/wrapper — such a wrapper is actually implemented as proof-of-concept in the PR tests, and could be used as an alternative integration point: Execute the wrapper multiple times instead of the original function, and the wrapper takes care of resetting fixtures and passing their new values to the function.
- To ensure consistent dependency resolution, a sequence counter is added to fixture executions so that they can be reset in the exact same order that they were originally executed.
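To make these integration points concrete, here is a minimal sketch, assuming the proposal as written. Note that `scope="item"` and the `reset_function_fixtures()` name are the proposed additions: neither exists in pytest today, and the method name is only illustrative.

```python
import pytest

@pytest.fixture(scope="item")  # proposed scope: set up once per test item
def shared_log():
    return []  # survives repeated executions of the test function

@pytest.fixture  # plain function scope: reset before each example
def scratch():
    return {}

def execute_examples(request, test_fn, examples):
    """Roughly what a multi-executor would do within a single test item."""
    for example in examples:
        # Proposed integration point: tear down the function-scoped
        # fixtures and re-create them in their original execution order.
        request.reset_function_fixtures()  # hypothetical method name
        test_fn(example)
```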
The rest are implementation details, shown in the working prototype PR #12597 (with tests).
- One fixture (`tmp_path`) was not compatible with resets, because it expects the end-of-item `pytest_runtest_makereport` to have run before its teardown.
The fix was easy, but it is an indication that there may be other such incompatible fixtures in the wild. I believe this is ok, as they are not impacted unless running with a multi-executor — which was never safe before, and (opt-out) disallowed by at least hypothesis.
What do you think?
Additional context
- https://github.com/pytest-dev/pytest/issues/9699 - requesting a scope below `function`, to support competing dependency-mutating fixtures. This is different from, and not addressed by, this proposal.
- https://github.com/HypothesisWorks/hypothesis/issues/2370 - providing some examples of the utility of item-scoped fixtures with hypothesis.
- https://github.com/untitaker/pytest-subtesthack/ has been mentioned as an alternative, but it's archived now (and seems to require additional boilerplate instead of being correct-by-default).
The scope name "item" is misleading.
Having a scope name per function call is also misleading, as "function" is the scope per call.
This really needs a design scoping-out, as practically this is a type of dynamic parametrization, and we need proper ways of reporting it and of actually remaking function-scoped fixtures, for sanity.
The whole idea of a "multi-executor" needs to be fleshed out in a more nuanced manner, as we need to sort out correct reporting, and need to consider plugins like rerunfailures (which is currently messy wrt setupstate) and repeat (which is messy wrt abusing fixture parameters for re-execution).
I don't want to deter the attempt, but the introduction of package scope while it was not quite complete turned out to carry some massive pain points that are not clearly solved, and the "multi-executor" idea seems to me like something with the potential for similar problems.
I want to avoid the mistakes we made with package scope.
Additionally, we are also working on a definition scope (which is per test definition, so that multiple parametrizations of the same item may use it).
Thanks for the feedback @RonnyPfannschmidt! I understand your concerns about adding a new scope, but I also think it fits quite naturally if we can find a good name and well-understood semantics for it.
> The scope name "item" is misleading. Having a scope name per function call is also misleading, as "function" is the scope per call.
Just to clarify: the intended semantics for the new scope is "the thing that is executed and reported by pytest". Do you think it should be called `node` or `test` instead of `item`? The `function` scope is, in my mind, the test function as written by the user, which is also the "thing that is executed by pytest" unless a multi-executor wraps it into something that is executed many times. This proposal aims to make the `function` scope match the as-written function even when running in this way.
> This really needs a design scoping-out, as practically this is a type of dynamic parametrization, and we need proper ways of reporting it and of actually remaking function-scoped fixtures, for sanity.
It can be seen as dynamic parametrization, yes. But I'm not convinced we need pytest to report it as such; the status quo of repeated executions without pytest's knowledge is working well, with the exception of function-scoped fixtures. I worry that aiming too high here will just lead to no progress, because that's hard.
> The whole idea of a "multi-executor" needs to be fleshed out in a more nuanced manner, as we need to sort out correct reporting, and need to consider plugins like rerunfailures (which is currently messy wrt setupstate) and repeat (which is messy wrt abusing fixture parameters for re-execution).
Thanks! I'll look into those two presently, to see if there's enough overlap to classify them in the same way and reuse ideas. The concept itself is just "something that wraps the test function to run it several times within a single pytest test", which is well established by hypothesis and shouldn't need much more fleshing-out?
> The introduction of package scope while it was not quite complete turned out to carry some massive pain points that are not clearly solved, and the "multi-executor" idea seems to me like something with the potential for similar problems.
Understood. Note that the "multi-executor" idea is not new, only the term, which I coined to aim for (limited) generality. Read it as "hypothesis" if that is clearer.
> Additionally, we are also working on a definition scope (which is per test definition, so that multiple parametrizations of the same item may use it).
That should be fine; it would be orthogonal to this proposal, with class > definition > item > function. The function scope would still be the only target of the proposed reinitialization.
Part of me is getting the impression that we shouldn't have a new scope; instead we should enable a "function/item" to be run any number of times while rerunning its function-scoped setup entirely. This avoids issues like the tmpdir setup/teardown and other confusions, as it would all be about being able to run an item multiple times for different reasons (repeat, flakiness, different hypothesis datasets).
However, that type of rerunning is something that needs a change in the runtestprotocol, which is one of the harder changes to the internals.
> We need to consider plugins like rerunfailures (which is currently messy wrt setupstate) and repeat (which is messy wrt abusing fixture parameters for re-execution).
I checked these:
- `pytest-repeat` looks fine to me; isn't this an acceptable use of fixture parameters? Anyhow, it is unaffected by this change, as it uses an established method for repeats (parametrization).
- `pytest-rerunfailures` could be a user of the new reset method proposed here; it basically implements an equivalent method itself, except that it resets wider-scoped fixtures as well and doesn't consider dependency ordering (~~so it's unsafe~~ [edit: probably safe]).
> Part of me is getting the impression that we shouldn't have a new scope; instead we should enable a "function/item" to be run any number of times while rerunning its function-scoped setup entirely.
Yeah, that would work. However, a new `item` scope would still be desirable for performance reasons (although the alternative definition scope you mention may suffice, once it exists).
> This avoids issues like the tmpdir setup/teardown and other confusions, as it would all be about being able to run an item multiple times for different reasons (repeat, flakiness, different hypothesis datasets).
The `tmp_path` problem I saw was caused by its teardown depending on reporting. But apart from that, yes.
> However, that type of rerunning is something that needs a change in the runtestprotocol, which is one of the harder changes to the internals.
...ouch.
The assessment of repeat is incorrect, as the repeats it does create new test IDs, which is problematic.
For example, setting repeat to 1000 on a test suite of 1000 tests ends up with 1,000,000 test IDs for rerun failures in the pytest cache.
Just to leave my first impressions after reading this thread, my gut feeling aligns with @RonnyPfannschmidt: I would really like to avoid introducing a new user-facing scope, as it seems really complicated to explain/understand for the average user, because in order for it to make sense you need to first introduce the concept of "multi-executor", which is a bit niche and not something users encounter on a day to day basis.
I also agree that, given this is something plugins deal with, perhaps we can greatly simplify things by providing plugins with the ability to reset the fixture caches -- we can introduce a new experimental method to `Node` to reset the fixture cache of the given fixtures, passed by name:
```python
class Node:
    def reset_fixture_cache(self, names: Iterable[str], recursive: bool = False) -> None:
        """
        EXPERIMENTAL

        Can be called by plugins in order to reset the internal cache of the
        given fixture names, which will force their execution again the next
        time they are requested. Should be used with care by plugins which
        execute a test function multiple times.

        :param recursive:
            If True, also reset all fixtures of the same name (as fixtures
            might overwrite other fixtures with the same name in different
            locations).
        """
```
I have not looked into the details, but I suspect this is trivial/simple to implement and would solve all the issues with plugins that need to execute the same test function more than once (hypothesis, pytest-repeat, etc).
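For illustration, a plugin that executes a test body multiple times might use it roughly like this. This is a sketch only: `reset_fixture_cache` is the method proposed above and does not exist yet, and the fixture names are placeholders.

```python
def run_examples(item, test_body, examples):
    """Hypothetical multi-executor loop built on the proposed method."""
    for example in examples:
        # Invalidate the cached values so that the next setup/request
        # of these fixtures executes them again from scratch.
        item.reset_fixture_cache(["tmp_path", "monkeypatch"], recursive=True)
        test_body(example)
```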
Thanks for your suggestion @nicoddemus! I very much agree on the user-facing aspect of such a new scope, and the proposed documentation says only the following:
```rst
.. note::

    The ``item`` and ``function`` scopes are equivalent unless using an
    executor that runs the test function multiple times internally, such
    as ``@hypothesis.given(...)``. If unsure, use ``function``.
```
That said, your alternative proposal of resetting only specified fixtures would work without introducing a new scope. Such a method would definitely be useful, and enough to implement basic support for proper function-scoped fixtures in hypothesis.
However, I worry that it would be more error-prone, as it's hard to protect against accidentally depending on a resetting fixture:
```python
@pytest.fixture
@register_as_non_resetting
def disable_cache(monkeypatch):
    # bug: should act on pytest.MonkeyPatch.context()
    monkeypatch.setattr(my_module, "cache", None)
    yield


@pytest.fixture
def deep_inject_testdata(monkeypatch, testdata):
    monkeypatch.setattr(my_module, "data", testdata)
    yield


@given(x=...)
def test_something(disable_cache, deep_inject_testdata, x: MyType):
    ...  # the actual test


# This would be automatic, but is shown explicitly here.
# "disable_cache" is not listed, as it is non-resetting.
node.reset_fixture_cache(["deep_inject_testdata"])
```
The problem here is that `monkeypatch` would be reset due to being in the closure of `deep_inject_testdata`. Better would be to disallow the `disable_cache` -> `monkeypatch` dependency. In practice, the (hypothetical) `register_as_non_resetting` would have to enforce this restriction, which looks hard to do without support from the scope mechanism. (Just disallowing dependencies would be simple, but perhaps needlessly restrictive; see the sketch below.)
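As an aside, a minimal sketch of the simple "disallow dependencies" variant of the hypothetical `register_as_non_resetting` guard, assuming it can only inspect the undecorated fixture function's signature (the set of resettable names is invented for the example):

```python
import inspect

# Illustrative only: a real version would derive the resettable set from
# pytest's fixture/scope machinery rather than hard-coding names.
RESETTABLE_FIXTURES = {"monkeypatch", "tmp_path", "capsys"}

def register_as_non_resetting(fixture_fn):
    """Hypothetical guard: reject dependencies on resettable fixtures."""
    deps = set(inspect.signature(fixture_fn).parameters)
    offending = deps & RESETTABLE_FIXTURES
    if offending:
        raise TypeError(
            f"non-resetting fixture {fixture_fn.__name__!r} must not "
            f"depend on resettable fixtures: {sorted(offending)}"
        )
    return fixture_fn
```

Applied to the example above, this guard would reject `disable_cache` because of its `monkeypatch` argument.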
> The problem here is that `monkeypatch` would be reset due to being in the closure of `deep_inject_testdata`.

Why would this be a problem? Unless I'm missing something, resetting `disable_cache` would just mean that it would execute again for the next test function call (disabling the cache again in this case).
I suppose the fear is that reset has extra messy effects on hidden dependencies, which will see effects from the actions of fixtures that are dependent but not expressed as such.
And the answer to that, in the case of the proposed API, is "well, too bad".
Ah I see.
Well, yeah, but I suppose plugins like Hypothesis would be responsible for going over all the "function"-scoped fixtures of a test and resetting them all.
While I understand this might not sound ideal, I think it is still a step in the right direction because:
- We enable the specific use cases we have in mind to work (hypothesis, pytest-repeat, etc.).
- We do not introduce a new user-level feature; `reset_fixture_cache` would be marked as EXPERIMENTAL and only used by some plugins initially.
- It is simple to implement.
Introducing an entirely new scope is much more risky because we would be releasing a user-facing feature directly; it would introduce much more code churn, documentation, etc.
Also, one thing does not necessarily exclude the other: we can introduce the `reset_fixture_cache` function now, and LATER introduce a new scope, if we really want it.
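To make the plugin-side responsibility mentioned above concrete, here is a rough sketch of how a re-executing plugin could enumerate and reset all function-scoped fixtures in a test's closure. `reset_fixture_cache` is still only the proposed method, and `item._fixtureinfo` is a private pytest attribute, used here purely for illustration:

```python
def rerun_with_fresh_fixtures(item, request, test_fn, repeats):
    # Collect the function-scoped fixtures in this test's closure.
    # name2fixturedefs maps each fixture name to its definitions; the
    # last definition is the one visible to this test.
    function_scoped = [
        name
        for name, defs in item._fixtureinfo.name2fixturedefs.items()
        if defs and defs[-1].scope == "function"
    ]
    for _ in range(repeats):
        # Proposed API: invalidate the cached values...
        item.reset_fixture_cache(function_scoped, recursive=True)
        # ...then re-request them, forcing fresh setups (this assumes,
        # for simplicity, that test_fn takes exactly these fixtures).
        kwargs = {name: request.getfixturevalue(name) for name in function_scoped}
        test_fn(**kwargs)
```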
Apologies for the late response, I've been travelling.
Thanks for the constructive ideas, I'd be happy to try implementing @nicoddemus' proposed idea. But I also realize from your back-and-forth that I don't understand the details sufficiently yet, so there will likely be some iterations.
One thing in particular: after invalidating the relevant fixtures, what would be the right entry point for re-creating them (without starting a new subtest or anything like that)? In the draft PR I did this explicitly (in the same order as the initial creation), but I suspect this may not be enough here, since the closure-to-reset may be a subset of the closure of the test function, and hence some fixtures outside the reset set may be torn down.
[edit: If there is no such entry point, that's fine too, just so that I don't go too far down the wrong trouser leg.]
My current understanding is that normal item setup should recreate them as needed.