hypothesis Improve testing story for Python 3.14 and free-threading builds

https://github.com/HypothesisWorks/hypothesis/pull/4025 got some initial wins, like "running 3.14 in CI" and "ensuring that our auto-updates set up free-threading environments for us", but there's plenty more to do.

[x] turn on CI for the free-threading builds (e.g. 3.13.0t-dev) and see what breaks. Might be blocked on upstream pydata packages for extensions? We also don't guarantee thread-safety at the moment, so I expect some chaos here and we might just turn the tests off again for a while.
[ ] search for FIXME-py314 and... fix them
- [ ] with the removal of typing.ByteString, we no longer generate bytes instances from Sequence[int]; I think we probably should still do this
- [ ] test_suggests_elements_instead_of_annotations is failing, so it seems something about dataclass introspection changed
- [ ] some attrs introspection tests in https://github.com/HypothesisWorks/hypothesis/pull/4069
[ ] fix any other 3.14-specific issues that come up if or when they arise
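
For the Sequence[int] item, here is a quick stdlib-only check of why bytes is a plausible value for that annotation: bytes is registered as a collections.abc.Sequence, and its elements are ints. This only illustrates the type relationship; it makes no claim about how Hypothesis resolves the annotation internally.

```python
from collections.abc import Sequence

value = b"\x00\x7f\xff"

# bytes is registered as a virtual subclass of collections.abc.Sequence ...
is_sequence = isinstance(value, Sequence)

# ... and iterating it yields plain ints in range(256).
elements = list(value)
all_ints = all(isinstance(x, int) for x in elements)
```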

Jul 04 '24 18:07 Zac-HD

@Zac-HD I've been testing Hypothesis against the SciPy test suite on 313t using the pytest-run-parallel (https://github.com/Quansight-Labs/pytest-run-parallel) plugin, which enables any test to be run concurrently using N threads.

Specifically, I've run into issues originating from Hypothesis, which appear from time to time when using multiple threads to run this test:

            gc.collect()
            if not gc.get_referrers(r):
                if sys.getrefcount(r) <= _PLATFORM_REF_COUNT:
>                   raise ReferenceError(
                        f"`register_random` was passed `r={r}` which will be "
                        "garbage collected immediately after `register_random` creates a "
                        "weakref to it. This will prevent Hypothesis from managing this "
                        "PRNG. See the docs for `register_random` for more "
                        "details."
E                       ReferenceError: `register_random` was passed `r=<random.Random object at 0x4305a3f0a10>` which will be garbage collected immediately after `register_random` creates a weakref to it. This will prevent Hypothesis from managing this PRNG. See the docs for `register_random` for more details.

r          = <random.Random object at 0x4305a3f0a10>

../../../../.pyenv/versions/3.13.0rc1t/lib/python3.13t/site-packages/hypothesis/internal/entropy.py:123: ReferenceError

 def process_arguments_to_given(wrapped_test, arguments, kwargs, given_kwargs, params):
        selfy = None
        arguments, kwargs = convert_positional_arguments(wrapped_test, arguments, kwargs)
    
        # If the test function is a method of some kind, the bound object
        # will be the first named argument if there are any, otherwise the
        # first vararg (if any).
>       posargs = [p.name for p in params.values() if p.kind is p.POSITIONAL_OR_KEYWORD]
E       RecursionError: maximum recursion depth exceeded

arguments  = ()
given_kwargs = {'data': data(), 'dtype': sampled_from((<class 'numpy.float32'>, <class 'numpy.float64'>)), 'n_arrays': integers(min_value=1, max_value=3), 'p': floats(min_value=0, max_value=1), ...}
kwargs     = {'self': <scipy._lib.tests.test__util.TestLazywhere object at 0x4a0c3cc21d0>, 'xp': <module 'numpy' from '/home/andfoy/.pyenv/versions/3.13.0rc1t/lib/python3.13t/site-packages/numpy/__init__.py'>}
params     = mappingproxy(OrderedDict({'self': <Parameter "self">, 'xp': <Parameter "xp">}))
selfy      = None
wrapped_test = <function accept.<locals>.test_basic at 0x4a0c25f87a0>

../../../../.pyenv/versions/3.13.0rc1t/lib/python3.13t/site-packages/hypothesis/core.py:660: RecursionError

Other times, the test passes successfully, which suggests that some kind of race condition or non-isolation of parameters is occurring when running Hypothesis tests under parallel loads in the new free-threaded CPython. I wanted to ask whether you have any pointers that would help the project solve these kinds of issues.
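
For anyone trying to reproduce this without the plugin, a minimal stdlib sketch of "run the same test body in N threads at once" follows. It is not how pytest-run-parallel is implemented; `run_in_threads` is a hypothetical helper that only shows the shape of the stress test.

```python
import threading


def run_in_threads(func, n_threads=4):
    """Call func once per thread, released together; return any exceptions raised."""
    barrier = threading.Barrier(n_threads)
    errors = []
    errors_lock = threading.Lock()

    def worker():
        barrier.wait()  # hold every thread here so they all start at the same moment
        try:
            func()
        except Exception as exc:
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

An intermittent failure like the ones above would show up as a non-empty list on some runs and an empty list on others.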

Aug 26 '24 21:08 andfoy

@andfoy

The first one is probably a race here, or in a similar section,

https://github.com/HypothesisWorks/hypothesis/blob/e339c5fc24e39bf476fe9586f610e5a7f91062aa/hypothesis-python/src/hypothesis/internal/entropy.py#L195-L197

where a second thread reassigns _hypothesis_global_random (and hence removes the reference to the first thread's random instance) before the first thread has finished registering it.

Fixable (if my guess is correct), but I suspect there may be many such cases and some of them may fail in less obvious ways...

Aug 27 '24 08:08 jobh

@jobh, thanks for the explanation! I've just opened a PR (https://github.com/HypothesisWorks/hypothesis/pull/4094) that addresses this issue, which in turn enables the aforementioned test to be run successfully.

Aug 27 '24 17:08 andfoy

3.14 is looking decent in this build, but I'm leaving it disabled for now because the alphas do tend to break stuff pretty regularly.

Jan 25 '25 22:01 Zac-HD

I'm planning to make it so pytest-run-parallel auto-detects use of hypothesis.given in a test and automatically excludes it from running under multiple threads. I'm curious if there are other functions I could use to blacklist tests that are using hypothesis. It works using static AST parsing on the tests. We also added an API for libraries to declare functions and decorators as thread-unsafe, see: https://github.com/Quansight-Labs/pytest-run-parallel/pull/37.

In a hypothetical future where hypothesis has better support for using a thread pool to generate test cases, it would be neat if hypothesis had its own ability to spawn threads and run multithreaded stress tests to catch stuff pytest-run-parallel is missing because a test happens to use hypothesis.

Apr 08 '25 16:04 ngoldbaum

> auto-detects use of hypothesis.given

It looks like checking for __hypothesistracebackhide__ is also sufficient.

Apr 08 '25 17:04 ngoldbaum

For pytest specifically, Hypothesis applies a @pytest.mark.hypothesis mark to all hypothesis tests. More generally, you can use hypothesis.internal.detection.is_hypothesis_test.

__hypothesistracebackhide__ is an implementation detail, so I wouldn't recommend checking that.

Apr 08 '25 17:04 Liam-DeVoe

> More generally, you can use hypothesis.internal.detection.is_hypothesis_test.

Thanks, this works!

It's fair game to use this even though it's in the hypothesis.internal namespace?

Apr 09 '25 14:04 ngoldbaum

I'm pretty happy to document is_hypothesis_test as semipublic (not subject to our usual 6mo deprecation policy etc, but we will try hard not to break it). This is a very reasonable thing for third party packages to check and we should have some recommended mechanism for it. I'll defer to @Zac-HD, though.

Apr 09 '25 14:04 Liam-DeVoe

I'd be inclined to re-export it as hypothesis.is_hypothesis_test, make sure it's got a good docstring + types, and document it on the integrations API reference. There are clearly some solid downstream use cases for asking "is this function a Hypothesis test", and it'd be good to provide a stable way to do that.

Apr 09 '25 17:04 Zac-HD

We've now made hypothesis.is_hypothesis_test public: https://github.com/HypothesisWorks/hypothesis/pull/4354
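
For downstream packages, a minimal sketch of how the check might be wrapped so it degrades gracefully. `looks_like_hypothesis_test` is a hypothetical helper name; the fallback to hypothesis.internal.detection assumes an older Hypothesis release that predates the public re-export.

```python
def looks_like_hypothesis_test(func):
    """True if func is a Hypothesis test; False if not, or if Hypothesis is unavailable."""
    try:
        # Public name, available once the re-export has landed.
        from hypothesis import is_hypothesis_test
    except ImportError:
        try:
            # Internal location discussed earlier in this thread.
            from hypothesis.internal.detection import is_hypothesis_test
        except ImportError:
            return False  # Hypothesis is not installed at all
    return bool(is_hypothesis_test(func))


def plain_test():
    pass
```

A plugin could call this on each collected test function and skip the multi-threaded run when it returns True.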

Apr 10 '25 15:04 Liam-DeVoe