[experimental] Run crosshair in CI
See https://github.com/HypothesisWorks/hypothesis/issues/3914
To reproduce this locally, you can run `make check-crosshair-{cover,nocover,niche}` for the same command as in CI, but I'd recommend `pytest --hypothesis-profile=crosshair hypothesis-python/tests/{cover,nocover,datetime} -m xf_crosshair --runxfail` to select and run only the xfailed tests.
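For context, here's a minimal sketch of roughly what such a `crosshair` settings profile can look like - the exact options registered in this PR may differ, and the `max_examples` value is just an assumption:

```python
# Hedged sketch only - the real profile in the PR may use different options.
from hypothesis import HealthCheck, settings

settings.register_profile(
    "crosshair",
    backend="crosshair",
    max_examples=20,  # assumed: keep budgets small, symbolic execution is slow
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.filter_too_much],
)
# selected at run time via `pytest --hypothesis-profile=crosshair ...`
```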
Hypothesis' problems
- Vast majority of failures are `Flaky: Inconsistent results from replaying a failing test...` - mostly backend-specific failures; we've both
  - [x] improved reporting in this case to show the crosshair-specific traceback
  - [x] got most of the affected tests passing
- [x] Invalid internal boolean probability, e.g. `"hypothesis/internal/conjecture/data.py", line 2277, in draw_boolean` with `assert p > 2 ** (-64)`, fixed in 1f845e0 (#4049)
- [x] many of our test helpers involved nested use of `@given`, fixed in https://github.com/HypothesisWorks/hypothesis/commit/3315be63163218f5b4027128e80a2b856b512fcc
- symbolic outside context
  - [x] due to charmap, fixed in https://github.com/HypothesisWorks/hypothesis/commit/48e89a6a4f920be01c6163e986dd0051541a5ac4
  - [x] due to `target()`, fixed in 85712ad (#4049)
- [x] avoid uninstalling `typing_extensions` when crosshair depends on it
- [x] tests which are not really expected to pass on other backends. I'm slowly applying a backend-specific xfail decorator to them, `@xfail_on_crosshair(...)` (sketched just after this list).
- [x] tests which expect to raise a healthcheck, and fail because our crosshair profile disables healthchecks. Disable only `.too_slow` and `.filter_too_much`, and skip remaining affected tests under crosshair.
- [x] undo some over-broad skips, e.g. various xfail decorators, pytestmarks, `-k 'not decimal'` once we're closer
- [x] provide a special exception type for when running the test or realizing values would hit a `PathTimeout`; see https://github.com/pschanely/hypothesis-crosshair/issues/21 and https://github.com/HypothesisWorks/hypothesis/issues/3914#issuecomment-2277023708
- [x] and something to signal that we've exhausted Crosshair's ability to explore the test. If this is sound, we've verified the function and can stop! (and should record that in the stop_reason). If unsound, we can continue testing with Hypothesis' default backend - so it's important to distinguish. https://github.com/HypothesisWorks/hypothesis/pull/4092
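As promised above, a minimal sketch of what a backend-specific xfail decorator along these lines could look like; the real `@xfail_on_crosshair` helper in this branch may differ, and the `Why` members and the `settings().backend` check here are assumptions for illustration:

```python
# Hedged sketch, not the actual helper from this PR.
import enum

import pytest
from hypothesis import settings


class Why(enum.Enum):
    undiscovered = "crosshair does not find the failing example within the test budget"
    not_realized = "a symbolic value was returned from provider.realize()"
    recursionerror = "RecursionError inside crosshair"
    other = "not yet diagnosed"


def xfail_on_crosshair(why: Why, *, strict: bool = True):
    def decorator(test_fn):
        on_crosshair = settings().backend == "crosshair"  # assumed way to detect the profile
        marks = [
            pytest.mark.xf_crosshair,  # lets `-m xf_crosshair --runxfail` select these tests
            pytest.mark.xfail(condition=on_crosshair, reason=why.value, strict=strict),
        ]
        for mark in marks:
            test_fn = mark(test_fn)
        return test_fn

    return decorator
```

Usage is then just `@xfail_on_crosshair(Why.undiscovered)` on the affected test, as mentioned further down.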
Probably Crosshair's problems
- [x] Repeated-registration error, see https://github.com/pschanely/hypothesis-crosshair/issues/17
- [x] `RecursionError`, see https://github.com/pschanely/CrossHair/issues/294
- [x] `unsupported operand type(s) for -: 'float' and 'SymbolicFloat'` in `test_float_clamper`
- [x] `TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'ShellMutableMap' object` (or `'values'` or `'items'`). Fixed in https://github.com/pschanely/CrossHair/pull/269
- [x] `TypeError: _int() got an unexpected keyword argument 'base'`
- [x] Buffer not realized for hash function, fixed in https://github.com/pschanely/CrossHair/issues/272
- [x] Internal error for case-insensitive regex https://github.com/pschanely/CrossHair/issues/274
- [x] `typing.get_type_hints()` raises `ValueError`, see https://github.com/pschanely/CrossHair/issues/275
- [x] json round-trip error below
- [x] `TypeError` in bytes regex, see https://github.com/pschanely/CrossHair/issues/276
- [x] Invalid args to `provider.draw_boolean()` inside `FeatureStrategy`, see https://github.com/pschanely/hypothesis-crosshair/issues/18
- [x] Support `dict(name=value)`, see https://github.com/pschanely/CrossHair/issues/279
- [x] Error in `PurePath` constructor, see https://github.com/pschanely/CrossHair/issues/280
- [x] `zlib.compress()` not symbolic, see https://github.com/pschanely/CrossHair/issues/286
- [x] `int.from_bytes(map(...), ...)`, see https://github.com/pschanely/CrossHair/issues/291
- [x] base64 support, see https://github.com/pschanely/CrossHair/issues/293
- [ ] `TypeError: conversion from SymbolicInt to Decimal is not supported`; see also snan below
- [x] `TypeVar` problem, see https://github.com/pschanely/CrossHair/issues/292
- [ ] Crash on way-too-large integer, see https://github.com/pschanely/CrossHair/issues/285
- [x] `RecursionError` inside Lark, see https://github.com/pschanely/CrossHair/issues/297
- [ ] https://github.com/pschanely/CrossHair/issues/307
Error in `operator.eq(Decimal('sNaN'), an_int)`:

```
____ test_rewriting_does_not_compare_decimal_snan ____
  File "hypothesis/strategies/_internal/strategies.py", line 1017, in do_filtered_draw
    if self.condition(value):
TypeError: argument must be an integer
while generating 's' from integers(min_value=1, max_value=5).filter(functools.partial(eq, Decimal('sNaN')))
```
Cases where crosshair doesn't find a failing example but Hypothesis does
Seems fine, there are plenty of cases in the other direction. Tracked with `@xfail_on_crosshair(Why.undiscovered)` in case we want to dig in later.
Nested use of the Hypothesis engine (e.g. given-inside-given)
This is just explicitly unsupported for now. Hypothesis should probably offer some way for backends to declare that they don't support this, and then raise a helpful error message if you try anyway.
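One possible shape for that, sketched from the outside (the attribute name and the check location are assumptions, not Hypothesis's actual backend API):

```python
# Hypothetical sketch of a capability flag a backend could declare.
from hypothesis.errors import InvalidArgument


class SomeBackendProvider:
    supports_nested_given = False  # hypothetical capability flag
    # ... rest of the provider implementation ...


def check_nested_given_supported(provider):
    # Called (hypothetically) when an inner @given starts while another is already running.
    if getattr(provider, "supports_nested_given", True):
        return
    raise InvalidArgument(
        f"backend {type(provider).__name__!r} does not support nested use of @given; "
        "restructure the test to draw everything in a single @given, or use the default backend"
    )
```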
Here's a diff for a few of the niche failures:
```diff
diff --git a/hypothesis-python/tests/cover/test_testdecorators.py b/hypothesis-python/tests/cover/test_testdecorators.py
index 0cb9cd3c2..30af2e2e8 100644
--- a/hypothesis-python/tests/cover/test_testdecorators.py
+++ b/hypothesis-python/tests/cover/test_testdecorators.py
@@ -13,6 +13,7 @@ import threading
from collections import namedtuple
from hypothesis import HealthCheck, Verbosity, assume, given, note, reporting, settings
+from hypothesis.internal.conjecture.data import realize
from hypothesis.strategies import (
binary,
booleans,
@@ -311,7 +312,7 @@ def test_can_derandomize():
@given(integers())
@settings(derandomize=True, database=None)
def test_blah(x):
- values.append(x)
+ values.append(realize(x))
assert x > 0
test_blah()
@@ -479,7 +480,6 @@ def test_empty_lists(xs):
def test_given_usable_inline_on_lambdas():
- xs = []
- given(booleans())(lambda x: xs.append(x))()
- assert len(xs) == 2
- assert set(xs) == {False, True}
+ xs = set()
+ given(booleans())(lambda x: xs.add(realize(x)))()
+ assert xs == {False, True}
```
- [ ] `tests/cover/test_testdecorators.py::test_can_find_large_sum_frozenset` looks like potentially a crosshair weakness, but I don't know if the ir allows it to reason effectively at the set level. (we can just `skipif(CROSSHAIR)` if not).
- [ ] `tests/cover/test_testdecorators.py::TestCases::test_float_addition_cancels`: unsure. doesn't reproduce locally.
- [ ] `tests/cover/test_testdecorators.py::test_when_set_to_no_simplifies_runs_failing_example_twice` is probably hypothesis converting backend ir to a buffer.
Not sure about the many flaky recursion errors on cover and nocover. I saw this behavior locally too, where tests are fine when run in isolation, and the first n tests are also green, but at some point a switch flips and many tests start failing with this error. I wonder if crosshair has some persistent state incrementing across test runs?
test_given_usable_inline_on_lambdas is basically a failure of deduplication, where Hypothesis will run only two inputs through a test that accepts a single boolean argument. Unclear whether this matters for Crosshair; noticing that you've exhausted some state-space is a cute trick for easy problems.
I sketched out a nice system for xfailing tests under crosshair, where you can also -m xf_crosshair --runxfail to see all the failures live... and then looked at the far more numerous cover and nocover failures. Well, pushing it anyway...
@Zac-HD your triage above is SO great. I am investigating.
Knocked out a few of these in 0.0.60. I think that means current status on my end is:
- [ ] `TypeError: conversion from SymbolicInt to Decimal is not supported`
- [X] `unsupported operand type(s) for -: 'float' and 'SymbolicFloat'` in `test_float_clamper`
- [X] `TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'ShellMutableMap' object` (or `'values'` or `'items'`).
- [X] `TypeError: _int() got an unexpected keyword argument 'base'`
- [ ] Symbolic not realized (in e.g. `test_suppressing_filtering_health_check`)
- [ ] Error in `operator.eq(Decimal('sNaN'), an_int)`
- [ ] Zac's cursed example below!
More soon.
Ah - the Flaky failures are of course because we had some failure under the Crosshair backend, which did not reproduce under the Hypothesis backend. This is presumably going to point to a range of integration bugs, but is also something that we'll want to clearly explain to users because integration bugs are definitely going to happen in future and users will need to respond (by e.g. using a different backend, ignoring the problem, whatever).
- [x] improve the reporting around `Flaky` failures where the differing or missing errors are related to a change of backend while shrinking. See also https://github.com/HypothesisWorks/hypothesis/issues/4040.
- [x] triage all the current failures so we can fix them
OK, here's a cursed one:
```python
import sys

from hypothesis import given, settings, strategies as st
from hypothesis.internal import charmap as cm


@settings(backend="crosshair")
@given(
    st.sets(st.sampled_from(cm.categories())) | st.none(),
    st.integers(0, sys.maxunicode),
    st.integers(0, sys.maxunicode),
)
def test_a(cats, m1, m2):
    m1, m2 = sorted((m1, m2))
    cm.query(categories=cats, min_codepoint=m1, max_codepoint=m2)


# test_a()


@settings(backend="crosshair")
@given(
    st.integers(0, sys.maxunicode),
    st.integers(0, sys.maxunicode),
)
def test_b(m1, m2):
    m1, m2 = sorted((m1, m2))
    cm.query(min_codepoint=m1, max_codepoint=m2)


test_b()
```
running `python demo.py` raises `HypothesisException: expected <class 'int'> from CrossHairPrimitiveProvider.realize, got <class 'crosshair.libimpl.builtinslib.SymbolicInt'>`, so it seems that the `realize()` method isn't working.
But! If I comment out `test_a` - which doesn't run - then I instead get `CrosshairInternal: Numeric operation on symbolic while not tracing`.
Most/all of the "expected x, got symbolic" errors are symptoms of an underlying error in my experience (often operation on symbolic while not tracing). In this case running with `export HYPOTHESIS_NO_TRACEBACK_TRIM=1` reveals `limited_category_index_cache` in `cm.query` is at fault.
ah-ha, seems like we might want some https://github.com/HypothesisWorks/hypothesis/pull/4029/ - style 'don't cache on backends with avoid_realize=True' logic.
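Something like the following shape, perhaps - a hedged sketch where `current_backend_avoids_realization()` is a stand-in for however Hypothesis actually exposes that flag, not a real API:

```python
import functools


def current_backend_avoids_realization():
    # Stand-in: in Hypothesis this would consult the active provider's preference
    # (the avoid_realize-style flag mentioned above).
    return False


def cached_unless_symbolic(func):
    """Cache results, except on backends whose values must not be captured."""
    cached = functools.lru_cache(maxsize=None)(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if current_backend_avoids_realization():
            # Don't hash or store symbolic arguments; call through directly so the
            # cache never captures (or later replays) backend-specific symbolic values.
            return func(*args, **kwargs)
        return cached(*args, **kwargs)

    return wrapper
```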
Still here and excited about this! I am on a detour of doing a real symbolic implementation of the decimal module - should get that out this weekend.
Triaging a pile of the Flaky errors, most were due to getting a `RecursionError` under crosshair and then passing under Hypothesis - and it looks like most of those were in turn because of all our nested-`@given()` test helpers.
So I've tried de-nesting those, which seems to work nicely and even makes things a bit faster by default; and when CI finishes we'll see how much it helps on crosshair 🤞
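For anyone following along, the general shape of that de-nesting (illustrative only, not the actual commit):

```python
# Hedged sketch: helper and test names here are made up for illustration.
from hypothesis import given, strategies as st


# Before: the helper runs a second Hypothesis engine per outer example, which backends
# like crosshair can't trace through (and which is slower even on the default backend).
def assert_commutes_nested(outer_value):
    @given(st.integers())
    def inner(m):
        assert outer_value + m == m + outer_value

    inner()


@given(st.integers())
def test_addition_commutes_nested(n):
    assert_commutes_nested(n)


# After: draw everything in the single outer @given, so only one engine is involved.
@given(st.integers(), st.integers())
def test_addition_commutes(n, m):
    assert n + m == m + n
```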
Looks like string-encoding wants to receive exactly a str, meaning it crashes under crosshair:
```python
from encodings.aliases import aliases

from hypothesis import Verbosity, given, settings, strategies as st


def _enc(cdc):
    try:
        "".encode(cdc)
        return True
    except Exception:
        return False


lots_of_encodings = sorted(x for x in set(aliases).union(aliases.values()) if _enc(x))
assert len(lots_of_encodings) > 100  # sanity-check


@settings(backend="crosshair", verbosity=Verbosity.verbose)
@given(st.text(), st.sampled_from(lots_of_encodings))
def test_b(string, codec_name):
    string.encode(codec_name)
```
representative traceback:
```
Trying example: test_b(
    string='',
)
Traceback (most recent call last):
  File ".../demo.py", line 14, in test_b
    string.encode("037")
  File ".venv/lib/python3.10/site-packages/crosshair/libimpl/builtinslib.py", line 2691, in encode
    return codecs.encode(self, encoding, errors)
  File ".venv/lib/python3.10/site-packages/crosshair/libimpl/codecslib.py", line 19, in _encode
    (out, _len_consumed) = _getencoder(encoding)(obj, errors)
  File ".../3.10.11/lib/python3.10/encodings/cp037.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
TypeError: charmap_encode() argument 1 must be str, not LazyIntSymbolicStr
...
hypothesis.errors.Flaky: Inconsistent results from replaying a failing test case!
  last: INTERESTING from TypeError at .../3.10.11/lib/python3.10/encodings/cp037.py:12
  this: VALID
```
It's hard to believe that it's only been a week since I opened this test pr - it's already led to multiple releases of both Crosshair and Hypothesis, as well as a pile of other cleanups to our tests!
Having gone on such a spree of test fixes, random triage, and trying to isolate nice reproducers, I think I'm going to put this down for a while and focus on other things, but @pschanely feel free to ping me or just open your own copy whenever it'd be helpful to see a fresh run with the latest Crosshair updates 🙂
(ok, got a bit nerdsniped...) Digging into this CI run, as the latest and closest-to-passing run we've seen:
- `check-crosshair-custom-cover/test_[a-d]*` - times out at 93% (six hours); 9 / 744 tests fail
  - actions: try verbose mode or bisection to identify which tests hang?
- `check-crosshair-custom-cover/test_[e-i]*` - 10m7s, 61 / 606 tests fail
  - actions: waiting for fixes above, then rerun and triage
- `check-crosshair-custom-cover/test_[j-r]*` - times out at 94% (six hours); dozens / 1040 tests fail
  - actions: verbose or bisection as above
- `check-crosshair-custom-cover/test_[s-z]*` - pytest internal error within ~20s, ?? / 689 tests fail
  - actions: see traceback below and work out where to insert the `.realize(some_report)` call
- `check-crosshair-nocover` - 2h 48m 32s (slow!), 99 / 521 tests fail
  - actions:
    - consider splitting this too, for speed. but it was ~30 minutes before, why??
    - reported https://github.com/pschanely/hypothesis-crosshair/issues/18
    - waiting for fixes above, then rerun and triage
- `check-crosshair-niche` - 12m19s, 2 / ??? tests fail
  - failing tests are xpass on Lark; there's still the array-api and numpy tests to go
  - actions: set `strict=False` for those tests, and run array-api and numpy together for efficiency
# pytest internal error on `check-crosshair-custom-cover/test_[s-z]*`
File "_hypothesis_pytestplugin.py", line 329, in pytest_runtest_makereport
("Hypothesis", "\n".join(item.hypothesis_report_information))
TypeError: sequence item 1: expected str instance, LazyIntSymbolicStr found
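The shape of the fix is presumably to realize each report entry before joining; a hedged illustration only (where exactly this belongs in `_hypothesis_pytestplugin.py`, and whether plain `str()` is the right coercion versus the provider's `realize()`, are the open questions):

```python
def join_report(report_items):
    # Coerce any symbolic entries (e.g. LazyIntSymbolicStr) to concrete str before joining.
    return "\n".join(s if isinstance(s, str) else str(s) for s in report_items)
```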
Got a good traceback for the RecursionError under crosshair 0.0.61:
```
Traceback (most recent call last):
  File ".../crosshair/libimpl/builtinslib.py", line 4271, in _dict
    if isinstance(arg, dict):
  File ".../crosshair/libimpl/builtinslib.py", line 4461, in _isinstance
    return _issubclass(type(obj), types)
  File ".../crosshair/libimpl/builtinslib.py", line 4457, in _issubclass
    return issubclass(subclass, superclass)
  File ".../crosshair/libimpl/builtinslib.py", line 4457, in _issubclass
    return issubclass(subclass, superclass)
  File ".../crosshair/libimpl/builtinslib.py", line 4457, in _issubclass
    return issubclass(subclass, superclass)
  [Previous line repeated 1980 more times]
  File ".../crosshair/libimpl/builtinslib.py", line 4432, in _issubclass
    with NoTracing():
  File ".../crosshair/tracers.py", line 431, in NoTracing
    return TraceSwap(COMPOSITE_TRACER.ctracer, True)
  File ".../crosshair/tracers.py", line 160, in __call__
    return self.trace_op(frame, codeobj, opcodenum)
  File ".../crosshair/tracers.py", line 163, in trace_op
    if is_tracing():
  File ".../crosshair/tracers.py", line 427, in is_tracing
    return COMPOSITE_TRACER.ctracer.enabled()
RecursionError: maximum recursion depth exceeded while calling a Python object
```
I don't know what the respective subclass, superclass is but that sure seems like it could do with some loop-detection :-)
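For illustration, a generic form of that kind of re-entrancy guard (hypothetical - CrossHair's interception machinery is more involved than this, and the symbolic handling is elided):

```python
_real_issubclass = issubclass
_in_progress = set()  # (id(subclass), id(superclass)) pairs currently being handled


def patched_issubclass(subclass, superclass):
    key = (id(subclass), id(superclass))
    if key in _in_progress:
        # The patch has re-entered itself on the same pair: defer to the real
        # builtin instead of recursing until the stack blows up.
        return _real_issubclass(subclass, superclass)
    _in_progress.add(key)
    try:
        # ... symbolic-aware handling would go here; plain builtin as a stand-in ...
        return _real_issubclass(subclass, superclass)
    finally:
        _in_progress.discard(key)
```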
> Got a good traceback for the `RecursionError` under crosshair 0.0.61: I don't know what the respective `subclass, superclass` is but that sure seems like it could do with some loop-detection :-)
Interesting! Can you point me at the hypothesis test that does this? Annnnd, yeah, so the interception framework is supposed to not interfere when a patch calls the function it's patching, but that's obviously not working in this case!
this CI job has 77/92 failures as recursion errors; test_interval_intersection is the one at the end of the logs
👋 just going to dump some other thoughts here quickly, please excuse brevity - more to come on the weekend
- from this CI job it looks like we might have some hangs or very slow progress in `tests/cover/test_deferred_strategies.py`; will investigate
  - just slow: `9316.72s in test_deferred_strategies.py::test_mutual_recursion`... but that's very slow
- we seem to have a lot of regex-related failures (I'm so sorry)
- this job has `test_can_generate_from_all_registered_types` failing for `UnicodeEncodeError`, `UnicodeTranslateError`, `Fraction`, `IPv4Interface`, `IPv4Network`, `IPv6Interface`, `IPv6Network`, `Rational`, `PathLike`, `Match`, and `slice`.
- lots of failing tests for `numpy` integration, see logs here
I think the regex-related stuff and LazyIntSymbolicStr are probably the next-highest impact things to fix.
> I think the regex-related stuff and `LazyIntSymbolicStr` are probably the next-highest impact things to fix.
OK! I'm hoping plugin version 0.0.9 will fix most of the LazyIntSymbolicStr errors. Can look into regexes during the week!
@pschanely I'm overall seeing fewer failing tests (🎉🎉🎉), but also this run just segfaulted maybe in _crosshair_tracers.
Also it might be nice to keep a changelog for hypothesis-crosshair 🙂
> @pschanely I'm overall seeing fewer failing tests (🎉🎉🎉), but also this run just segfaulted maybe in `_crosshair_tracers`.
Heh. I'm reasonably confident that CrossHair is not thread safe. This also doesn't repro for me immediately, but I'll play around with it.
> Also it might be nice to keep a changelog for `hypothesis-crosshair` 🙂
Haha, yes, it's time.
from test_assume_has_status_reason:
```
Traceback (most recent call last):
  File ".../crosshair/libimpl/builtinslib.py", line 900, in __abs__
    return self._unary_op(lambda v: z3.If(v < 0, -v, v))
  File ".../crosshair/libimpl/builtinslib.py", line 319, in _unary_op
    return self.__class__(op(self.var), self.python_type)
  File ".../crosshair/libimpl/builtinslib.py", line 900, in <lambda>
    return self._unary_op(lambda v: z3.If(v < 0, -v, v))
TypeError: '<' not supported between instances of 'BoolRef' and 'int'
```
whereas in Python `issubclass(bool, int)` is true, so you can indeed compare them.
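Tiny demonstration of that point:

```python
# bool is a subclass of int in Python, so booleans participate in integer comparisons,
# whereas z3's BoolRef has no such relationship with int.
assert issubclass(bool, int)
assert (True < 2) and not (True < 0)
```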
Alright - with the fantastic new stuff in crosshair==0.0.64, there are few enough failing tests that I've marked practically all of them as expected failures!
- You can see why they fail with `pytest --hypothesis-profile=crosshair hypothesis-python/tests/cover/ -m xf_crosshair --runxfail` (set profile, collect tests, select those with the xf_crosshair marker, ignore xfail marker)
- The most common reasons seem to be: (1) returning a symbolic type from `provider.realize(obj)` (which is an error, see latest CI), and (2) a `RecursionError` inside Crosshair - there's a representative traceback above. I've noted those with `Why.not_realized` and `Why.recursionerror` respectively, at least for most cases, to make finding them a bit easier; however some are probably missing because there's a fair bit of nondeterminism involved.
- I've opened a couple more issues where there was something obvious to report. I expect that re-triaging the marked tests will continue to yield new issues for quite a while though - and there are also some pretty widely-scoped skips that I put in place for e.g. decimals and numpy-related issues.
This is feeling like incredible progress overall though; imo we've gone from "neat prototype" to "usable alpha/early-beta" over the last few weeks 🤩
most of the currently failing tests look like they might be crosshair issues, cc @pschanely:
- `tests/cover/test_filter_rewriting.py::test_regex_filter_rewriting[binary(min_size=5, max_size=10)-b'ab+c'-match]` - `crosshair.statespace.NotDeterministic`
- `tests/nocover/test_explore_arbitrary_languages.py::test_explore_an_arbitrary_language` - `crosshair.statespace.NotDeterministic`
- `tests/cover/test_lookup.py::test_resolves_builtin_types[BaseException]` - `crosshair.util.CrosshairInternal: Possibly transient value found in memo` (and `[object]`)

and I'll skip the database test - crosshair just finishes exploring sooner than that test expects 😁
@pschanely huge progress from recent updates! The BackendCannotProceed mechanism entirely fixed several classes of issues, the floats changes have been great (signed zero ftw!), from_type() generates instances more often, I'm no longer skipping categories of stuff, and overall we've dropped from about +350 to +250 lines of code in this PR 🎊
At this point my only real reason to avoid merging is that crosshair updates often cause a fair bit of churn, causing some tests to start failing and some to start xpassing - it's net-good, but would be toil in our CI. I feel like we've crossed from an alpha-version which is a neat proof of concept, to a beta-version which is still early but already both useful and clearly on a path to stability and wider adoption. Incredibly excited about this ✨
If you want to pull out Crosshair issues,
- this PR is probably useful as a pre-release test, to check whether there are any regressions you didn't expect
- there's a commit marking some things that look like Crosshair bugs to me, and many more where Crosshair just doesn't find a failure that Hypothesis does (within the test budget, and which might or might not be a problem)
- there's a commit full of tests skipped because they were very slow, if you want to look at performance issues. I haven't audited it lately but would guess at least a third are still slow + also Crosshair's problem.
- the last big commit is pretty messy, probably best to ignore that for now
> @pschanely huge progress from recent updates! The `BackendCannotProceed` mechanism entirely fixed several classes of issues, the floats changes have been great (signed zero ftw!), `from_type()` generates instances more often, I'm no longer skipping categories of stuff, and overall we've dropped from about +350 to +250 lines of code in this PR 🎊
So great.
> At this point my only real reason to avoid merging is that crosshair updates often cause a fair bit of churn, causing some tests to start failing and some to start xpassing - it's net-good, but would be toil in our CI.
Frankly, I'm not sure it makes sense to block hypothesis on a crosshair-related failure, even in a very distant, stable future. Would love your ideas making the integration more "eventually" correct. Maybe a dedicated testing repo that pulls the hypothesis source and has these pytest markers externally applied? (or submodules? but those scare me)
> If you want to pull out Crosshair issues,
Always. Thanks for the commit breakdown. More updates soon!
> Frankly, I'm not sure it makes sense to block hypothesis on a crosshair-related failure, even in a very distant, stable future. Would love your ideas making the integration more "eventually" correct. Maybe a dedicated testing repo that pulls the hypothesis source and has these pytest markers externally applied? (or submodules? but those scare me)
For clarity, "blocking" would mean 'when we update our pinned dependencies, if Crosshair has changed we'll update the xfail markers accordingly and report any issues upstream, or maybe add a != requirement for that version'. Similarly, if a Hypothesis PR doesn't work with Crosshair I'd prefer to learn that at the time so I can decide to either xfail the tests, or do some extra work to support it - and my guess is that the converse would be useful for you too.
In practice I expect I'll just keep updating this PR for now, and you can grab a local copy of the branch if you want to run the tests before a Crosshair release 😁 (and note the test-selection tips at the top of the pr!)
For clarity, "blocking" would mean 'when we update our pinned dependencies, if Crosshair has changed we'll update the xfail markers accordingly and report any issues upstream, or maybe add a
!=requirement for that version'. Similarly, if a Hypothesis PR doesn't work with Crosshair I'd prefer to learn that at the time so I can decide to either xfail the tests, or do some extra work to support it - and my guess is that the converse would be useful for you too.
Fair enough! I was concerned about how much churn in CrossHair pass/fails you'll see for unrelated hypothesis changes, but it's also true that I want to know about what you see. Current plan SGTM.
> In practice I expect I'll just keep updating this PR for now, and you can grab a local copy of the branch if you want to run the tests before a Crosshair release 😁 (and note the test-selection tips at the top of the pr!)
Yup! I've been doing this a little already; works for me.
@Zac-HD I've been looking into getting this rebased against master, and I think there are at least some mainline changes that are affecting the tests. I am able to do some early triage, but hoping that you or @tybug can assist with the resolution. Would that be ok? And, do we want to work through things here? Alternatively, I guess I could be opening actual hypothesis issues saying "hey, I think this test X should work under the crosshair profile and here's why..."
Confirmed that we regressed crosshair at some point:
```python
from hypothesis import given, settings, strategies as st


@given(st.floats(min_value=0))
@settings(backend="crosshair")
def f(xs):
    pass


f()
```
```
...
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 1540, in cached_test_function
    result = check_result(data.as_result())
                          ^^^^^^^^^^^^^^^^
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/data.py", line 2370, in as_result
    assert self.frozen
           ^^^^^^^^^^^
AssertionError
```
Will investigate (but not sure I'll have time today specifically). I think this is almost certainly our fault, not crosshair.
IMO initial triage in here is best, with the intent to only open an issue if we expect a fix to take longer than ~days.
https://github.com/HypothesisWorks/hypothesis/pull/4230 will fix the above issue!
I'm now leaning towards merging this onto master - we've got it almost-entirely-working, and have (I think correctly) used the word "regression" to describe changes which made it work less well. So having CI to let us know about those as they happen seems pretty valuable to me!
(though who knows when I'll have a day free to get this back up to date again 😅)