
st.lists rejects too much when generating large lists

Open JasonGross opened this issue 3 months ago • 2 comments

Consider this code:

from hypothesis import given, strategies as st, seed

@st.composite
def biglist(draw, strat):
  N = draw(st.integers(min_value=1, max_value=20000))
  N = min(8192, N)
  return draw(st.lists(strat, min_size=N, max_size=N))


@seed(195400164800693954998404088135293827994)
@given(biglist(st.floats(allow_nan=False, allow_infinity=False, width=32)))
def test(ls): pass 

test()

I see

FailedHealthCheck: It looks like this test is filtering out a lot of inputs. 4 inputs were generated successfully, while 50 inputs were filtered out. 

An input might be filtered out by calls to assume(), strategy.filter(...), or occasionally by Hypothesis internals.

Applying this much filtering makes input generation slow, since Hypothesis must discard inputs which are filtered out and try generating it again. It is also possible that applying this much filtering will distort the domain and/or distribution of the test, leaving your testing less rigorous than expected.

If you expect this many inputs to be filtered out during generation, you can disable this health check with @settings(suppress_health_check=[HealthCheck.filter_too_much]). See https://hypothesis.readthedocs.io/en/latest/reference/api.html#hypothesis.HealthCheck for details.

As I understand it, the issue is that when a single element is rejected or fails an assumption, the entire list generated up to that point is rolled back. This understanding is supported by the following workaround:

Details
from hypothesis import given, strategies as st, seed, HealthCheck, settings
from hypothesis.errors import UnsatisfiedAssumption

@st.composite
def biglist(draw, strat):
  N = draw(st.integers(min_value=1, max_value=20000))
  N = min(8192, N)
  return draw(st.lists(strat, min_size=N, max_size=N))

@st.composite
def retry(draw, strat, maxretries: int | None = 50):
  # Re-draw from strat, swallowing UnsatisfiedAssumption, until a value is
  # produced or maxretries is exhausted (retry forever if maxretries is None).
  i = 0
  while maxretries is None or i <= maxretries:
    i += 1
    try:
      return draw(strat)
    except UnsatisfiedAssumption:
      pass
  raise UnsatisfiedAssumption

@seed(195400164800693954998404088135293827994)
@given(biglist(st.floats(allow_nan=False, allow_infinity=False, width=32)))
def test(ls): pass 

@settings(suppress_health_check=[HealthCheck.too_slow, HealthCheck.data_too_large])
@seed(195400164800693954998404088135293827994)
@given(biglist(retry(st.floats(allow_nan=False, allow_infinity=False, width=32))))
def test2(ls): pass 

test2()

I think this "retry" wrapper should be rolled into ListStrategy at https://github.com/HypothesisWorks/hypothesis/blob/c748cc74320bc338a18238a900ca5755bb0d920b/hypothesis-python/src/hypothesis/strategies/_internal/collections.py#L206

JasonGross avatar Sep 19 '25 01:09 JasonGross

Your understanding is basically correct, with the addition that the base-level filtering in this example comes from width=32 for floats:

    if width < 64:

        def downcast(x: float) -> float:
            try:
                return float_of(x, width)
            except OverflowError:  # pragma: no cover
                reject()

        result = result.map(downcast)

and that MappedStrategy.do_draw already retries 3 times while drawing to satisfy any filters. So

when a single element has an unsatisfied assumption, the entire [input] is rolled back

is better stated as

when a single element fails an assumption 3 times in a row, the entire [input] is rolled back

My hesitance about adding this retry behavior to more layers is that we quickly end up in exponential territory. For example, adding another 3-retry to ListStrategy would result in 3 * 3 = 9 retries before giving up on the input entirely, as it combines with the .map retries.
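As a back-of-the-envelope illustration of that multiplicative budget (the numbers here just mirror the example above, not anything in the codebase):

```python
def worst_case_element_draws(retries_per_layer: int, layers: int) -> int:
    # Each layer independently retries up to r times before propagating the
    # failure, so stacking k such layers multiplies the budgets: r ** k
    # draws in the worst case for a single element.
    return retries_per_layer ** layers

print(worst_case_element_draws(3, 2))  # 9: .map retries times a list-level retry
print(worst_case_element_draws(3, 3))  # 27: one more retrying layer on top
```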

In general, I'm in favor of increasing our tolerance for retries. Sometimes we really have ended up in a bad part of the search space, and giving up on the input is quicker, but I don't think that's the common case.

Finally, this particular case would be fixed by https://github.com/HypothesisWorks/hypothesis/issues/3959, by making float widths a part of the underlying choice sequence primitive and therefore guaranteed to complete.

Liam-DeVoe avatar Sep 19 '25 01:09 Liam-DeVoe

Ooh, thanks for the explanations!

My hesitance about adding this retry behavior to more layers is that we quickly end up in exponential territory. For example, adding another 3-retry to ListStrategy would result in 3 * 3 = 9 retries before giving up on the input entirely, as it combines with the .map retries.

I think this is the wrong way to think about retries in ListStrategy in most cases. In most cases, the element strategy should not be coupled to the length of the list. If we're going to give up on the input entirely, it should be based on the failure rate of sampling elements (let's call it f), not the failure rate of sampling lists of length L (which would be 1 - (1 - f) ** L).
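A quick illustrative calculation (not from the Hypothesis codebase) shows how harsh that list-level rate is even for a tiny per-element rate:

```python
def list_failure_rate(f: float, L: int) -> float:
    # Probability that at least one of L independent element draws fails,
    # i.e. 1 - (1 - f) ** L from the discussion above.
    return 1 - (1 - f) ** L

# Even a 0.1% per-element failure rate makes an 8192-element list draw
# fail almost every time.
print(list_failure_rate(0.001, 8192))
```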

But I agree that we should not try 50 times (or exponentially many times) to sample the impossible. We can avoid this by sampling the first element (or first couple of elements) unconditionally, and the rest of the elements in a retry loop. Or, if we want something more nuanced, we can track a cumulative failure rate and error if the failure rate is ever too high.
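A rough sketch of the first idea, with a name of my own invention (`tolerant_fixed_list` is hypothetical, not a proposed API): draw the first element with no retry guard, so a genuinely unsatisfiable element strategy still fails fast, and bound retries only across the remaining draws.

```python
from hypothesis import given, strategies as st
from hypothesis.errors import UnsatisfiedAssumption

@st.composite
def tolerant_fixed_list(draw, strat, size: int, maxretries: int = 50):
    # First draw is unguarded: an impossible strategy fails immediately
    # instead of burning through the whole retry budget.
    out = [draw(strat)]
    for _ in range(size - 1):
        for attempt in range(maxretries + 1):
            try:
                out.append(draw(strat))
                break
            except UnsatisfiedAssumption:
                if attempt == maxretries:
                    raise
    return out

@given(tolerant_fixed_list(st.integers(), size=5))
def check(ls):
    assert len(ls) == 5

check()
```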

JasonGross avatar Sep 19 '25 02:09 JasonGross