RuleBasedStateMachine is prone to Unsatisfiable errors
A state machine frequently fails with hypothesis.errors.Unsatisfiable when the input strategies to its rules themselves frequently produce examples that are marked as invalid.
For example,
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule

class MyStateMachine(RuleBasedStateMachine):
    @rule(data=st.lists(st.text(), min_size=5, unique=True))
    def rule1(self, data):
        assert data is not None

TestMyStateMachine = MyStateMachine.TestCase
Yields:
E hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of run_state_machine
venv/lib/python3.10/site-packages/hypothesis/stateful.py:112: Unsatisfiable
--------------------------------------------------------------------------------------------------------- Hypothesis ---------------------------------------------------------------------------------------------------------
You can add @seed(187837441642656874040035655188699191288) to this test or run pytest with --hypothesis-seed=187837441642656874040035655188699191288 to reproduce this failure.
=================================================================================================== Hypothesis Statistics ====================================================================================================
myproj/test_hypothesis.py::TestMyStateMachine::runTest:
- during generate phase (31.27 seconds):
- Typical runtimes: ~ 0-55 ms, of which < 1ms in data generation
- 0 passing examples, 0 failing examples, 1000 invalid examples
- Events:
* 56.80%, Retried draw from text().filter(not_yet_in_unique_list) to satisfy filter
* 43.20%, Aborted test because unable to satisfy just(Rule(targets=(), function=rule1, arguments={'data': lists(text(), min_size=5, unique=True)}, preconditions=(), bundles=())).filter(RuleStrategy(machine=MyStateMachine({...})).is_valid).filter(lambda r: <unknown>)
- Stopped because settings.max_examples=100, but < 10% of examples satisfied assumptions
This is the minimal example I could find; my actual state machine is much larger but exhibits the same error.
The state machine works fine when used with a simpler or more reliable strategy. The st.lists(st.text(), min_size=5, unique=True) strategy also works fine when used with @given (i.e. not in a state machine), although the stats show that it does frequently return invalid examples.
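For reference, a plain @given test with the same strategy passes (a minimal sketch; the test name and assertion are illustrative):

from hypothesis import given, strategies as st

@given(data=st.lists(st.text(), min_size=5, unique=True))
def test_unique_text_lists(data):
    # Mirrors rule1 above; this passes under @given even though
    # unique=True still retries draws internally to satisfy the filter.
    assert data is not None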
@Zac-HD I see why you tagged this performance, but I do want to note that this is a pretty big obstacle to us being able to use state machines correctly.
In order to work around this issue, we need to make sure that all our composite strategies rarely or never mark examples as invalid; i.e., we basically cannot use assume or filter, or any of the built-in strategies that leverage filtering.
Specifically, we have to do the opposite of what the Hypothesis docs recommend for composite strategies: https://hypothesis.readthedocs.io/en/latest/data.html#composite-strategies. In the reimplementing_sets_strategy example, we have to do things the "bad" way, since the "good" way means we run into this issue whenever we try to use the strategy in a state machine.
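To make the workaround concrete, here is a rough sketch of the two styles (the strategy names are illustrative, not taken from the docs verbatim): the assume-based version rejects duplicates and so marks examples invalid, while the loop-based version never rejects anything and is what we have to use inside state machines.

from hypothesis import assume, strategies as st

@st.composite
def unique_texts_with_assume(draw, size=5):
    # Draw, then reject duplicates via assume(); every rejection counts
    # as an invalid example, which starves the state machine.
    values = draw(st.lists(st.text(), min_size=size, max_size=size))
    assume(len(set(values)) == size)
    return values

@st.composite
def unique_texts_without_filtering(draw, size=5):
    # Workaround: keep drawing until we have enough distinct values,
    # so no example is ever marked invalid.
    seen = []
    while len(seen) < size:
        value = draw(st.text())
        if value not in seen:
            seen.append(value)
    return seen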
This turned out to be pretty simple in the end! I thought that https://github.com/HypothesisWorks/hypothesis/pull/3894 might have helped, but it didn't at all, and at that point I was pretty confident that the problem wasn't really too much filtering. Instead, it turned out to be trying to generate too much data; the fix is simply to stop taking additional steps once we've already generated 80% as much data as is possible. (We could tune it more precisely based on how many steps we've taken so far, but it's not really worth the trouble.)
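Conceptually, the fix amounts to a budget check in the state machine's run loop, something like the sketch below (illustrative pseudocode only; the names are not Hypothesis's actual internals or public API, and only the 80% figure comes from the comment above):

# Illustrative sketch only -- not Hypothesis's real internals.
DATA_BUDGET_FRACTION = 0.8  # stop adding steps once 80% of the data budget is spent

def should_take_another_step(bytes_drawn_so_far: int, max_bytes: int) -> bool:
    # Rather than drawing another rule's arguments and risking an aborted
    # (invalid) example, stop the run cleanly once the budget is nearly spent.
    return bytes_drawn_so_far < DATA_BUDGET_FRACTION * max_bytes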
So why did avoiding filters seem to help? My best guess is that it's because your filter-free strategies (a) don't generate-and-discard when attempting to satisfy the filter, and (b) might generate smaller and simpler inputs overall. Happily, you'll now be able to use the same strategies across @given()-based and stateful testing 😁