Request for a more helpful `hypothesis.errors.FlakyStrategyDefinition` error message when a `precondition` is flaky.
I had a hard time debugging a stateful test failure today:
```
state.delete_group_using_del(data=data(...))
state.check_list_prefix_from_root()
Checking 1 expected keys vs 1 actual keys
['zarr.json']
['zarr.json']
has uncommitted_changes: True
state.teardown()
Traceback (most recent call last):
  File "/Users/deepak/miniforge3/envs/icechunk/lib/python3.12/site-packages/hypothesis/core.py", line 1064, in _execute_once_for_engine
    result = self.execute_once(data)
             ^^^^^^^^^^^^^^^^^^^^^^^
  ...
  File "/Users/deepak/miniforge3/envs/icechunk/lib/python3.12/site-packages/hypothesis/internal/conjecture/datatree.py", line 1005, in draw_value
    inconsistent_generation()
  File "/Users/deepak/miniforge3/envs/icechunk/lib/python3.12/site-packages/hypothesis/internal/conjecture/datatree.py", line 52, in inconsistent_generation
    raise FlakyStrategyDefinition(
hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data generation behaved differently between different runs. Is your data generation depending on external state?
```
As you can see, this traceback does not tell me why this run was "flaky". After quite a bit of debugging, it turned out that the next rule the state machine expected to fire could not fire, because its precondition was not satisfied. The precondition was satisfied on a previous run, when the rule did fire - hence the flakiness.
It would be a lot more helpful if Hypothesis could surface the fact that a precondition for a particular rule was flaky, or at least tell us which rule it expected to fire next.
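To make the failure mode concrete, here is a minimal, Hypothesis-free sketch. The rule names and the uncommitted-changes flag are invented stand-ins for my real state machine, and picking "the first enabled rule" is a deliberate simplification of the actual draw:

```python
# Why a precondition that reads external state is "flaky": Hypothesis
# replays the same choice sequence, but the set of enabled rules has
# silently changed, so the replayed draw no longer matches.

def next_rule(rules, enabled):
    # Simplified stand-in for sampled_from(rules).filter(enabled):
    # pick the first enabled rule, deterministically, as a replay would.
    return next(r for r in rules if enabled(r))

rules = ["delete_group_using_del", "check_list_prefix_from_root"]

has_uncommitted_changes = True
first_run = next_rule(
    rules, lambda r: r != "delete_group_using_del" or has_uncommitted_changes
)

has_uncommitted_changes = False  # external state changed between runs
second_run = next_rule(
    rules, lambda r: r != "delete_group_using_del" or has_uncommitted_changes
)

# Same "draw", different rule -> Hypothesis raises FlakyStrategyDefinition.
print(first_run, second_run)
```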
Thanks for reporting this - pointing out confusing or unhelpful error messages is really useful!
We choose which rule to run here, which in turn delegates to `sampled_from(rules).filter(enabled)` here. We don't have a good way to track which rule changed, but perhaps a general note would have helped?
```python
try:
    rule = data.draw(st.sampled_from(self.rules).filter(rule_is_enabled))
except FlakyStrategyDefinition as err:
    err.add_note(
        "Specifically, the expected rule could not run - this is usually "
        "due to a flaky predicate or an empty bundle."
    )
    raise
```
A general note would be a big improvement. Is it feasible to add some debug level logging when doing the draw and filter?
Amazing, thank you!
Thank you for the issue! Feedback like this is so helpful for us.
And I guess thanks also to Claude, who implemented this and seven other PRs in one wild afternoon 🤯