twisted.trial._dist.test.test_stream.StreamTests.test_stream is flaky
we appear to have misconfigured hypothesis so that it is doing something stochastic in CI, rather than taking a random seed and executing deterministically. there might be an acutal bug here in the application (although my experience with hypothesis thus far suggests that it's more likely a test bug):
see here
Traceback (most recent call last):
File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.13/site-packages/twisted/trial/_dist/test/test_stream.py", line 198, in test_stream
def test_stream(self, chunks: List[bytes]) -> None:
File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.13/site-packages/hypothesis/core.py", line 1722, in wrapped_test
raise the_error_hypothesis_found
hypothesis.errors.FlakyFailure: Hypothesis test_stream(self=<twisted.trial._dist.test.test_stream.StreamTests testMethod=test_stream>, chunks=[b'\xa2\x1a\xe4;\xa9\xff\xdc\xd4>\xd7m/\xb8',
b'\xe7P8\xa8\xc4\xc6',
b'=\x9d\xac\x00\xd8\xe1g\xfe\x88/o\xd4\xa5\xf4\xd6\xdf\x89\xed\xa8\x80;\xe4=rv\x8d',
b'\xc00zMK\xbb\xe6\xbe\xbe',
b'',
b'\xc00zMK\xbb\xe6\xbe\xbe',
b'\xae\x1a`\xcf\x91\x04#\xa1\xb9)',
b'\xae\x1a`\xcf\x91\x04#\xa1\xb9)',
b'\xa2\x1a\xe4;\xa9\xff\xdc\xd4>\xd7m/\xb8',
b'\xfd\x0c\xc3Sfw\xfd\xd0\xda\xd3?\xce']) produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)
twisted.trial._dist.test.test_stream.StreamTests.test_stream
If we disable the tests in the CI / GitHub Action, I think that there is a 99% probabilty that the tests will not be executed for future PR...
Nobody will bother to execute them locally, before asking for a review.
You can ask each reviewer to get the code and manually execute them befor approving a PR... but I think that's a lot to ask
we appear to have misconfigured hypothesis
If this is a misconfiguration, can't we just fix the configuration ?
Just asking.
If we disable the tests in the CI / GitHub Action, I think that there is a 99% probabilty that the tests will not be executed for future PR...
Nobody will bother to execute them locally, before asking for a review.
You can ask each reviewer to get the code and manually execute them befor approving a PR... but I think that's a lot to ask
Yep, I'm not suggesting that, just that any tests which are stochastic need to be skipped in CI.
If this is a misconfiguration, can't we just fix the configuration ?
Just asking.
Yep. The misconfiguration is that the tests are running with random data. The tests should be run in CI, but only with a fixed set of cases.
I'd rather (temporarily) disable them entirely if they're flaky, and enable them when we have figured out this configuration, since it can be tricky to figure out and while we are figuring it out we shouldn't be provoking more flakiness in CI than we already have :)
FWIW I think you've misinterpreted the report here.
produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)
This isn't saying that a random seed resulted in Hypothesis exercising a new case that it doesn't always exercise and that case that is broken. It is saying that Hypothesis ran the code under test with this case twice and it passed once and failed once. ie, the test or the implementation is non-deterministic. So this is kind of like any flaky test - it just happens to be one that's using Hypothesis.
FWIW I think you've misinterpreted the report here.
Thank you for the clarification! This error message could probably use a little work, but this is good to know. In that case, I'll amend the subject.
I did indeed completely misinterpret the error message, and my sense of urgency to disable hypothesis was based on previous negative experiences where a mis-specified strategy resulted in a sort of fractal flakiness where every other PR on Klein was getting rejected due to a bug in Hyperlink's URL generating strategies. If this is regular flakiness that's unfortunate but doesn't need to be particularly aggressively prioritized.
I think that it would help to have support in trial to catch these hypothesis.errors.FlakyFailure and report them in some way ... maybe place them in another file
Them, as part of our CI, we can raise an error or a warning when FlakyFailure are raised during the tests
Otherwise, it's harder to detected these flaky tests as part of the test failure
I don't know what this means. trial does catch and report them and the flaky test was detected - leading to this issue being filed.
Yeah, as the one who was confused here, I think the change needed would be in Hypothesis (to clarify the error message), not trial?
My suggestion is to handle the hypothesis.errors.FlakyFailure as a warning ... maybe there is already a configuration option
That is, not to consider it as a test failure.
We already have flaky tests... and for now, we just ignore them by triggering a re-run
To me, the hypothesis error message makes sense.
Another here:
Error:
Traceback (most recent call last):
File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/twisted/trial/_dist/test/test_stream.py", line 198, in test_stream
def test_stream(self, chunks: List[bytes]) -> None:
File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/hypothesis/core.py", line 1740, in wrapped_test
raise the_error_hypothesis_found
hypothesis.errors.FlakyFailure: Hypothesis test_stream(self=<twisted.trial._dist.test.test_stream.StreamTests testMethod=test_stream>, chunks=[b'',
b'\xe1\xe8!*q\xee\x940?g',
b'\x1a0O\x83 \x7f]@)\x1b!\xe0\x98',
b'\xda\xa3\xf9I']) produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)
twisted.trial._dist.test.test_stream.StreamTests.test_stream
And again:
Error:
Traceback (most recent call last):
File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/twisted/trial/_dist/test/test_stream.py", line 198, in test_stream
def test_stream(self, chunks: List[bytes]) -> None:
File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/hypothesis/core.py", line 1740, in wrapped_test
raise the_error_hypothesis_found
hypothesis.errors.FlakyFailure: Hypothesis test_stream(self=<twisted.trial._dist.test.test_stream.StreamTests testMethod=test_stream>, chunks=[b'\x12vA{\x0e\x93q\xc1G\xe9\x15\x92\xde\xea\xe1Z',
b'5Co\xef\xdc\x14\x98',
b'5Co\xef\xdc\x14\x98',
b'5Co\xef\xdc\x14\x98',
b'K\xc7\xc9\x89',
b'K#\xc6\xca\xc1Io\\\xea)\x8a\xefU',
b'/\x9a/>+',
b'i\x12\xc6']) produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)
twisted.trial._dist.test.test_stream.StreamTests.test_stream
-------------------------------------------------------------------------------
Ran 11154 tests in 237.011s
FAILED (skips=628, errors=1, successes=10525)
/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/twisted/conch/ssh/transport.py:110: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from cryptography.hazmat.primitives.ciphers.algorithms in 48.0.0.