twisted icon indicating copy to clipboard operation
twisted copied to clipboard

twisted.trial._dist.test.test_stream.StreamTests.test_stream is flaky

Open glyph opened this issue 1 year ago • 12 comments

we appear to have misconfigured hypothesis so that it is doing something stochastic in CI, rather than taking a random seed and executing deterministically. there might be an acutal bug here in the application (although my experience with hypothesis thus far suggests that it's more likely a test bug):

see here

Traceback (most recent call last):
  File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.13/site-packages/twisted/trial/_dist/test/test_stream.py", line 198, in test_stream
    def test_stream(self, chunks: List[bytes]) -> None:
  File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.13/site-packages/hypothesis/core.py", line 1722, in wrapped_test
    raise the_error_hypothesis_found
hypothesis.errors.FlakyFailure: Hypothesis test_stream(self=<twisted.trial._dist.test.test_stream.StreamTests testMethod=test_stream>, chunks=[b'\xa2\x1a\xe4;\xa9\xff\xdc\xd4>\xd7m/\xb8',
 b'\xe7P8\xa8\xc4\xc6',
 b'=\x9d\xac\x00\xd8\xe1g\xfe\x88/o\xd4\xa5\xf4\xd6\xdf\x89\xed\xa8\x80;\xe4=rv\x8d',
 b'\xc00zMK\xbb\xe6\xbe\xbe',
 b'',
 b'\xc00zMK\xbb\xe6\xbe\xbe',
 b'\xae\x1a`\xcf\x91\x04#\xa1\xb9)',
 b'\xae\x1a`\xcf\x91\x04#\xa1\xb9)',
 b'\xa2\x1a\xe4;\xa9\xff\xdc\xd4>\xd7m/\xb8',
 b'\xfd\x0c\xc3Sfw\xfd\xd0\xda\xd3?\xce']) produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)

twisted.trial._dist.test.test_stream.StreamTests.test_stream

glyph avatar Sep 24 '24 19:09 glyph

If we disable the tests in the CI / GitHub Action, I think that there is a 99% probabilty that the tests will not be executed for future PR...

Nobody will bother to execute them locally, before asking for a review.

You can ask each reviewer to get the code and manually execute them befor approving a PR... but I think that's a lot to ask


we appear to have misconfigured hypothesis

If this is a misconfiguration, can't we just fix the configuration ?

Just asking.

adiroiban avatar Sep 24 '24 21:09 adiroiban

If we disable the tests in the CI / GitHub Action, I think that there is a 99% probabilty that the tests will not be executed for future PR...

Nobody will bother to execute them locally, before asking for a review.

You can ask each reviewer to get the code and manually execute them befor approving a PR... but I think that's a lot to ask

Yep, I'm not suggesting that, just that any tests which are stochastic need to be skipped in CI.

glyph avatar Sep 24 '24 22:09 glyph

If this is a misconfiguration, can't we just fix the configuration ?

Just asking.

Yep. The misconfiguration is that the tests are running with random data. The tests should be run in CI, but only with a fixed set of cases.

I'd rather (temporarily) disable them entirely if they're flaky, and enable them when we have figured out this configuration, since it can be tricky to figure out and while we are figuring it out we shouldn't be provoking more flakiness in CI than we already have :)

glyph avatar Sep 24 '24 22:09 glyph

FWIW I think you've misinterpreted the report here.

produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)

This isn't saying that a random seed resulted in Hypothesis exercising a new case that it doesn't always exercise and that case that is broken. It is saying that Hypothesis ran the code under test with this case twice and it passed once and failed once. ie, the test or the implementation is non-deterministic. So this is kind of like any flaky test - it just happens to be one that's using Hypothesis.

exarkun avatar Sep 24 '24 22:09 exarkun

FWIW I think you've misinterpreted the report here.

Thank you for the clarification! This error message could probably use a little work, but this is good to know. In that case, I'll amend the subject.

glyph avatar Sep 24 '24 23:09 glyph

I did indeed completely misinterpret the error message, and my sense of urgency to disable hypothesis was based on previous negative experiences where a mis-specified strategy resulted in a sort of fractal flakiness where every other PR on Klein was getting rejected due to a bug in Hyperlink's URL generating strategies. If this is regular flakiness that's unfortunate but doesn't need to be particularly aggressively prioritized.

glyph avatar Sep 24 '24 23:09 glyph

I think that it would help to have support in trial to catch these hypothesis.errors.FlakyFailure and report them in some way ... maybe place them in another file

Them, as part of our CI, we can raise an error or a warning when FlakyFailure are raised during the tests

Otherwise, it's harder to detected these flaky tests as part of the test failure

adiroiban avatar Sep 25 '24 07:09 adiroiban

I don't know what this means. trial does catch and report them and the flaky test was detected - leading to this issue being filed.

exarkun avatar Sep 25 '24 10:09 exarkun

Yeah, as the one who was confused here, I think the change needed would be in Hypothesis (to clarify the error message), not trial?

glyph avatar Sep 25 '24 16:09 glyph

My suggestion is to handle the hypothesis.errors.FlakyFailure as a warning ... maybe there is already a configuration option

That is, not to consider it as a test failure.

We already have flaky tests... and for now, we just ignore them by triggering a re-run


To me, the hypothesis error message makes sense.

adiroiban avatar Sep 26 '24 09:09 adiroiban

Another here:


Error: 
Traceback (most recent call last):
  File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/twisted/trial/_dist/test/test_stream.py", line 198, in test_stream
    def test_stream(self, chunks: List[bytes]) -> None:
  File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/hypothesis/core.py", line 1740, in wrapped_test
    raise the_error_hypothesis_found
hypothesis.errors.FlakyFailure: Hypothesis test_stream(self=<twisted.trial._dist.test.test_stream.StreamTests testMethod=test_stream>, chunks=[b'',
 b'\xe1\xe8!*q\xee\x940?g',
 b'\x1a0O\x83 \x7f]@)\x1b!\xe0\x98',
 b'\xda\xa3\xf9I']) produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)

twisted.trial._dist.test.test_stream.StreamTests.test_stream

twm avatar Dec 07 '24 01:12 twm

And again:

Error: 
Traceback (most recent call last):
  File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/twisted/trial/_dist/test/test_stream.py", line 198, in test_stream
    def test_stream(self, chunks: List[bytes]) -> None:
  File "/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/hypothesis/core.py", line 1740, in wrapped_test
    raise the_error_hypothesis_found
hypothesis.errors.FlakyFailure: Hypothesis test_stream(self=<twisted.trial._dist.test.test_stream.StreamTests testMethod=test_stream>, chunks=[b'\x12vA{\x0e\x93q\xc1G\xe9\x15\x92\xde\xea\xe1Z',
 b'5Co\xef\xdc\x14\x98',
 b'5Co\xef\xdc\x14\x98',
 b'5Co\xef\xdc\x14\x98',
 b'K\xc7\xc9\x89',
 b'K#\xc6\xca\xc1Io\\\xea)\x8a\xefU',
 b'/\x9a/>+',
 b'i\x12\xc6']) produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)

twisted.trial._dist.test.test_stream.StreamTests.test_stream
-------------------------------------------------------------------------------
Ran 11154 tests in 237.011s

FAILED (skips=628, errors=1, successes=10525)
/Users/runner/work/twisted/twisted/.tox/macos-withcov-alldeps/lib/python3.8/site-packages/twisted/conch/ssh/transport.py:110: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from cryptography.hazmat.primitives.ciphers.algorithms in 48.0.0.

glyph avatar Mar 20 '25 22:03 glyph