
3 flaky tests

Open gretchenfrage opened this issue 11 months ago • 1 comments

I ran cargo test on Quinn (05f6e67de633245526d5d2773e27eb75c70b2bdd) 1,003,080 times. This is what I found.

There are 3 flaky tests:

  • tests::single_ack_eliciting_packet_triggers_ack_after_delay fails 0.101% of the time. I collected 1021 occurrences.

    The flakiness of this test was also noted in #2014.

    In one arbitrarily chosen failure, the error was this:

    thread 'tests::single_ack_eliciting_packet_triggers_ack_after_delay' panicked at quinn-proto/src/tests/mod.rs:2490:5:
    assertion `left == right` failed
      left: Instant { tv_sec: 137912, tv_nsec: 296566897 }
     right: Instant { tv_sec: 137912, tv_nsec: 218566897 }
    
  • tests::key_update_reordered fails 0.098% of the time. I collected 983 occurrences.

    The flakiness of this test was also noted in #1695.

    In one arbitrarily chosen failure, the error was this:

    thread 'tests::key_update_reordered' panicked at quinn-proto/src/tests/mod.rs:1064:5:
    assertion `left == right` failed
      left: 1
     right: 0
    
  • tests::key_update_simple fails 0.015% of the time. I collected 146 occurrences.

    This is the rarest of the bunch, and I wasn't able to find evidence that this has been noticed before.

    In one arbitrarily chosen failure, the error was this:

    thread 'tests::key_update_simple' panicked at quinn-proto/src/tests/mod.rs:1021:5:
    assertion failed: `None` does not match `Some(Event::Stream(StreamEvent::Readable { id })) if id == s`
    

I am attaching to this issue grouped.zip, which contains the stdout/stderr of all runs in which the tests failed, grouped by which test failed. These contain terminal color codes, so I recommend you read the files with cat.
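As a quick sanity check, the percentages above can be recomputed from the occurrence counts and the total number of runs. This is only an arithmetic sketch; the function name `failure_rate_percent` is made up for illustration and is not part of Quinn or its tooling.

```rust
/// Failure rate as a percentage of total runs.
/// (Hypothetical helper, not part of Quinn.)
fn failure_rate_percent(failures: u64, total_runs: u64) -> f64 {
    failures as f64 / total_runs as f64 * 100.0
}

fn main() {
    // Total number of `cargo test` runs reported in this issue.
    const TOTAL_RUNS: u64 = 1_003_080;

    // (test name, number of failing runs), as reported in this issue.
    let observed: [(&str, u64); 3] = [
        ("single_ack_eliciting_packet_triggers_ack_after_delay", 1021),
        ("key_update_reordered", 983),
        ("key_update_simple", 146),
    ];

    for (name, failures) in observed {
        let rate = failure_rate_percent(failures, TOTAL_RUNS);
        // Integer division: roughly how many runs pass between failures.
        let one_in = TOTAL_RUNS / failures;
        println!("{name}: {rate:.4}% (about 1 in {one_in} runs)");
    }
}
```

At these rates a few thousand runs would likely surface the two more frequent failures, while key_update_simple needs noticeably more.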

gretchenfrage, Dec 17 '24 07:12

https://github.com/gretchenfrage/quinn-scrutinizer

gretchenfrage, Dec 24 '24 08:12

https://github.com/quinn-rs/quinn/pull/2292 fixes the key_update_reordered case. I wouldn't be surprised if key_update_simple was a variation on the same root cause.

I don't immediately see how single_ack_eliciting_packet_triggers_ack_after_delay could be related, but it wouldn't be a shock, considering that initial key phase size is one of the few bits of nondeterminism we have in the test and the repro rate seems similar. If it is related, then it won't have been fixed by the above PR, but would be made consistent by using a constant RNG seed for the endpoint.
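To illustrate the constant-seed idea in isolation: a generator built from a fixed seed produces the same sequence on every run, so anything derived from it (such as a randomized initial parameter) stops varying between test runs. The sketch below uses a small SplitMix64 generator as a stand-in; `SplitMix64` and `pick_initial_param` are hypothetical names for illustration, and this is not how Quinn's endpoint RNG is actually wired up.

```rust
/// Minimal SplitMix64 generator, used only as a stand-in for a seedable RNG.
/// (Assumption: the real endpoint uses a different RNG; this shows the principle.)
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn from_seed(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical stand-in for a randomized parameter chosen at setup time.
fn pick_initial_param(rng: &mut SplitMix64) -> u64 {
    rng.next_u64() % 1000
}

fn main() {
    // A constant seed, as a test harness might hard-code it.
    const SEED: u64 = 42;

    let first = pick_initial_param(&mut SplitMix64::from_seed(SEED));
    let second = pick_initial_param(&mut SplitMix64::from_seed(SEED));

    // Same seed, same value: the nondeterminism is gone.
    assert_eq!(first, second);
    println!("both setups picked {first}");
}
```

Note that fixing the seed makes a failure either always or never reproduce; it does not by itself fix the underlying bug.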

Ralith, Jul 06 '25 21:07