neqo
idle_timeout_crazy_rtt can time out if there are two losses
Run:
SIMULATION_SEED=a5cde2a8492f2e704f54105ed28192deb79005d4f0356236a81550b867471d40 \
cargo test -p neqo-transport --test network -- idle_timeout_crazy_rtt --nocapture
Fixing this might require some work.
OK, the analysis is that this results from a rare double loss of packets containing HANDSHAKE_DONE. Nothing serious here; we expect that to happen because this test is badly exposed to loss: the idle timeout ends up being set to 3 PTO.
Now, we've discussed sending more aggressively on PTO (by making the first PTO happen at half the nominal time), which might help fix this. Until we have something more stable, I'll keep this open.
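To make the arithmetic concrete, here is a rough sketch of why two losses are enough, and why a halved first PTO might help. It is a simplified model, not neqo code. It assumes the PTO period doubles after each expiry and that the idle deadline sits at a fixed 3 PTO after the first transmission. It ignores delivery time and any restarts of the idle timer. `probe_times` and `probes_before` are hypothetical helpers made up for this illustration.

```rust
use std::time::Duration;

/// Times, measured from the first transmission, at which successive PTO
/// probes would fire. Assumption: each PTO period is double the previous one.
fn probe_times(first_period: Duration, probes: usize) -> Vec<Duration> {
    let mut times = Vec::with_capacity(probes);
    let mut period = first_period;
    let mut elapsed = Duration::ZERO;
    for _ in 0..probes {
        elapsed += period;
        times.push(elapsed);
        period *= 2; // exponential backoff
    }
    times
}

/// Number of probes that fire strictly before `deadline`
/// (only the first 8 probes are considered, which is plenty here).
fn probes_before(first_period: Duration, deadline: Duration) -> usize {
    probe_times(first_period, 8)
        .into_iter()
        .filter(|t| *t < deadline)
        .count()
}
```

With a nominal PTO of 100 ms and an idle deadline of 300 ms, `probes_before(pto, pto * 3)` gives 1: probes fire at 100 ms and 300 ms, so if the original packet and the first probe are both lost, the second probe goes out no earlier than the deadline. With the first period halved, `probes_before(pto / 2, pto * 3)` gives 2 (probes at 50 ms and 150 ms), so under this model a double loss would still leave one transmission before the deadline.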
98c5c815e436f9adc043bd0509f752ee967f0502526b6acf1880fc312f0206ce
It took longer to get a failure, but here is another one: ecebadd179ce347280ffcd3e62b5037745f5b588017dace25007bb6914f9d56d
This is happening frequently enough in CI that we should really fix this.
I am unable to reproduce this failure locally, either by running the test repeatedly in a loop (while cargo test -p neqo-transport --test network -- idle_timeout_crazy_rtt --nocapture; do :; done) or with the above seeds.
Any additional pointers, e.g. links to CI failures, would be welcome.
Local machine:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.10
Release: 23.10
Codename: mantic
IIRC the CI failures are always on Windows.
This doesn't seem to happen anymore.