fix: Mark all packets TX'ed before PTO as lost
We previously only marked one or two packets as lost when a PTO fired. That meant that we potentially didn't retransmit (RTX) all the data we could have, i.e., data in lost packets that we never marked as lost.
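For illustration, a minimal Rust sketch of the new behavior. The types and names (`SentPacket`, `on_pto`) are hypothetical stand-ins, not neqo's actual recovery API; the point is only that *every* packet sent before the PTO fired gets marked lost, not just the oldest one or two:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for neqo's sent-packet bookkeeping.
struct SentPacket {
    pn: u64,            // packet number
    time_sent: Instant, // when the packet was transmitted
    lost: bool,         // whether we already declared it lost
}

/// On PTO, declare every not-yet-lost packet sent before the timer fired as
/// lost, so that all of their contents become eligible for retransmission.
fn on_pto(sent: &mut [SentPacket], pto_fired_at: Instant) -> Vec<u64> {
    sent.iter_mut()
        .filter(|p| !p.lost && p.time_sent < pto_fired_at)
        .map(|p| {
            p.lost = true;
            p.pn
        })
        .collect()
}

fn main() {
    let t0 = Instant::now();
    let mut sent: Vec<SentPacket> = (0..3)
        .map(|pn| SentPacket { pn, time_sent: t0, lost: false })
        .collect();
    // The PTO fires after all three packets were sent; all of them, not
    // just the oldest, are now marked lost and queued for retransmission.
    let lost = on_pto(&mut sent, t0 + Duration::from_millis(10));
    assert_eq!(lost, vec![0, 1, 2]);
}
```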
This also changes the probing code to suppress redundant keep-alives: a PING that we send for another reason now doubles as a keep-alive, whereas before it did not.
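The keep-alive change can be sketched the same way; again, `ProbeState` and its fields are hypothetical names, not neqo's probing code. Any PING already queued for another reason (e.g., a PTO probe) also resets the peer's idle timer, so a dedicated keep-alive PING on top of it would be redundant:

```rust
// Hypothetical sketch: a PING pending for any reason doubles as the
// keep-alive, so we suppress the extra keep-alive PING.
#[derive(Default)]
struct ProbeState {
    ping_queued: bool,    // a PING is already pending for another reason
    keep_alive_due: bool, // the idle timer wants the path kept alive
}

impl ProbeState {
    /// Only send a dedicated keep-alive PING if no PING is pending anyway.
    fn needs_keep_alive_ping(&self) -> bool {
        self.keep_alive_due && !self.ping_queued
    }
}

fn main() {
    let mut state = ProbeState::default();
    state.keep_alive_due = true;
    assert!(state.needs_keep_alive_ping());
    state.ping_queued = true; // e.g., a PTO probe was just scheduled
    assert!(!state.needs_keep_alive_ping()); // the pending PING suffices
}
```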
Broken out of #1998
Failed Interop Tests
QUIC Interop Runner, client vs. server, differences relative to b931a289eee7d62c0815535f01cfa34c5a929f9d.
neqo-latest as client
- neqo-latest vs. aioquic: Z
- neqo-latest vs. go-x-net: BP BA
- neqo-latest vs. haproxy: :warning:L1 BP BA
- neqo-latest vs. kwik: run cancelled after 20 min
- neqo-latest vs. lsquic: :warning:E L1 C1
- neqo-latest vs. msquic: R Z A L1 C1 :warning:C2
- neqo-latest vs. mvfst: A L1 C1 :warning:BA
- neqo-latest vs. nginx: BP BA
- neqo-latest vs. ngtcp2: CM
- neqo-latest vs. picoquic: :rocket:~~Z~~ A L1 C1
- neqo-latest vs. quic-go: A
- neqo-latest vs. quiche: BP BA
- neqo-latest vs. s2n-quic: BA CM
- neqo-latest vs. tquic: S BP BA
- neqo-latest vs. xquic: :warning:A
neqo-latest as server
- aioquic vs. neqo-latest: :warning:C1 CM
- go-x-net vs. neqo-latest: CM
- kwik vs. neqo-latest: BP BA CM
- lsquic vs. neqo-latest: run cancelled after 20 min
- msquic vs. neqo-latest: Z U CM
- mvfst vs. neqo-latest: Z A L1 C1 CM
- openssl vs. neqo-latest: LR M CM
- quic-go vs. neqo-latest: CM
- quiche vs. neqo-latest: CM
- quinn vs. neqo-latest: V2 CM
- s2n-quic vs. neqo-latest: CM
- tquic vs. neqo-latest: CM
- xquic vs. neqo-latest: M CM
All results
Succeeded Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2 BP BA
- neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
- neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A :warning:L1 L2 C1 C2 6 V2
- neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U :warning:E A L2 C2 6 V2 BP BA
- neqo-latest vs. msquic: H DC LR C20 M S B U L2 :warning:C2 6 V2 BP BA
- neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6 BP :warning:BA
- neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
- neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
- neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA
- neqo-latest vs. picoquic: H DC LR C20 M S R :rocket:~~Z~~ 3 B U E L2 C2 6 V2 BP BA
- neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U L1 L2 C1 C2 6 BP BA
- neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 BP BA
- neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6 BP
- neqo-latest vs. tquic: H DC LR C20 M R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. xquic: :rocket:~~H DC LR C20 M R Z 3 B U L1 L2 C1 C2 6 BP BA~~
neqo-latest as server
- aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 :warning:C1 C2 6 V2 BP BA
- chrome vs. neqo-latest: 3
- go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6 BP BA
- kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
- msquic vs. neqo-latest: H DC LR C20 M S R B A L1 L2 C1 C2 6 V2 BP BA
- mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6 BP BA
- neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
- ngtcp2 vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
- openssl vs. neqo-latest: H DC C20 S R 3 B A L2 C2 6 BP BA
- picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
- quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 BP BA
- quiche vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6 BP BA
- quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 BP BA
- s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 BP BA
- tquic vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6 BP BA
- xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6 BP BA
Unsupported Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: E CM
- neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2 CM
- neqo-latest vs. haproxy: E CM
- neqo-latest vs. lsquic: CM
- neqo-latest vs. msquic: 3 E CM
- neqo-latest vs. mvfst: C20 S E V2 CM
- neqo-latest vs. nginx: E V2 CM
- neqo-latest vs. picoquic: CM
- neqo-latest vs. quic-go: E V2 CM
- neqo-latest vs. quiche: E V2 CM
- neqo-latest vs. quinn: V2 CM
- neqo-latest vs. s2n-quic: Z V2
- neqo-latest vs. tquic: E V2 CM
- neqo-latest vs. xquic: S E V2 CM
neqo-latest as server
- aioquic vs. neqo-latest: U E
- chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2 BP BA CM
- go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
- kwik vs. neqo-latest: E
- msquic vs. neqo-latest: 3 E
- mvfst vs. neqo-latest: C20 S R U E V2
- openssl vs. neqo-latest: Z U E L1 C1 V2
- quic-go vs. neqo-latest: E V2
- quiche vs. neqo-latest: C20 U E V2
- s2n-quic vs. neqo-latest: C20 Z U V2
- tquic vs. neqo-latest: C20 U E V2
- xquic vs. neqo-latest: E V2
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.42%. Comparing base (8b4a9c9) to head (d8f1214).
Additional details and impacted files
@@ Coverage Diff @@
## main #2129 +/- ##
=======================================
Coverage 95.41% 95.42%
=======================================
Files 115 115
Lines 36996 36992 -4
Branches 36996 36992 -4
=======================================
- Hits 35301 35300 -1
+ Misses 1689 1686 -3
Partials 6 6
| Components | Coverage Δ | |
|---|---|---|
| neqo-common | 97.17% <ø> (ø) | |
| neqo-crypto | 90.44% <ø> (ø) | |
| neqo-http3 | 94.50% <ø> (ø) | |
| neqo-qpack | 96.29% <ø> (ø) | |
| neqo-transport | 96.26% <100.00%> (+0.01%) | :arrow_up: |
| neqo-udp | 95.29% <ø> (ø) | |
Benchmark results
Performance differences relative to 9b9f28267de1d744e270cbb0725c0e57d170b745.
decode 4096 bytes, mask ff: No change in performance detected.
time: [11.853 µs 11.895 µs 11.942 µs]
change: [-0.1831% +0.2708% +0.7641%] (p = 0.28 > 0.05)
Found 20 outliers among 100 measurements (20.00%)
2 (2.00%) low severe
6 (6.00%) low mild
1 (1.00%) high mild
11 (11.00%) high severe
decode 1048576 bytes, mask ff: No change in performance detected.
time: [3.0685 ms 3.0779 ms 3.0890 ms]
change: [-0.5254% -0.0469% +0.4310%] (p = 0.88 > 0.05)
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
9 (9.00%) high severe
decode 4096 bytes, mask 7f: No change in performance detected.
time: [19.791 µs 19.843 µs 19.900 µs]
change: [-0.4664% -0.0184% +0.3429%] (p = 0.93 > 0.05)
Found 21 outliers among 100 measurements (21.00%)
3 (3.00%) low severe
3 (3.00%) low mild
15 (15.00%) high severe
decode 1048576 bytes, mask 7f: No change in performance detected.
time: [5.1465 ms 5.1580 ms 5.1711 ms]
change: [-0.3820% -0.0391% +0.3158%] (p = 0.83 > 0.05)
Found 13 outliers among 100 measurements (13.00%)
13 (13.00%) high severe
decode 4096 bytes, mask 3f: No change in performance detected.
time: [6.8668 µs 6.8719 µs 6.8815 µs]
change: [-0.5871% +0.0812% +1.0176%] (p = 0.86 > 0.05)
Found 15 outliers among 100 measurements (15.00%)
5 (5.00%) high mild
10 (10.00%) high severe
decode 1048576 bytes, mask 3f: No change in performance detected.
time: [1.7594 ms 1.7663 ms 1.7747 ms]
change: [-0.1505% +0.4023% +1.0339%] (p = 0.17 > 0.05)
Found 10 outliers among 100 measurements (10.00%)
10 (10.00%) high severe
1 streams of 1 bytes/multistream: No change in performance detected.
time: [68.961 µs 69.539 µs 70.573 µs]
change: [-2.2059% -0.2415% +1.8217%] (p = 0.81 > 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
1000 streams of 1 bytes/multistream: Change within noise threshold.
time: [24.838 ms 24.870 ms 24.903 ms]
change: [+0.0198% +0.2153% +0.4047%] (p = 0.03 < 0.05)
10000 streams of 1 bytes/multistream: Change within noise threshold.
time: [1.6683 s 1.6701 s 1.6720 s]
change: [+0.0264% +0.1789% +0.3398%] (p = 0.03 < 0.05)
Found 23 outliers among 100 measurements (23.00%)
1 (1.00%) low severe
13 (13.00%) low mild
1 (1.00%) high mild
8 (8.00%) high severe
1 streams of 1000 bytes/multistream: No change in performance detected.
time: [70.580 µs 71.562 µs 72.991 µs]
change: [-0.2161% +1.1952% +3.7551%] (p = 0.22 > 0.05)
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
100 streams of 1000 bytes/multistream: No change in performance detected.
time: [3.3333 ms 3.3398 ms 3.3467 ms]
change: [-0.3140% -0.0457% +0.2232%] (p = 0.74 > 0.05)
Found 22 outliers among 100 measurements (22.00%)
22 (22.00%) high severe
1000 streams of 1000 bytes/multistream: Change within noise threshold.
time: [145.67 ms 145.74 ms 145.82 ms]
change: [-0.4432% -0.3643% -0.2861%] (p = 0.00 < 0.05)
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
coalesce_acked_from_zero 1+1 entries: No change in performance detected.
time: [92.530 ns 92.870 ns 93.209 ns]
change: [-0.4077% +0.0189% +0.4379%] (p = 0.93 > 0.05)
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) high mild
3 (3.00%) high severe
coalesce_acked_from_zero 3+1 entries: No change in performance detected.
time: [110.22 ns 110.52 ns 110.87 ns]
change: [-0.7203% -0.0344% +0.6664%] (p = 0.93 > 0.05)
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
time: [109.80 ns 110.11 ns 110.50 ns]
change: [-0.6122% -0.0751% +0.4483%] (p = 0.79 > 0.05)
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) low mild
4 (4.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
time: [91.334 ns 91.371 ns 91.412 ns]
change: [-14.471% -5.2698% +0.4266%] (p = 0.36 > 0.05)
Found 15 outliers among 100 measurements (15.00%)
6 (6.00%) high mild
9 (9.00%) high severe
RxStreamOrderer::inbound_frame(): No change in performance detected.
time: [115.69 ms 115.74 ms 115.80 ms]
change: [-0.1067% -0.0450% +0.0185%] (p = 0.17 > 0.05)
Found 20 outliers among 100 measurements (20.00%)
8 (8.00%) low severe
2 (2.00%) high mild
10 (10.00%) high severe
SentPackets::take_ranges: No change in performance detected.
time: [5.3293 µs 5.4842 µs 5.6501 µs]
change: [-3.3789% -0.3503% +2.7603%] (p = 0.83 > 0.05)
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
transfer/pacing-false/varying-seeds: Change within noise threshold.
time: [34.490 ms 34.571 ms 34.654 ms]
change: [+1.5050% +1.7832% +2.0738%] (p = 0.00 < 0.05)
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
transfer/pacing-true/varying-seeds: Change within noise threshold.
time: [34.736 ms 34.798 ms 34.860 ms]
change: [+2.4377% +2.6805% +2.9250%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: :broken_heart: Performance has regressed.
time: [34.846 ms 34.889 ms 34.933 ms]
change: [+3.1366% +3.3257% +3.5059%] (p = 0.00 < 0.05)
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) low mild
3 (3.00%) high mild
transfer/pacing-true/same-seed: Change within noise threshold.
time: [35.239 ms 35.295 ms 35.351 ms]
change: [+2.3837% +2.5987% +2.8080%] (p = 0.00 < 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold.
time: [2.1931 s 2.2014 s 2.2099 s]
thrpt: [45.251 MiB/s 45.425 MiB/s 45.597 MiB/s]
change:
time: [-1.7534% -1.2122% -0.6459%] (p = 0.00 < 0.05)
thrpt: [+0.6501% +1.2271% +1.7847%]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
time: [389.36 ms 391.71 ms 394.15 ms]
thrpt: [25.371 Kelem/s 25.529 Kelem/s 25.683 Kelem/s]
change:
time: [-0.8230% -0.0559% +0.7204%] (p = 0.89 > 0.05)
thrpt: [-0.7153% +0.0559% +0.8298%]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: :broken_heart: Performance has regressed.
time: [30.396 ms 31.127 ms 31.832 ms]
thrpt: [31.414 elem/s 32.126 elem/s 32.899 elem/s]
change:
time: [+3.9740% +7.4931% +11.211%] (p = 0.00 < 0.05)
thrpt: [-10.081% -6.9707% -3.8222%]
1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: :broken_heart: Performance has regressed.
time: [3.5065 s 3.5291 s 3.5518 s]
thrpt: [28.155 MiB/s 28.335 MiB/s 28.518 MiB/s]
change:
time: [+7.9570% +9.0897% +10.205%] (p = 0.00 < 0.05)
thrpt: [-9.2600% -8.3323% -7.3706%]
Client/server transfer results
Performance differences relative to 9b9f28267de1d744e270cbb0725c0e57d170b745.
Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.
| Client | Server | CC | Pacing | Mean ± σ | Min | Max | Δ main | Δ main (%) |
|---|---|---|---|---|---|---|---|---|
| neqo | neqo | reno | on | 517.1 ± 103.9 | 439.6 | 975.1 | 5.3 | 0.3% |
| neqo | neqo | reno | | 560.1 ± 184.3 | 441.5 | 1153.2 | -2.0 | -0.1% |
| neqo | neqo | cubic | on | 502.6 ± 39.3 | 459.4 | 628.7 | -22.7 | -1.1% |
| neqo | neqo | cubic | | 506.8 ± 45.6 | 451.1 | 646.4 | -2.3 | -0.1% |
| neqo | | reno | on | 908.9 ± 101.9 | 672.2 | 1121.7 | 8.4 | 0.2% |
| neqo | | reno | | 911.8 ± 100.7 | 648.3 | 1098.7 | 0.0 | 0.0% |
| neqo | | cubic | on | 902.2 ± 95.0 | 662.6 | 991.9 | 2.7 | 0.1% |
| neqo | | cubic | | 900.2 ± 95.2 | 646.0 | 984.0 | 0.9 | 0.0% |
| | | | | 563.1 ± 53.7 | 532.4 | 774.9 | 8.2 | 0.4% |
| neqo | msquic | reno | on | 224.1 ± 32.6 | 199.8 | 382.6 | 6.8 | 0.8% |
| neqo | msquic | reno | | 220.5 ± 19.2 | 199.0 | 278.0 | -6.0 | -0.7% |
| neqo | msquic | cubic | on | 214.2 ± 10.2 | 198.8 | 242.9 | -6.3 | -0.7% |
| neqo | msquic | cubic | | 217.5 ± 14.9 | 196.8 | 254.9 | -1.7 | -0.2% |
| msquic | msquic | | | 129.8 ± 43.2 | 97.9 | 332.8 | -2.1 | -0.4% |
Firefox builds for this PR
The following builds are available for testing. Crossed-out builds did not succeed.
@martinthomson I'd appreciate a review, since the code I am touching is pretty complex.
Do we not have tests for this? Should we?
There are tests in #2128, but this PR alone doesn't make them succeed yet.
> We previously only marked one or two packets as lost when a PTO fired. That meant that we potentially didn't retransmit (RTX) all the data we could have, i.e., data in lost packets that we never marked as lost.
Will work on a unit test for this in a bit.
> This also changes the probing code to suppress redundant keep-alives: a PING that we send for another reason now doubles as a keep-alive, whereas before it did not.
I suggest merging this fix through https://github.com/mozilla/neqo/pull/2509.
> We previously only marked one or two packets as lost when a PTO fired. That meant that we potentially didn't retransmit (RTX) all the data we could have, i.e., data in lost packets that we never marked as lost.

> Will work on a unit test for this in a bit.
Unit test added through https://github.com/mozilla/neqo/pull/2129/commits/954dc50941f863e0e72e3494339615e461bf0e55.
> This also changes the probing code to suppress redundant keep-alives: a PING that we send for another reason now doubles as a keep-alive, whereas before it did not.

> I suggest merging this fix through #2509.
Let's wait for #2509 to merge. Once merged, I will update this pull request, thus reducing the diff.
@mxinden anything left to be done for this one?