neqo icon indicating copy to clipboard operation
neqo copied to clipboard

feat(transport/cubic): simplify byte tracking across acks

Open omansfeld opened this issue 2 months ago • 4 comments

We used to only increase w_est by multiples of max_datagram_size, thus not using up all of the new_acked_bytes. Therefore we had to calculate how many bytes were actually used, subtract those and track the leftover bytes across calls of bytes_for_cwnd_increase.

This patch gets rid of that logic, always applying all of new_bytes_acked. The congestion window is still only going to increase in max_datagram_size sized steps through the logic in ClassicCongestionControl::on_packets_acked, but we don't need the leftover byte tracking anymore.

As visible in CI below this patch does change some behavior -- we have 2 tests failing, thus apparently the congestion window growth is different.

Because w_est is not floored to be mss-sized anymore it is now potentially a bigger value than before. And since capping the cwnd increase to be mss-sized only happens in ClassicCongestionControl::on_packets_acked the return value of bytes_for_cwnd_increase changes, too. That means we now need less bytes acked to increase by 1*mss in on_packets_acked.

https://github.com/mozilla/neqo/blob/161476aae6200030c849b7f1fa639067224caab3/neqo-transport/src/cc/classic_cc.rs#L238-L265

While this should in theory only lead to faster growth of the congestion window, if you combine it with the double increase we are sometimes doing, it does lead to weird behavior in our tests. Similar behavior was first seen and described in this comment: https://github.com/mozilla/neqo/pull/2535#discussion_r2281629718

I think this patch is the correct way to do it, but as per all the above it does need more time and testing to verify it doesn't lead to unwanted behavior.

I'm uploading this draft PR with failing tests as a next step/reminder/reference per @mxinden's ask in https://github.com/mozilla/neqo/pull/2973#pullrequestreview-3309420877.

Part of #3053

omansfeld avatar Oct 13 '25 10:10 omansfeld

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

github-actions[bot] avatar Oct 13 '25 10:10 github-actions[bot]

Client/server transfer results

Performance differences relative to 3d2948d0bb332824a84f6907b77d644748a8f3d8.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ main Δ main
google vs. google 453.2 ± 4.6 445.9 468.9 70.6 ± 7.0
google vs. neqo (cubic, paced) 277.6 ± 4.9 270.3 296.9 115.3 ± 6.5 -0.3 -0.1%
msquic vs. msquic 155.6 ± 19.8 135.8 240.6 205.7 ± 1.6
msquic vs. neqo (cubic, paced) 181.2 ± 18.3 148.1 253.8 176.6 ± 1.7 2.2 1.3%
neqo vs. google (cubic, paced) 753.6 ± 5.3 745.6 772.9 42.5 ± 6.0 -1.0 -0.1%
neqo vs. msquic (cubic, paced) 156.1 ± 4.2 150.4 164.1 204.9 ± 7.6 :green_heart: -2.2 -1.4%
neqo vs. neqo (cubic) 90.5 ± 5.3 82.3 105.8 353.7 ± 6.0 0.8 0.9%
neqo vs. neqo (cubic, paced) 89.9 ± 4.1 82.7 98.8 355.8 ± 7.8 :broken_heart: 2.2 2.5%
neqo vs. neqo (reno) 89.2 ± 4.9 81.6 104.9 358.6 ± 6.5 1.2 1.4%
neqo vs. neqo (reno, paced) 91.4 ± 4.6 82.4 106.8 349.9 ± 7.0 :broken_heart: 1.5 1.7%
neqo vs. quiche (cubic, paced) 191.1 ± 3.5 185.7 202.7 167.4 ± 9.1 :green_heart: -2.1 -1.1%
neqo vs. s2n (cubic, paced) 217.1 ± 4.1 210.5 233.4 147.4 ± 7.8 :green_heart: -2.4 -1.1%
quiche vs. neqo (cubic, paced) 154.4 ± 4.8 144.5 168.5 207.2 ± 6.7 0.9 0.6%
quiche vs. quiche 141.0 ± 4.5 134.4 161.0 227.0 ± 7.1
s2n vs. neqo (cubic, paced) 176.0 ± 5.1 166.9 187.7 181.8 ± 6.3 0.8 0.5%
s2n vs. s2n 242.1 ± 14.6 232.6 341.6 132.2 ± 2.2

Download data for profiler.firefox.com or download performance comparison data.

github-actions[bot] avatar Oct 13 '25 11:10 github-actions[bot]

Benchmark results

Performance differences relative to 3d2948d0bb332824a84f6907b77d644748a8f3d8.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.
       time:   [200.10 ms 200.44 ms 200.78 ms]
       thrpt:  [498.05 MiB/s 498.91 MiB/s 499.74 MiB/s]
change:
       time:   [−0.4354% −0.0993% +0.2096%] (p = 0.56 > 0.05)
       thrpt:  [−0.2092% +0.0994% +0.4373%]

Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
       time:   [283.33 ms 285.04 ms 286.73 ms]
       thrpt:  [34.876 Kelem/s 35.083 Kelem/s 35.294 Kelem/s]
change:
       time:   [−0.8581% −0.0201% +0.8322%] (p = 0.97 > 0.05)
       thrpt:  [−0.8254% +0.0201% +0.8655%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [28.485 ms 28.552 ms 28.635 ms]
       thrpt:  [34.923   B/s 35.024   B/s 35.106   B/s]
change:
       time:   [+0.0020% +0.2560% +0.5447%] (p = 0.08 > 0.05)
       thrpt:  [−0.5418% −0.2553% −0.0020%]

Found 23 outliers among 100 measurements (23.00%) 2 (2.00%) low severe 1 (1.00%) low mild 20 (20.00%) high severe

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: Change within noise threshold.
       time:   [202.75 ms 203.10 ms 203.54 ms]
       thrpt:  [491.30 MiB/s 492.37 MiB/s 493.21 MiB/s]
change:
       time:   [−1.4531% −1.2138% −0.9722%] (p = 0.00 +1.2287% +1.4745%]

Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.610 µs 11.654 µs 11.703 µs]
       change: [−0.3157% +0.1136% +0.5898%] (p = 0.63 > 0.05)

Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 11 (11.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [3.0236 ms 3.0444 ms 3.0769 ms]
       change: [−0.3433% +0.4510% +1.6505%] (p = 0.43 > 0.05)

Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) low mild 1 (1.00%) high mild 9 (9.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.933 µs 19.977 µs 20.029 µs]
       change: [−0.2619% +0.0746% +0.4171%] (p = 0.69 > 0.05)

Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild 10 (10.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.0417 ms 5.0514 ms 5.0627 ms]
       change: [−0.5101% −0.1719% +0.1508%] (p = 0.32 > 0.05)

Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 10 (10.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [8.2552 µs 8.2812 µs 8.3147 µs]
       change: [−1.4609% −0.6251% −0.0075%] (p = 0.09 > 0.05)

Found 18 outliers among 100 measurements (18.00%) 5 (5.00%) low mild 6 (6.00%) high mild 7 (7.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.5869 ms 1.5937 ms 1.6008 ms]
       change: [−0.5145% +0.0899% +0.6983%] (p = 0.77 > 0.05)

Found 9 outliers among 100 measurements (9.00%) 9 (9.00%) high severe

1-streams/each-1000-bytes/wallclock-time: No change in performance detected.
       time:   [584.23 µs 585.14 µs 586.22 µs]
       change: [−0.1366% +0.2323% +0.5765%] (p = 0.21 > 0.05)

Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe 1-streams/each-1000-bytes/simulated-time time: [118.62 ms 118.85 ms 119.08 ms] thrpt: [8.2007 KiB/s 8.2166 KiB/s 8.2328 KiB/s] change: time: [−0.4812% −0.1987% +0.0583%] (p = 0.15 > 0.05) thrpt: [−0.0582% +0.1991% +0.4836%] No change in performance detected. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild

1000-streams/each-1-bytes/wallclock-time: No change in performance detected.
       time:   [13.733 ms 13.757 ms 13.781 ms]
       change: [−0.3427% −0.0705% +0.1944%] (p = 0.61 > 0.05)

Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild 1000-streams/each-1-bytes/simulated-time time: [14.977 s 14.990 s 15.003 s] thrpt: [66.654 B/s 66.712 B/s 66.769 B/s] change: time: [−0.1639% −0.0387% +0.0788%] (p = 0.54 > 0.05) thrpt: [−0.0787% +0.0388% +0.1641%] No change in performance detected.

1000-streams/each-1000-bytes/wallclock-time: No change in performance detected.
       time:   [49.495 ms 49.654 ms 49.811 ms]
       change: [−0.4862% −0.0411% +0.4140%] (p = 0.86 > 0.05)
1000-streams/each-1000-bytes/simulated-time: :green_heart: Performance has improved.
       time:   [18.327 s 18.494 s 18.664 s]
       thrpt:  [52.324 KiB/s 52.803 KiB/s 53.286 KiB/s]
change:
       time:   [−3.9222% −2.7166% −1.4744%] (p = 0.00 +2.7925% +4.0823%]
coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [87.919 ns 88.213 ns 88.504 ns]
       change: [−0.4815% −0.0624% +0.3642%] (p = 0.78 > 0.05)

Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [105.95 ns 106.31 ns 106.68 ns]
       change: [−0.3207% +0.1870% +0.7937%] (p = 0.52 > 0.05)

Found 13 outliers among 100 measurements (13.00%) 13 (13.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [105.29 ns 105.70 ns 106.19 ns]
       change: [−0.3559% +0.1534% +0.7735%] (p = 0.59 > 0.05)

Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 1 (1.00%) high mild 8 (8.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [88.972 ns 89.089 ns 89.218 ns]
       change: [−0.8392% −0.0277% +0.7337%] (p = 0.95 > 0.05)

Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe

RxStreamOrderer::inbound_frame(): No change in performance detected.
       time:   [107.85 ms 108.01 ms 108.28 ms]
       change: [−0.6770% −0.3155% +0.0171%] (p = 0.06 > 0.05)

Found 27 outliers among 100 measurements (27.00%) 13 (13.00%) low mild 11 (11.00%) high mild 3 (3.00%) high severe

sent::Packets::take_ranges: No change in performance detected.
       time:   [4.5245 µs 4.5888 µs 4.6415 µs]
       change: [−3.1125% −0.4276% +2.3765%] (p = 0.77 > 0.05)

Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe

transfer/pacing-false/varying-seeds/wallclock-time/run: Change within noise threshold.
       time:   [25.210 ms 25.249 ms 25.288 ms]
       change: [+1.0093% +1.2297% +1.4401%] (p = 0.00 Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
transfer/pacing-false/varying-seeds/simulated-time/run: Change within noise threshold.
       time:   [25.067 s 25.098 s 25.129 s]
       thrpt:  [163.00 KiB/s 163.20 KiB/s 163.41 KiB/s]
change:
       time:   [−0.4680% −0.2891% −0.1010%] (p = 0.00 +0.2899% +0.4702%]

Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild

transfer/pacing-true/varying-seeds/wallclock-time/run: Change within noise threshold.
       time:   [25.717 ms 25.776 ms 25.836 ms]
       change: [+0.7246% +1.1115% +1.4825%] (p = 0.00 Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/pacing-true/varying-seeds/simulated-time/run: No change in performance detected.
       time:   [24.922 s 24.957 s 24.991 s]
       thrpt:  [163.90 KiB/s 164.12 KiB/s 164.36 KiB/s]
change:
       time:   [−0.4044% −0.1997% +0.0016%] (p = 0.05 > 0.05)
       thrpt:  [−0.0016% +0.2001% +0.4060%]

Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) low mild

transfer/pacing-false/same-seed/wallclock-time/run: Change within noise threshold.
       time:   [25.074 ms 25.093 ms 25.112 ms]
       change: [−0.7422% −0.5707% −0.4276%] (p = 0.00 
transfer/pacing-false/same-seed/simulated-time/run: Change within noise threshold.
       time:   [25.492 s 25.492 s 25.492 s]
       thrpt:  [160.68 KiB/s 160.68 KiB/s 160.68 KiB/s]
change:
       time:   [−0.8478% −0.8478% −0.8478%] (p = 0.00 +0.8551% +0.8551%]
transfer/pacing-true/same-seed/wallclock-time/run: No change in performance detected.
       time:   [26.121 ms 26.147 ms 26.186 ms]
       change: [−0.0375% +0.1950% +0.3844%] (p = 0.07 > 0.05)

Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe

transfer/pacing-true/same-seed/simulated-time/run: :green_heart: Performance has improved.
       time:   [25.131 s 25.131 s 25.131 s]
       thrpt:  [162.99 KiB/s 162.99 KiB/s 162.99 KiB/s]
change:
       time:   [−2.1210% −2.1210% −2.1210%] (p = 0.00 +2.1669% +2.1669%]

Download data for profiler.firefox.com or download performance comparison data.

github-actions[bot] avatar Oct 13 '25 12:10 github-actions[bot]

🐰 Bencher Report

Branchsimplify_byte_tracking
TestbedOn-prem

🚨 1 Alert

BenchmarkMeasure
Units
ViewBenchmark Result
(Result Δ%)
Upper Boundary
(Limit %)
decode 1048576 bytes, mask ffLatency
milliseconds (ms)
📈 plot
🚷 threshold
🚨 alert (🔔)
3.04 ms
(+0.42%)Baseline: 3.03 ms
3.04 ms
(100.01%)

Click to view all benchmark results
BenchmarkLatencyBenchmark Result
nanoseconds (ns)
(Result Δ%)
Upper Boundary
nanoseconds (ns)
(Limit %)
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client📈 view plot
🚷 view threshold
203,100,000.00 ns
(-2.93%)Baseline: 209,237,251.46 ns
217,763,767.26 ns
(93.27%)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client📈 view plot
🚷 view threshold
200,440,000.00 ns
(-1.39%)Baseline: 203,266,140.35 ns
212,732,889.41 ns
(94.22%)
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client📈 view plot
🚷 view threshold
28,552,000.00 ns
(+0.49%)Baseline: 28,413,152.05 ns
28,876,403.00 ns
(98.88%)
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client📈 view plot
🚷 view threshold
285,040,000.00 ns
(-3.38%)Baseline: 295,007,192.98 ns
306,049,858.23 ns
(93.14%)
1-streams/each-1000-bytes/simulated-time📈 view plot
🚷 view threshold
118,850,000.00 ns
(+0.48%)Baseline: 118,287,894.74 ns
120,885,411.72 ns
(98.32%)
1-streams/each-1000-bytes/wallclock-time📈 view plot
🚷 view threshold
585,140.00 ns
(-2.26%)Baseline: 598,650.64 ns
623,493.03 ns
(93.85%)
1000-streams/each-1-bytes/simulated-time📈 view plot
🚷 view threshold
14,990,000,000.00 ns
(-0.01%)Baseline: 14,991,888,888.89 ns
15,010,601,829.65 ns
(99.86%)
1000-streams/each-1-bytes/wallclock-time📈 view plot
🚷 view threshold
13,757,000.00 ns
(-3.36%)Baseline: 14,235,181.29 ns
14,984,963.90 ns
(91.81%)
1000-streams/each-1000-bytes/simulated-time📈 view plot
🚷 view threshold
18,494,000,000.00 ns
(-2.16%)Baseline: 18,903,029,239.77 ns
19,156,891,572.29 ns
(96.54%)
1000-streams/each-1000-bytes/wallclock-time📈 view plot
🚷 view threshold
49,654,000.00 ns
(-5.27%)Baseline: 52,418,532.16 ns
58,797,744.06 ns
(84.45%)
RxStreamOrderer::inbound_frame()📈 view plot
🚷 view threshold
108,010,000.00 ns
(-1.69%)Baseline: 109,867,543.86 ns
112,039,815.51 ns
(96.40%)
coalesce_acked_from_zero 1+1 entries📈 view plot
🚷 view threshold
88.21 ns
(-0.48%)Baseline: 88.64 ns
89.32 ns
(98.76%)
coalesce_acked_from_zero 10+1 entries📈 view plot
🚷 view threshold
105.70 ns
(-0.39%)Baseline: 106.11 ns
107.10 ns
(98.69%)
coalesce_acked_from_zero 1000+1 entries📈 view plot
🚷 view threshold
89.09 ns
(-0.86%)Baseline: 89.86 ns
94.57 ns
(94.21%)
coalesce_acked_from_zero 3+1 entries📈 view plot
🚷 view threshold
106.31 ns
(-0.30%)Baseline: 106.63 ns
107.59 ns
(98.81%)
decode 1048576 bytes, mask 3f📈 view plot
🚷 view threshold
1,593,700.00 ns
(+0.07%)Baseline: 1,592,559.65 ns
1,599,660.71 ns
(99.63%)
decode 1048576 bytes, mask 7f📈 view plot
🚷 view threshold
5,051,400.00 ns
(-0.12%)Baseline: 5,057,223.98 ns
5,077,326.61 ns
(99.49%)
decode 1048576 bytes, mask ff📈 view plot
🚷 view threshold
🚨 view alert (🔔)
3,044,400.00 ns
(+0.42%)Baseline: 3,031,724.56 ns
3,043,954.93 ns
(100.01%)

decode 4096 bytes, mask 3f📈 view plot
🚷 view threshold
8,281.20 ns
(-0.19%)Baseline: 8,296.88 ns
8,345.07 ns
(99.23%)
decode 4096 bytes, mask 7f📈 view plot
🚷 view threshold
19,977.00 ns
(-0.15%)Baseline: 20,007.11 ns
20,086.25 ns
(99.46%)
decode 4096 bytes, mask ff📈 view plot
🚷 view threshold
11,654.00 ns
(-0.70%)Baseline: 11,735.81 ns
11,980.18 ns
(97.28%)
sent::Packets::take_ranges📈 view plot
🚷 view threshold
4,588.80 ns
(-3.37%)Baseline: 4,748.60 ns
4,992.88 ns
(91.91%)
transfer/pacing-false/same-seed/simulated-time/run📈 view plot
🚷 view threshold
25,492,000,000.00 ns
(+1.04%)Baseline: 25,229,952,662.72 ns
25,681,635,982.41 ns
(99.26%)
transfer/pacing-false/same-seed/wallclock-time/run📈 view plot
🚷 view threshold
25,093,000.00 ns
(-3.67%)Baseline: 26,048,130.18 ns
27,075,939.86 ns
(92.68%)
transfer/pacing-false/varying-seeds/simulated-time/run📈 view plot
🚷 view threshold
25,098,000,000.00 ns
(-0.27%)Baseline: 25,166,094,674.56 ns
25,212,596,997.08 ns
(99.55%)
transfer/pacing-false/varying-seeds/wallclock-time/run📈 view plot
🚷 view threshold
25,249,000.00 ns
(-4.02%)Baseline: 26,305,633.14 ns
27,566,364.23 ns
(91.59%)
transfer/pacing-true/same-seed/simulated-time/run📈 view plot
🚷 view threshold
25,131,000,000.00 ns
(-1.82%)Baseline: 25,597,136,094.67 ns
25,706,861,265.59 ns
(97.76%)
transfer/pacing-true/same-seed/wallclock-time/run📈 view plot
🚷 view threshold
26,147,000.00 ns
(-4.88%)Baseline: 27,489,781.07 ns
28,754,577.59 ns
(90.93%)
transfer/pacing-true/varying-seeds/simulated-time/run📈 view plot
🚷 view threshold
24,957,000,000.00 ns
(-0.15%)Baseline: 24,993,822,485.21 ns
25,044,357,554.26 ns
(99.65%)
transfer/pacing-true/varying-seeds/wallclock-time/run📈 view plot
🚷 view threshold
25,776,000.00 ns
(-3.96%)Baseline: 26,838,065.09 ns
28,130,574.71 ns
(91.63%)
🐰 View full continuous benchmarking report in Bencher

github-actions[bot] avatar Oct 13 '25 12:10 github-actions[bot]