fix: No time-based loss detection before `largest_acked` & fixes
I will refactor this PR into smaller standalone ones.
#1603 broke this by incorrectly enabling time-based loss detection before we have received a valid `largest_acked`.
Once a later packet within the same packet number space has been acknowledged, an endpoint SHOULD declare an earlier packet lost if it was sent a threshold amount of time in the past.
This makes it clear that time-based LD should be disabled before an ACK is received.
`assert(largest_acked_packet[pn_space] != infinite)`
The `DetectAndRemoveLostPackets` pseudocode in RFC 9002 also contains the assert above.
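As a rough illustration of the fix, here is a minimal Rust sketch (illustrative names, not the actual neqo code, and simplified to the time-threshold condition of RFC 9002, Section 6.1): time-based loss detection returns early while no ACK has established a `largest_acked` for the packet number space.

```rust
use std::time::{Duration, Instant};

// Sketch only: illustrative names, not the actual neqo implementation.
fn detect_lost_packets(
    largest_acked: Option<u64>, // None until the first ACK arrives in this space
    loss_delay: Duration,
    now: Instant,
    sent_packets: &[(u64, Instant)], // (packet number, time sent)
) -> Vec<u64> {
    // Mirrors `assert(largest_acked_packet[pn_space] != infinite)`: without a
    // largest_acked there is nothing to compare against, so no packet may be
    // declared lost on a time basis yet.
    let Some(largest_acked) = largest_acked else {
        return Vec::new();
    };
    sent_packets
        .iter()
        .filter(|(pn, sent)| *pn <= largest_acked && now.duration_since(*sent) >= loss_delay)
        .map(|(pn, _)| *pn)
        .collect()
}
```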
When a PTO fires, we now explicitly also do a congestion reaction.
Fixes #1874
This PR also includes fixes intended to make retransmissions during the handshake more robust: when a PTO fires, all packets in all affected packet number spaces are timed out (making their contents eligible for RTX), and `ack_delay` is not applied to the application packet number space's PTO during the handshake.
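A hedged sketch of the `ack_delay` part of that change (hypothetical names, not the actual neqo recovery code): per RFC 9002, Section 6.2.1, only the application data space's PTO includes the peer's `max_ack_delay`, and only once the handshake is confirmed.

```rust
use std::time::Duration;

#[derive(PartialEq, Clone, Copy)]
enum PacketNumberSpace { Initial, Handshake, ApplicationData }

// Sketch of the PTO period computation (RFC 9002, Section 6.2.1).
fn pto_period(
    smoothed_rtt: Duration,
    rttvar: Duration,
    max_ack_delay: Duration,
    space: PacketNumberSpace,
    handshake_confirmed: bool,
) -> Duration {
    let granularity = Duration::from_millis(1); // kGranularity
    let mut pto = smoothed_rtt + (4 * rttvar).max(granularity);
    // Only the ApplicationData space accounts for the peer's ack delay, and
    // only once the handshake is confirmed; during the handshake the peer may
    // not be able to delay ACKs at all.
    if space == PacketNumberSpace::ApplicationData && handshake_confirmed {
        pto += max_ack_delay;
    }
    pto
}
```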
Failed Interop Tests
None :question:
All results
Succeeded Interop Tests
None :question:
Unsupported Interop Tests
None :question:
Benchmark results
Performance differences relative to d6279bf40c91c4cb45c358d5f614dc6190e4c9eb.
coalesce_acked_from_zero 1+1 entries: No change in performance detected.
time: [99.781 ns 100.06 ns 100.35 ns]
change: [-0.0691% +0.3854% +0.8727%] (p = 0.13 > 0.05)
Found 12 outliers among 100 measurements (12.00%)
9 (9.00%) high mild
3 (3.00%) high severe
coalesce_acked_from_zero 3+1 entries: No change in performance detected.
time: [117.84 ns 118.21 ns 118.61 ns]
change: [-0.1498% +0.3222% +0.8142%] (p = 0.19 > 0.05)
Found 21 outliers among 100 measurements (21.00%)
3 (3.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
13 (13.00%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
time: [117.66 ns 118.18 ns 118.78 ns]
change: [-0.3301% +0.3180% +0.8978%] (p = 0.32 > 0.05)
Found 21 outliers among 100 measurements (21.00%)
6 (6.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
10 (10.00%) high severe
coalesce_acked_from_zero 1000+1 entries: Change within noise threshold.
time: [98.191 ns 98.432 ns 98.741 ns]
change: [+0.2702% +1.2029% +2.1014%] (p = 0.01 < 0.05)
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
RxStreamOrderer::inbound_frame(): No change in performance detected.
time: [110.98 ms 111.11 ms 111.30 ms]
change: [-0.1477% -0.0232% +0.1488%] (p = 0.82 > 0.05)
Found 11 outliers among 100 measurements (11.00%)
7 (7.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
transfer/pacing-false/varying-seeds: No change in performance detected.
time: [28.531 ms 29.337 ms 30.147 ms]
change: [-0.8630% +4.2908% +9.9250%] (p = 0.12 > 0.05)
transfer/pacing-true/varying-seeds: Change within noise threshold.
time: [37.814 ms 39.294 ms 40.754 ms]
change: [+0.5884% +7.1476% +14.299%] (p = 0.04 < 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
transfer/pacing-false/same-seed: :green_heart: Performance has improved.
time: [20.227 ms 21.095 ms 21.965 ms]
change: [-23.657% -19.714% -15.468%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed: :green_heart: Performance has improved.
time: [20.852 ms 21.936 ms 23.034 ms]
change: [-52.505% -49.129% -45.393%] (p = 0.00 < 0.05)
1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.
time: [113.70 ms 114.11 ms 114.50 ms]
thrpt: [873.36 MiB/s 876.33 MiB/s 879.50 MiB/s]
change:
time: [-0.7008% -0.2614% +0.1863%] (p = 0.26 > 0.05)
thrpt: [-0.1860% +0.2621% +0.7057%]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low severe
1 (1.00%) low mild
1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
time: [312.75 ms 316.12 ms 319.49 ms]
thrpt: [31.300 Kelem/s 31.633 Kelem/s 31.974 Kelem/s]
change:
time: [-2.4277% -0.8752% +0.6564%] (p = 0.27 > 0.05)
thrpt: [-0.6521% +0.8830% +2.4881%]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
time: [33.867 ms 34.070 ms 34.294 ms]
thrpt: [29.160 elem/s 29.351 elem/s 29.527 elem/s]
change:
time: [-1.5619% -0.6708% +0.2182%] (p = 0.13 > 0.05)
thrpt: [-0.2177% +0.6753% +1.5866%]
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe
Client/server transfer results
Transfer of 33554432 bytes over loopback.
| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|---|---|---|
| msquic | msquic | | | 173.1 ± 109.5 | 100.9 | 564.9 | 1.00 |
| neqo | msquic | reno | on | 243.7 ± 61.5 | 212.3 | 434.9 | 1.00 |
| neqo | msquic | reno | | 257.1 ± 67.1 | 218.3 | 428.7 | 1.00 |
| neqo | msquic | cubic | on | 242.9 ± 63.0 | 213.1 | 439.9 | 1.00 |
| neqo | msquic | cubic | | 294.2 ± 98.2 | 217.5 | 455.0 | 1.00 |
| msquic | neqo | reno | on | 174.8 ± 101.8 | 81.9 | 336.3 | 1.00 |
| msquic | neqo | reno | | 147.6 ± 94.0 | 84.3 | 360.6 | 1.00 |
| msquic | neqo | cubic | on | 139.7 ± 78.4 | 83.8 | 324.7 | 1.00 |
| msquic | neqo | cubic | | 137.2 ± 89.0 | 80.5 | 373.3 | 1.00 |
| neqo | neqo | reno | on | 196.4 ± 71.6 | 129.8 | 366.3 | 1.00 |
| neqo | neqo | reno | | 190.6 ± 95.6 | 125.1 | 434.5 | 1.00 |
| neqo | neqo | cubic | on | 196.8 ± 74.1 | 131.4 | 381.0 | 1.00 |
| neqo | neqo | cubic | | 249.7 ± 111.5 | 133.1 | 471.2 | 1.00 |
Firefox builds for this PR
The following builds are available for testing. Crossed-out builds did not succeed.
- Linux: ~~Debug~~ ~~Release~~
- macOS: ~~Debug~~ ~~Release~~
- Windows: ~~Debug~~ ~~Release~~
Codecov Report
Attention: Patch coverage is 99.08257% with 1 line in your changes missing coverage. Please review.
Please upload report for BASE (main@d6279bf). Learn more about missing BASE report.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| neqo-transport/src/path.rs | 87.50% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #1998 +/- ##
=======================================
Coverage ? 95.31%
=======================================
Files ? 112
Lines ? 36352
Branches ? 0
=======================================
Hits ? 34648
Misses ? 1704
Partials ? 0
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Kwik failure: https://github.com/ptrd/kwik/issues/46
Quinn failure: https://github.com/quinn-rs/quinn/issues/1929
Is there any point in undoing some more of the changes from #1603?
Still looking into the performance drop in "transfer/Run multiple transfers with varying seeds".
We should also look into why pacing is making things slower:
neqo neqo reno on 415.4 ± 39.1 375.7 480.0 1.00
neqo neqo reno 206.9 ± 21.3 181.8 242.1 1.00
neqo neqo cubic on 450.3 ± 57.6 385.6 575.1 1.00
neqo neqo cubic 285.0 ± 168.1 168.5 610.7 1.00
Maybe saving others some clicks, here are the numbers of a recent bench run on main:
neqo neqo reno on 196.1 ± 19.4 172.1 236.5 1.00
neqo neqo reno 174.4 ± 10.3 153.0 195.1 1.00
neqo neqo cubic on 194.6 ± 27.0 173.3 275.2 1.00
neqo neqo cubic 175.7 ± 5.9 168.8 188.3 1.00
https://github.com/mozilla/neqo/actions/runs/10051108975
Pacing will make things slower, but might make things nicer on the network. It should be a small delta, not a big one.
So I bisected this, and b6e05e304b1ef582bc232710d3e1177fb5b371fe is the first bad commit (where performance of that benchmark drops).
There are really only two changes in there: one is the removal of the max_ack_delay addition from the loss_delay, the other is the fix to not do time-based loss recovery before having a largest_acked. I fail to see how either of these could cause this, however. Hm.
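For context, RFC 9002 (Section 6.1.2) defines the loss time threshold purely in terms of RTT, with no `max_ack_delay` term; a minimal sketch (illustrative, not neqo's code):

```rust
use std::time::Duration;

// Time threshold for declaring a packet lost (RFC 9002, Section 6.1.2):
// max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity),
// with kTimeThreshold = 9/8. Note that max_ack_delay is not part of it.
fn loss_delay(smoothed_rtt: Duration, latest_rtt: Duration) -> Duration {
    let granularity = Duration::from_millis(1); // kGranularity
    (smoothed_rtt.max(latest_rtt) * 9 / 8).max(granularity)
}
```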
I validated that the key fix here (not doing time-based loss recovery while we don't have a largest_acked) causes the drop in benchmark performance. I can't quite figure out why that would be, given that we're now not over-aggressively retransmitting packets.
And when I added #1603, we didn't have the benchmarks yet (nor lots of other things), so it's not so straightforward to compare the pre-1603 performance to the one now.
Thoughts?
A guess @larseggert:
- The previous version, time-based loss recovery while we don't have a `largest_acked`, leads to more packets being declared lost.
- More packets being declared lost reduces the congestion window.
- A small congestion window requires the `Pacer` to pace the packets over a smaller time window. (See the correlation of pacer and congestion window; a rough sketch follows below this list.)
- Thus smaller spaces between packets.
- Thus higher throughput on a low-latency (localhost) connection.
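To make that correlation concrete, here is a rough, generic pacing sketch (not neqo's actual `pace.rs`; the speed-up constant is an assumption): credit accrues at roughly `cwnd` bytes per RTT, so the congestion window directly controls how long the pacer asks the sender to wait before the next packet, which is the `wait` / `no wait` distinction in the trace output further down this thread.

```rust
use std::time::Duration;

const PACER_SPEEDUP: u32 = 2; // assumed speed-up/burst factor

// Rough pacing relation: if `credit` bytes may be sent immediately, a packet
// larger than the credit has to wait for the missing bytes to accrue at
// cwnd * PACER_SPEEDUP bytes per RTT.
fn pacer_wait(credit: u32, packet_size: u32, cwnd: u32, rtt: Duration) -> Duration {
    if credit >= packet_size {
        Duration::ZERO // enough credit: "no wait"
    } else {
        rtt * (packet_size - credit) / (cwnd * PACER_SPEEDUP)
    }
}
```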
Sorry for having been unclear. The performance reduction I was looking into was the drop in "transfer/Run multiple transfers with varying seeds" bench, not the drop related to pacing visible in the comparison table.
Both might have the same root cause @larseggert, no? "transfer/Run multiple transfers with varying seeds" uses the default ConnectionParameters. Thus, pacing is enabled.
https://github.com/mozilla/neqo/blob/608c0a72851a55f73dcfc9f206143748b36fd1bb/neqo-transport/src/connection/params.rs#L105
Also note that the "transfer/Run multiple transfers with varying seeds" benchmark includes packet loss.
https://github.com/mozilla/neqo/blob/608c0a72851a55f73dcfc9f206143748b36fd1bb/test-fixture/src/sim/taildrop.rs#L82-L87
Running this benchmark without pacing (e.g. via https://github.com/mozilla/neqo/pull/2006) increases performance, potentially back to the baseline.
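For reference, disabling pacing for such a run could look roughly like this, assuming `ConnectionParameters` exposes a `pacing` toggle as the linked `params.rs` suggests (a sketch, not a verified snippet from #2006):

```rust
use neqo_transport::ConnectionParameters;

fn bench_params() -> ConnectionParameters {
    // Assumption: `pacing(bool)` switches the pacer off (it defaults to on).
    ConnectionParameters::default().pacing(false)
}
```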
You're right, it might be the same cause.
Looking at the first 50 pace.rs logs of the "transfer/Run multiple transfers with varying seeds" on main and on 0rtt-tmp, more specifically at the following lines.
https://github.com/mozilla/neqo/blob/88d3ed105db1b1a805a7b8c3b64078008069a403/neqo-transport/src/pace.rs#L77
https://github.com/mozilla/neqo/blob/88d3ed105db1b1a805a7b8c3b64078008069a403/neqo-transport/src/pace.rs#L87
- `main` reduces its `cwnd` quickly to `6160`, while `0rtt-tmp` stays at `12320`.
- `main` does not pace (i.e. no `wait`) while `0rtt-tmp` paces within the first 50 logs (i.e. `wait`).
main
➜ neqo git:(main) ✗ RUST_LOG=neqo_transport::pace=trace cargo test --bench transfer --features bench varying 2>&1 | head -n 50
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.06s
Running benches/transfer.rs (target/debug/deps/transfer-7ba07c4aa3c126f9)
Testing transfer/Run multiple transfers with varying seeds
0s 0ms TRACE [Pacer 2464/1232] spend 1232 over 12320, 100ms
0s 0ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 561055565 }
0s 0ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 561055565 }
0s 0ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 561055565 }
0s 0ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 561055565 }
0s 2ms TRACE [Pacer 2464/1232] spend 671 over 12320, 100ms
0s 2ms TRACE [Pacer 1793/1232] spend 561 over 12320, 100ms
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 561055565 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 2ms TRACE [Pacer 1232/1232] spend 1232 over 6160, 100ms
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 2ms TRACE [Pacer 2464/1232] next 6160/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 675862863 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 2ms TRACE [Pacer 2464/1232] next 6160/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 675862863 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 4ms TRACE [Pacer 2464/1232] spend 50 over 6160, 124.38313ms
0s 4ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 4ms TRACE [Pacer 2464/1232] spend 81 over 6160, 124.38313ms
0s 4ms TRACE [Pacer 2383/1232] spend 234 over 6160, 124.38313ms
0s 4ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 5ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 5ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 5ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 5ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 5ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 5ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 5ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 5ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 624566863 }
0s 5ms TRACE [Pacer 1232/1232] spend 1232 over 12320, 100ms
0s 5ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 5ms TRACE [Pacer 2464/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 733268221 }
0s 6ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 6ms TRACE [Pacer 2464/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 733268221 }
0s 6ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 6ms TRACE [Pacer 2464/1232] next 12320/100ms no wait = Instant { tv_sec: 12085, tv_nsec: 733268221 }
0s 6ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 6ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 6ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 6ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 6ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 6ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 7ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 7ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 7ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0s 7ms TRACE [Pacer 2464/1232] spend 50 over 12320, 119.752079ms
0s 8ms TRACE [Pacer 2464/1232] spend 1282 over 12320, 119.752079ms
0s 8ms TRACE [Pacer 2149/1232] next 6160/124.38313ms no wait = Instant { tv_sec: 12085, tv_nsec: 685438695 }
0rtt-tmp
➜ neqo git:(0rtt-tmp) ✗ RUST_LOG=neqo_transport::pace=trace cargo test --bench transfer --features bench varying 2>&1 | head -n 50
Compiling neqo-transport v0.8.1 (/home/mxinden/code/github.com/mozilla/neqo/neqo-transport)
Compiling neqo-qpack v0.8.1 (/home/mxinden/code/github.com/mozilla/neqo/neqo-qpack)
Compiling neqo-http3 v0.8.1 (/home/mxinden/code/github.com/mozilla/neqo/neqo-http3)
Compiling test-fixture v0.8.1 (/home/mxinden/code/github.com/mozilla/neqo/test-fixture)
Finished `test` profile [unoptimized + debuginfo] target(s) in 2.85s
Running benches/transfer.rs (target/debug/deps/transfer-7ba07c4aa3c126f9)
Testing transfer/Run multiple transfers with varying seeds
0s 0ms TRACE [Pacer 2464/1232] spend 1232 over 12320, 100ms
0s 0ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 788940966 }
0s 0ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 788940966 }
0s 0ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 788940966 }
0s 0ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 788940966 }
0s 2ms TRACE [Pacer 2464/1232] spend 664 over 12320, 100ms
0s 2ms TRACE [Pacer 1800/1232] spend 568 over 12320, 100ms
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 788940966 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 788940966 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 788940966 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 788940966 }
0s 2ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 3ms TRACE [Pacer 1232/1232] spend 51 over 12320, 118.702751ms
0s 4ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 4ms TRACE [Pacer 2464/1232] spend 82 over 12320, 118.702751ms
0s 4ms TRACE [Pacer 2382/1232] spend 256 over 12320, 118.702751ms
0s 4ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 4ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 4ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 4ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 4ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 4ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 4ms TRACE [Pacer 1232/1232] next 12320/100ms no wait = Instant { tv_sec: 12154, tv_nsec: 854132492 }
0s 4ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 5ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 5ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 5ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 5ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 5ms TRACE [Pacer 1232/1232] spend 51 over 12320, 112.760801ms
0s 6ms TRACE [Pacer 2464/1232] spend 1281 over 12320, 112.760801ms
0s 6ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 6ms TRACE [Pacer 1183/1232] next 12320/112.760801ms wait 224.24µs = Instant { tv_sec: 12155, tv_nsec: 19325906 }
0s 6ms TRACE [Pacer 1183/1232] next 12320/112.760801ms wait 224.24µs = Instant { tv_sec: 12155, tv_nsec: 19325906 }
0s 6ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 6ms TRACE [Pacer 1183/1232] next 12320/112.760801ms wait 224.24µs = Instant { tv_sec: 12155, tv_nsec: 19325906 }
0s 7ms TRACE [Pacer 1183/1232] spend 1332 over 12320, 112.760801ms
0s 7ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
0s 7ms TRACE [Pacer 0/1232] next 12320/112.760801ms wait 5.63804ms = Instant { tv_sec: 12155, tv_nsec: 24963946 }
0s 7ms TRACE [Pacer 0/1232] next 12320/112.760801ms wait 5.63804ms = Instant { tv_sec: 12155, tv_nsec: 24963946 }
0s 7ms TRACE [Pacer 2126/1232] next 12320/118.702751ms no wait = Instant { tv_sec: 12154, tv_nsec: 907643717 }
`main` reduces its `cwnd` quickly to `6160`, while `0rtt-tmp` stays at `12320`.
That is probably because main erroneously marks 0-RTT as lost, and does a congestion reaction.
`main` does not pace (i.e. no `wait`) while `0rtt-tmp` paces within the first 50 logs (i.e. `wait`).
It might be that because main is in the recovery state, it can immediately (= not paced) retransmit a single packet.
In other words, the bug in main might make it over-aggressive, and 0rtt-tmp behaves (more) correctly.
`main` does not pace (i.e. no `wait`) while `0rtt-tmp` paces within the first 50 logs (i.e. `wait`). It might be that because `main` is in the recovery state, it can immediately (= not paced) retransmit a single packet.
Only when we send a packet < mtu and we are in RecoveryStart do we ignore the Pacer:
https://github.com/mozilla/neqo/blob/b63e0ff567342a220a9633d1c3ff2b15706bf5bb/neqo-transport/src/recovery/mod.rs#L914-L930
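A small sketch of the condition described above (illustrative names and states, not the code at the linked location):

```rust
// The pacer is bypassed only for a packet smaller than the MTU while the
// sender is in the RecoveryStart state; everything else is paced.
#[derive(PartialEq, Clone, Copy)]
enum CcState {
    SlowStart,
    CongestionAvoidance,
    RecoveryStart,
    Recovery,
}

fn ignore_pacer(state: CcState, packet_len: usize, mtu: usize) -> bool {
    packet_len < mtu && state == CcState::RecoveryStart
}
```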
Would that explain the near 100% performance regression on the client/server transfers with pacing enabled?
We should also look into why pacing is making things slower:
neqo neqo reno on 415.4 ± 39.1 375.7 480.0 1.00
neqo neqo reno 206.9 ± 21.3 181.8 242.1 1.00
neqo neqo cubic on 450.3 ± 57.6 385.6 575.1 1.00
neqo neqo cubic 285.0 ± 168.1 168.5 610.7 1.00
I am still puzzled about these benchmark regressions. I looked into it some more, but without success. I don't think my above guesses are valid.
@larseggert do you have further ideas worth investigating?
I am fine merging here even with the regression. I think bugfixes should take precedence over performance. That said, I would like to investigate further and ideally not uplift to Firefox before we have found the reason.
I will take a look at qlogs next. I can reproduce the regression with the Simulator, which should allow me to generate qlogs without impacting run time, given that time is simulated.
I went back to fbc5d62697c968dec968941d4c495f4fa6869c9d, which was the state of the code before #1603 was merged. I needed to add the ability to turn off pacing on the server, and comment out some noisy `eprintln!` output. I then did a hyperfine run, comparing our client and server with Reno, with and without pacing.
Without pacing:
Time (mean ± σ): 471.5 ms ± 16.8 ms [User: 305.3 ms, System: 114.1 ms]
Range (min … max): 430.3 ms … 485.9 ms 10 runs
With pacing:
Time (mean ± σ): 523.7 ms ± 14.0 ms [User: 299.6 ms, System: 120.2 ms]
Range (min … max): 495.8 ms … 546.4 ms 10 runs
In other words, we did see a similar performance difference before #1603 as we are seeing now.
However, we also see this performance difference with #1603. I'm now bisecting to see when that changed.
I did some comparison runs between this PR and HEAD. It seems that the performance difference between pacing and non-pacing is only present if the loopback MTU is 64KB.
This PR, MTU 1500:
Benchmark 1: target/release/neqo-client --cc reno --output-dir /tmp/tmp.w6tjsiaAdZ -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 552.8 ms ± 67.7 ms [User: 290.3 ms, System: 184.4 ms]
Range (min … max): 495.9 ms … 958.6 ms 100 runs
Benchmark 1: target/release/neqo-client --cc reno --no-pacing --output-dir /tmp/tmp.w6tjsiaAdZ -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 552.9 ms ± 47.8 ms [User: 285.3 ms, System: 182.3 ms]
Range (min … max): 490.8 ms … 772.2 ms 100 runs
This PR, MTU 64KB:
Benchmark 1: target/release/neqo-client --cc reno --output-dir /tmp/tmp.gKIfqAvJbs -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 397.0 ms ± 121.4 ms [User: 149.0 ms, System: 57.5 ms]
Range (min … max): 227.4 ms … 692.2 ms 100 runs
Benchmark 1: target/release/neqo-client --cc reno --no-pacing --output-dir /tmp/tmp.gKIfqAvJbs -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 262.3 ms ± 18.2 ms [User: 148.3 ms, System: 56.8 ms]
Range (min … max): 232.2 ms … 314.3 ms 100 runs
HEAD, MTU 1500:
Benchmark 1: target/release/neqo-client --cc reno --output-dir /tmp/tmp.GHDuGPjylE -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 578.8 ms ± 48.4 ms [User: 311.1 ms, System: 211.9 ms]
Range (min … max): 520.0 ms … 766.4 ms 100 runs
Benchmark 1: target/release/neqo-client --cc reno --no-pacing --output-dir /tmp/tmp.GHDuGPjylE -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 577.3 ms ± 39.5 ms [User: 327.2 ms, System: 213.6 ms]
Range (min … max): 540.2 ms … 789.1 ms 100 runs
HEAD, MTU 64KB:
Benchmark 1: target/release/neqo-client --cc reno --output-dir /tmp/tmp.6wBXumkvZX -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 256.8 ms ± 34.3 ms [User: 144.6 ms, System: 52.7 ms]
Range (min … max): 221.5 ms … 501.1 ms 100 runs
Benchmark 1: target/release/neqo-client --cc reno --no-pacing --output-dir /tmp/tmp.6wBXumkvZX -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 245.1 ms ± 23.4 ms [User: 142.5 ms, System: 52.7 ms]
Range (min … max): 220.5 ms … 432.7 ms 100 runs
I wouldn't read too much into those differences. A larger MTU is going to mean that overheads are more pronounced.
Also, pacing is always going to be slower in this sort of test. Pacing is designed to be gentle on queues in the network at the cost of extra CPU and delay at the endpoint that paces. As long as the cost is small, we're good.
On further investigation, this seems to be related to what `PACER_SPEEDUP` is set to. When I double it to 4, the difference for this PR with a 64KB MTU goes away:
Benchmark 1: target/release/neqo-client --cc reno --output-dir /tmp/tmp.AP82nSCWV4 -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 268.4 ms ± 36.8 ms [User: 155.3 ms, System: 53.1 ms]
Range (min … max): 230.9 ms … 514.0 ms 100 runs
Benchmark 1: target/release/neqo-client --cc reno --no-pacing --output-dir /tmp/tmp.AP82nSCWV4 -o -a hq-interop https://127.0.0.1:4434/134217728
Time (mean ± σ): 264.8 ms ± 36.1 ms [User: 158.5 ms, System: 58.4 ms]
Range (min … max): 228.5 ms … 503.1 ms 100 runs
Given all this, I think we can merge this.
This isn't quite ready yet: while fixing the test failure after merging HEAD, I realized we have another bug. When there is packet loss during the handshake, we do not always ACK when we should. That's because sending an ACK resets the ACK timer (even during the handshake), which causes ACKs not to be sent when they should be (on retransmit).
I'm working on a fix, and want to see that merged before continuing on this one, because that fix might impact what needs to change in this PR.
That other PR is #2027.