refactor(bin): use 32k stream IO buffer
Firefox by default uses a 32k IO buffer for streams.
https://searchfox.org/mozilla-central/rev/f6e3b81aac49e602f06c204f9278da30993cdc8a/modules/libpref/init/all.js#3212
This commit makes neqo-bin use the same buffer size across http09/3 and client/server.
Along the way it consolidates various buffer logic and reuses buffers whereever feasible.
Failed Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: Z
- neqo-latest vs. go-x-net: H
- neqo-latest vs. kwik: Z L1
- neqo-latest vs. lsquic: C1
- neqo-latest vs. msquic: Z A L1
- neqo-latest vs. mvfst: A L1 C1
- neqo-latest vs. nginx: L1 C1
- neqo-latest vs. ngtcp2: C1
- neqo-latest vs. picoquic: L1 C1
- neqo-latest vs. quinn: E
- neqo-latest vs. s2n-quic: R
- neqo-latest vs. xquic: A L1 C1
neqo-latest as server
- aioquic vs. neqo-latest: C1
- lsquic vs. neqo-latest: run cancelled after 20 min
- mvfst vs. neqo-latest: Z 3 A L1 C1
- ngtcp2 vs. neqo-latest: run cancelled after 20 min
- quiche vs. neqo-latest: 3
- quinn vs. neqo-latest: V2
- xquic vs. neqo-latest: M
All results
Succeeded Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
- neqo-latest vs. go-x-net: DC LR M B U A L2 C2 6
- neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
- neqo-latest vs. kwik: H DC LR C20 M S R 3 B U A L2 C1 C2 6 V2
- neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C2 6 V2
- neqo-latest vs. msquic: H DC LR C20 M S R B U L2 C1 C2 6 V2
- neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6
- neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L2 C2 6
- neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C2 6 V2
- neqo-latest vs. picoquic: H DC LR C20 M S R Z 3 B U E A L2 C2 6 V2
- neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. s2n-quic: H DC LR C20 M S 3 B U E A L1 L2 C1 C2 6
- neqo-latest vs. xquic: H DC LR C20 M R Z 3 B U L2 C2 6
neqo-latest as server
- aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C2 6 V2
- chrome vs. neqo-latest: 3
- go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6
- kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
- msquic vs. neqo-latest: H DC LR C20 M S R Z B U A L1 L2 C1 C2 6 V2
- mvfst vs. neqo-latest: H DC LR M B L2 C2 6
- neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- quiche vs. neqo-latest: H DC LR M S R Z B A L1 L2 C1 C2 6
- quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6
- s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6
- xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6
Unsupported Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: E
- neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2
- neqo-latest vs. haproxy: E
- neqo-latest vs. kwik: E
- neqo-latest vs. msquic: 3 E
- neqo-latest vs. mvfst: C20 S E V2
- neqo-latest vs. nginx: E V2
- neqo-latest vs. quic-go: E V2
- neqo-latest vs. quiche: E V2
- neqo-latest vs. quinn: V2
- neqo-latest vs. s2n-quic: Z V2
- neqo-latest vs. xquic: S E V2
neqo-latest as server
- aioquic vs. neqo-latest: U E
- chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2
- go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
- kwik vs. neqo-latest: E
- msquic vs. neqo-latest: 3 E
- mvfst vs. neqo-latest: C20 S R U E V2
- quic-go vs. neqo-latest: E V2
- quiche vs. neqo-latest: C20 U E V2
- s2n-quic vs. neqo-latest: C20 Z U V2
- xquic vs. neqo-latest: E V2
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.36%. Comparing base (
f801c29) to head (bad3673).
Additional details and impacted files
@@ Coverage Diff @@
## main #2008 +/- ##
==========================================
- Coverage 95.36% 95.36% -0.01%
==========================================
Files 112 112
Lines 36475 36475
==========================================
- Hits 34784 34783 -1
- Misses 1691 1692 +1
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Benchmark results
Performance differences relative to f801c295df38b3aebe0b481ed5702218663b4308.
coalesce_acked_from_zero 1+1 entries: :broken_heart: Performance has regressed.
time: [196.61 ns 197.05 ns 197.53 ns]
change: [+1.4739% +1.8071% +2.1494%] (p = 0.00 Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low mild
6 (6.00%) high mild
9 (9.00%) high severecoalesce_acked_from_zero 3+1 entries: Change within noise threshold.
time: [237.21 ns 237.92 ns 238.71 ns]
change: [+0.0346% +0.3689% +0.7299%] (p = 0.04 Found 16 outliers among 100 measurements (16.00%)
16 (16.00%) high severecoalesce_acked_from_zero 10+1 entries: No change in performance detected.
time: [236.98 ns 238.05 ns 239.36 ns]
change: [-0.1969% +0.3499% +0.9639%] (p = 0.23 > 0.05)
Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
time: [215.66 ns 215.88 ns 216.12 ns]
change: [-0.0640% +0.7259% +1.4315%] (p = 0.06 > 0.05)
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
RxStreamOrderer::inbound_frame(): Change within noise threshold.
time: [119.60 ms 119.68 ms 119.76 ms]
change: [+0.8078% +0.9063% +1.0011%] (p = 0.00 Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mildtransfer/pacing-false/varying-seeds: No change in performance detected.
time: [40.423 ms 42.503 ms 44.609 ms]
change: [-5.6161% +0.7236% +7.8640%] (p = 0.83 > 0.05)
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
transfer/pacing-true/varying-seeds: Change within noise threshold.
time: [57.763 ms 60.818 ms 63.932 ms]
change: [+2.2889% +10.117% +18.409%] (p = 0.01 Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildtransfer/pacing-false/same-seed: No change in performance detected.
time: [47.965 ms 49.768 ms 51.492 ms]
change: [-6.1545% -1.5034% +3.4935%] (p = 0.55 > 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
transfer/pacing-true/same-seed: No change in performance detected.
time: [60.923 ms 67.805 ms 74.674 ms]
change: [-18.847% -6.3929% +7.1145%] (p = 0.35 > 0.05)
1-conn/1-100mb-resp (aka. Download)/client: :green_heart: Performance has improved.
time: [126.55 ms 127.76 ms 129.52 ms]
thrpt: [772.10 MiB/s 782.71 MiB/s 790.18 MiB/s]
change:
time: [-27.403% -26.268% -24.915%] (p = 0.00 +35.626% +37.747%]
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) low mild
1 (1.00%) high severe
1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
time: [408.49 ms 411.58 ms 414.65 ms]
thrpt: [24.117 Kelem/s 24.296 Kelem/s 24.481 Kelem/s]
change:
time: [-0.9293% +0.2571% +1.4515%] (p = 0.66 > 0.05)
thrpt: [-1.4308% -0.2564% +0.9380%]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
time: [46.648 ms 47.456 ms 48.297 ms]
thrpt: [20.705 elem/s 21.072 elem/s 21.437 elem/s]
change:
time: [-1.6077% +0.7764% +3.0788%] (p = 0.52 > 0.05)
thrpt: [-2.9869% -0.7704% +1.6340%]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Client/server transfer results
Transfer of 33554432 bytes over loopback.
| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|---|---|---|
| msquic | msquic | 198.2 ± 84.1 | 98.9 | 322.2 | 1.00 | ||
| neqo | msquic | reno | on | 298.3 ± 71.2 | 259.3 | 466.2 | 1.00 |
| neqo | msquic | reno | 279.3 ± 25.0 | 247.5 | 332.7 | 1.00 | |
| neqo | msquic | cubic | on | 315.0 ± 106.3 | 255.0 | 597.3 | 1.00 |
| neqo | msquic | cubic | 281.3 ± 42.2 | 250.4 | 398.3 | 1.00 | |
| msquic | neqo | reno | on | 167.5 ± 108.4 | 85.6 | 530.2 | 1.00 |
| msquic | neqo | reno | 157.2 ± 91.3 | 84.4 | 359.1 | 1.00 | |
| msquic | neqo | cubic | on | 135.0 ± 96.7 | 84.5 | 458.3 | 1.00 |
| msquic | neqo | cubic | 167.8 ± 96.0 | 85.4 | 346.6 | 1.00 | |
| neqo | neqo | reno | on | 192.0 ± 66.5 | 151.9 | 363.2 | 1.00 |
| neqo | neqo | reno | 190.8 ± 68.6 | 151.6 | 407.3 | 1.00 | |
| neqo | neqo | cubic | on | 258.4 ± 135.3 | 159.7 | 596.3 | 1.00 |
| neqo | neqo | cubic | 311.5 ± 175.9 | 173.7 | 664.5 | 1.00 |
Firefox builds for this PR
The following builds are available for testing. Crossed-out builds did not succeed.
- Linux: Debug Release
- macOS: Debug Release
- Windows: Debug Release
Strange. "No change in performance detected", but a 97% increase in runtime. Will look into it now.
1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.
time: [358.86 ms 589.33 ms 967.64 ms]
thrpt: [103.34 MiB/s 169.68 MiB/s 278.66 MiB/s]
change:
time: [+21.333% +97.949% +256.03%] (p = 0.08 > 0.05)
thrpt: [-71.912% -49.482% -17.582%]
No change
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
Still debugging the performance regression. I expected a performance improvement.
Interesting difference in --stats of a ~1GB download with rx: 819060 on main and rx: 25662 on 32k.
branch main
9s627ms INFO stats for Client ...
rx: 819060 drop 1 dup 0 saved 1
tx: 119448 lost 14789 lateack 4 ptoack 28
pmtud: 8 sent 6 acked 0 lost 0 change
resumed: false
frames rx:
crypto 6 done 5 token 4 close 0
ack 2449 (max 119418) ping 4 padding 2666
stream 817514 reset 0 stop 0
max: stream 0 data 0 stream_data 0
blocked: stream 0 data 0 stream_data 1299
datagram 0
ncid 35 rcid 0 pchallenge 0 presponse 0
ack_frequency 10
frames tx:
crypto 3 done 0 token 0 close 3
ack 117360 (max 820038) ping 46 padding 8
stream 8 reset 0 stop 0
max: stream 0 data 0 stream_data 2521
blocked: stream 0 data 0 stream_data 0
datagram 0
ncid 0 rcid 0 pchallenge 0 presponse 0
ack_frequency 2
branch 32k
3s488ms INFO stats for Client ...
rx: 25662 drop 1 dup 0 saved 1
tx: 15781 lost 0 lateack 0 ptoack 187
pmtud: 10 sent 10 acked 0 lost 0 change
resumed: false
frames rx:
crypto 3 done 1 token 1 close 0
ack 2811 (max 15765) ping 78 padding 375850
stream 24043 reset 0 stop 0
max: stream 0 data 0 stream_data 0
blocked: stream 0 data 0 stream_data 2670
datagram 0
ncid 7 rcid 0 pchallenge 0 presponse 0
ack_frequency 3
frames tx:
crypto 2 done 0 token 0 close 3
ack 13998 (max 26763) ping 149 padding 10
stream 5 reset 0 stop 0
max: stream 0 data 0 stream_data 2764
blocked: stream 0 data 0 stream_data 0
datagram 0
ncid 0 rcid 0 pchallenge 0 presponse 0
ack_frequency 1
Without pacing, this pull request outperforms main:
➜ neqo-bin git:(main) ✗ critcmp main main-no-pacing 32k 32k-no-pacing -f "Download" --list
1-conn/1-100mb-resp (aka. Download)/client
------------------------------------------
32k-no-pacing 1.00 114.8±49.78ms 871.3 MB/sec
main-no-pacing 1.05 121.0±83.97ms 826.4 MB/sec
main 1.45 165.9±56.69ms 602.8 MB/sec
32k 11.39 1307.8±1038.97ms 76.5 MB/sec
This is ready for a review. https://github.com/mozilla/neqo/pull/2035 resolved the performance regressions. In addition, as expected, this now improves the Download benchmark:
1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.
time: [126.55 ms 127.76 ms 129.52 ms] thrpt: [772.10 MiB/s 782.71 MiB/s 790.18 MiB/s]change: time: [-27.403% -26.268% -24.915%] (p = 0.00 < 0.05) thrpt: [+33.182% +35.626% +37.747%]
Note: The benefit of this pull request is not the performance improvement above, but the fact that the benchmark now executes closer to how Firefox uses neqo-*.