perf: use recvmmsg in addition to GRO
Previously we would only do GRO.
Depends on https://github.com/mozilla/neqo/pull/2093. Part of https://github.com/mozilla/neqo/issues/1693.
Draft only. Just testing on the benchmark runner for now.
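For readers unfamiliar with how the two mechanisms combine, here is a minimal sketch, not the code in this PR. With GRO the kernel may coalesce several datagrams into one buffer and report the segment size; with recvmmsg a single syscall fills several such buffers. The receive path then has to split every filled buffer back into individual datagrams. `RecvSlot` and `datagrams` are hypothetical names for illustration and stand in for whatever the UDP layer actually returns.

```rust
/// One buffer filled by a batched receive call.
/// Hypothetical type for illustration, not an neqo or quinn-udp type.
struct RecvSlot<'a> {
    /// Bytes the kernel wrote into this slot's buffer.
    data: &'a [u8],
    /// GRO segment size reported for this slot, `None` if nothing was coalesced.
    segment_size: Option<usize>,
}

/// Yields every individual datagram across all slots of one batched read.
fn datagrams<'a>(slots: &'a [RecvSlot<'a>]) -> impl Iterator<Item = &'a [u8]> + 'a {
    slots.iter().flat_map(|slot| {
        // Without GRO the whole buffer is a single datagram.
        // `max(1)` avoids a zero chunk size, which would panic.
        let stride = slot.segment_size.unwrap_or(slot.data.len()).max(1);
        // Only the last segment of a coalesced buffer may be shorter than `stride`.
        slot.data.chunks(stride)
    })
}

fn main() {
    // Slot 0: three datagrams coalesced by GRO, segment size 4, last one short.
    let coalesced = [1u8, 1, 1, 1, 2, 2, 2, 2, 3, 3];
    // Slot 1: a single datagram, no GRO metadata.
    let single = [9u8, 9, 9];
    let slots = [
        RecvSlot { data: &coalesced, segment_size: Some(4) },
        RecvSlot { data: &single, segment_size: None },
    ];
    for datagram in datagrams(&slots) {
        println!("datagram of {} bytes", datagram.len());
    }
}
```

The point of the sketch is that GRO reduces the number of buffers per datagram, while recvmmsg reduces the number of syscalls per buffer, so the two are complementary rather than alternatives.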
Failed Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: Z
- neqo-latest vs. msquic: Z A L1
- neqo-latest vs. mvfst: A L1 C1
- neqo-latest vs. nginx: L1
- neqo-latest vs. picoquic: L1
- neqo-latest vs. quiche: C1
- neqo-latest vs. xquic: A
neqo-latest as server
- chrome vs. neqo-latest: 3
- lsquic vs. neqo-latest: run cancelled after 20 min
- msquic vs. neqo-latest: U
- mvfst vs. neqo-latest: Z A L1 C1
- quinn vs. neqo-latest: L1 V2
- xquic vs. neqo-latest: M
Succeeded Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
- neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
- neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
- neqo-latest vs. kwik: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
- neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. msquic: H DC LR C20 M S R B U L2 C1 C2 6 V2
- neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6
- neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L2 C1 C2 6
- neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. picoquic: H DC LR C20 M S R Z 3 B U E A L2 C1 C2 6 V2
- neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C2 6
- neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6
- neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6
- neqo-latest vs. xquic: H DC LR C20 M R Z 3 B U L1 L2 C1 C2 6
neqo-latest as server
- aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C1 C2 6 V2
- go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6
- kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
- msquic vs. neqo-latest: H DC LR C20 M S R Z B A L1 L2 C1 C2 6 V2
- mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6
- neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- ngtcp2 vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- quiche vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6
- quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L2 C1 C2 6
- s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6
- xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6
Unsupported Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: E
- neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2
- neqo-latest vs. haproxy: E
- neqo-latest vs. kwik: E
- neqo-latest vs. msquic: 3 E
- neqo-latest vs. mvfst: C20 S E V2
- neqo-latest vs. nginx: E V2
- neqo-latest vs. quic-go: E V2
- neqo-latest vs. quiche: E V2
- neqo-latest vs. quinn: V2
- neqo-latest vs. s2n-quic: Z V2
- neqo-latest vs. xquic: S E V2
neqo-latest as server
- aioquic vs. neqo-latest: U E
- chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2
- go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
- kwik vs. neqo-latest: E
- msquic vs. neqo-latest: 3 E
- mvfst vs. neqo-latest: C20 S R U E V2
- quic-go vs. neqo-latest: E V2
- quiche vs. neqo-latest: C20 U E V2
- s2n-quic vs. neqo-latest: C20 Z U V2
- xquic vs. neqo-latest: E V2
Benchmark results
Performance differences relative to 55e3a9363c28632dfb29ce91c7712cab1f6a58da.
coalesce_acked_from_zero 1+1 entries: :green_heart: Performance has improved.
time: [98.803 ns 99.130 ns 99.472 ns]
change: [-12.775% -12.368% -11.953%] (p = 0.00 < 0.05)
Found 15 outliers among 100 measurements (15.00%)
11 (11.00%) high mild
4 (4.00%) high severe
coalesce_acked_from_zero 3+1 entries: :green_heart: Performance has improved.
time: [116.87 ns 117.20 ns 117.56 ns]
change: [-33.461% -33.093% -32.696%] (p = 0.00 < 0.05)
Found 19 outliers among 100 measurements (19.00%)
2 (2.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
12 (12.00%) high severe
coalesce_acked_from_zero 10+1 entries: :green_heart: Performance has improved.
time: [116.24 ns 116.64 ns 117.13 ns]
change: [-39.896% -35.657% -33.117%] (p = 0.00 < 0.05)
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) low severe
3 (3.00%) high mild
6 (6.00%) high severe
coalesce_acked_from_zero 1000+1 entries: :green_heart: Performance has improved.
time: [97.407 ns 97.529 ns 97.668 ns]
change: [-31.875% -31.315% -30.596%] (p = 0.00 < 0.05)
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
RxStreamOrderer::inbound_frame(): Change within noise threshold.
time: [111.62 ms 111.67 ms 111.72 ms]
change: [+0.1861% +0.2531% +0.3192%] (p = 0.00 < 0.05)
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) low mild
1 (1.00%) high mild
transfer/pacing-false/varying-seeds: No change in performance detected.
time: [26.891 ms 27.987 ms 29.092 ms]
change: [-7.3239% -2.0801% +3.3604%] (p = 0.45 > 0.05)
transfer/pacing-true/varying-seeds: No change in performance detected.
time: [36.651 ms 38.242 ms 39.826 ms]
change: [-5.9905% -0.1635% +5.8478%] (p = 0.95 > 0.05)
transfer/pacing-false/same-seed: No change in performance detected.
time: [26.723 ms 27.507 ms 28.285 ms]
change: [-3.3295% +0.5804% +4.8142%] (p = 0.78 > 0.05)
transfer/pacing-true/same-seed: No change in performance detected.
time: [41.198 ms 43.275 ms 45.393 ms]
change: [-7.2694% -1.0401% +5.9147%] (p = 0.75 > 0.05)
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
1-conn/1-100mb-resp (aka. Download)/client: :green_heart: Performance has improved.
time: [106.99 ms 107.48 ms 108.11 ms]
thrpt: [924.95 MiB/s 930.44 MiB/s 934.71 MiB/s]
change:
time: [-7.7356% -7.2870% -6.6739%] (p = 0.00 < 0.05)
thrpt: [… +7.8598% +8.3842%]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe
1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
time: [319.80 ms 323.00 ms 326.21 ms]
thrpt: [30.655 Kelem/s 30.960 Kelem/s 31.270 Kelem/s]
change:
time: [-0.7331% +0.8246% +2.4125%] (p = 0.31 > 0.05)
thrpt: [-2.3557% -0.8178% +0.7385%]
1-conn/1-1b-resp (aka. HPS)/client: :broken_heart: Performance has regressed.
time: [36.454 ms 36.648 ms 36.855 ms]
thrpt: [27.134 elem/s 27.286 elem/s 27.432 elem/s]
change:
time: [+7.7777% +8.6189% +9.3890%] (p = 0.00 < 0.05)
thrpt: [… -7.9350% -7.2165%]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
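A note on reading the `time` and `thrpt` changes above: for a fixed amount of work, throughput is inversely proportional to time, so the two percentages are not mirror images of each other. The sketch below (the helper name `throughput_change` is made up for illustration) applies that relation to the point estimates of the Download and HPS benchmarks and lands on the reported +7.8598% and -7.9350% to within rounding.

```rust
/// Relative throughput change implied by a relative time change
/// for a fixed amount of work, since throughput is proportional to 1 / time.
/// Both values are fractions, e.g. -0.07 for -7%.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Point estimates of the time change, taken from the results above.
    let cases = [
        ("1-conn/1-100mb-resp (Download)", -0.072870),
        ("1-conn/1-1b-resp (HPS)", 0.086189),
    ];
    for (name, time_change) in cases {
        println!(
            "{name}: time {:+.4}% -> thrpt {:+.4}%",
            time_change * 100.0,
            throughput_change(time_change) * 100.0
        );
    }
}
```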
Client/server transfer results
Transfer of 33554432 bytes over loopback.
| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|---|---|---|
| msquic | msquic | | | 174.2 ± 91.5 | 100.2 | 414.4 | 1.00 |
| neqo | msquic | reno | on | 211.1 ± 10.9 | 192.1 | 224.9 | 1.00 |
| neqo | msquic | reno | | 220.7 ± 20.5 | 200.7 | 265.3 | 1.00 |
| neqo | msquic | cubic | on | 208.0 ± 14.7 | 191.8 | 234.0 | 1.00 |
| neqo | msquic | cubic | | 226.8 ± 44.1 | 192.9 | 363.9 | 1.00 |
| msquic | neqo | reno | on | 126.1 ± 73.5 | 83.7 | 328.4 | 1.00 |
| msquic | neqo | reno | | 129.7 ± 91.4 | 84.0 | 455.9 | 1.00 |
| msquic | neqo | cubic | on | 131.8 ± 81.8 | 82.6 | 336.3 | 1.00 |
| msquic | neqo | cubic | | 103.9 ± 48.7 | 81.9 | 321.9 | 1.00 |
| neqo | neqo | reno | on | 125.1 ± 11.7 | 107.7 | 146.4 | 1.00 |
| neqo | neqo | reno | | 183.7 ± 137.4 | 106.9 | 687.8 | 1.00 |
| neqo | neqo | cubic | on | 185.2 ± 91.9 | 101.0 | 360.2 | 1.00 |
| neqo | neqo | cubic | | 127.9 ± 20.0 | 103.5 | 172.4 | 1.00 |
recvmmsg is still worth doing, at least on macOS with https://github.com/quinn-rs/quinn/pull/1993. That said, it is likely easiest to start from scratch.
Work tracked in https://github.com/mozilla/neqo/issues/1693. Closing here.