neqo perf: use recvmmsg in addition to GRO

Previously we only used GRO (generic receive offload) on the receive path.
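For context, here is a minimal sketch (not neqo's code) of the per-datagram receive pattern that batching is meant to improve on. With the standard library alone, each `recv_from` is one system call that returns one datagram. GRO lets one receive return several coalesced segments of the same flow, and Linux `recvmmsg` fills several message buffers in a single call. The names `drain_one_by_one` and `loopback_demo`, the 2048-byte buffer, and the timeout are assumptions made for illustration.

```rust
use std::io::ErrorKind;
use std::net::UdpSocket;
use std::time::Duration;

/// Drains `socket` with one `recv_from` call (one syscall) per datagram and
/// returns the payload lengths in arrival order. Stops at the first read timeout.
fn drain_one_by_one(socket: &UdpSocket, wait: Duration) -> std::io::Result<Vec<usize>> {
    socket.set_read_timeout(Some(wait))?;
    let mut buf = [0u8; 2048]; // arbitrary size, large enough for one datagram here
    let mut lens = Vec::new();
    loop {
        match socket.recv_from(&mut buf) {
            Ok((len, _from)) => lens.push(len),
            // A timeout surfaces as WouldBlock or TimedOut depending on the platform.
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => break,
            Err(e) => return Err(e),
        }
    }
    Ok(lens)
}

/// Sends `count` small datagrams over loopback and drains them again.
fn loopback_demo(count: usize) -> std::io::Result<Vec<usize>> {
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;
    for i in 0..count {
        // Payload sizes 100, 101, ... so each datagram is distinguishable by length.
        sender.send_to(&vec![0u8; 100 + i], receiver.local_addr()?)?;
    }
    drain_one_by_one(&receiver, Duration::from_millis(200))
}
```

Calling `loopback_demo(3)` issues three successful receive calls for three datagrams, plus one that times out. A batched receive would aim to return all three from a single call.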

Depends on https://github.com/mozilla/neqo/pull/2093. Part of https://github.com/mozilla/neqo/issues/1693.

Draft only. Just testing on the benchmark runner for now.

Sep 28 '24 08:09 mxinden

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

chrome vs. neqo-latest: 3
lsquic vs. neqo-latest: run cancelled after 20 min
msquic vs. neqo-latest: U
mvfst vs. neqo-latest: Z A L1 C1
quinn vs. neqo-latest: L1 V2
xquic vs. neqo-latest: M

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. kwik: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. msquic: H DC LR C20 M S R B U L2 C1 C2 6 V2
neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6
neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L2 C1 C2 6
neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. picoquic: H DC LR C20 M S R Z 3 B U E A L2 C1 C2 6 V2
neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C2 6
neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6
neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6
neqo-latest vs. xquic: H DC LR C20 M R Z 3 B U L1 L2 C1 C2 6

neqo-latest as server

aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C1 C2 6 V2
go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6
kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
msquic vs. neqo-latest: H DC LR C20 M S R Z B A L1 L2 C1 C2 6 V2
mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6
neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
ngtcp2 vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
quiche vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6
quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L2 C1 C2 6
s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6
xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: E
neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2
neqo-latest vs. haproxy: E
neqo-latest vs. kwik: E
neqo-latest vs. msquic: 3 E
neqo-latest vs. mvfst: C20 S E V2
neqo-latest vs. nginx: E V2
neqo-latest vs. quic-go: E V2
neqo-latest vs. quiche: E V2
neqo-latest vs. quinn: V2
neqo-latest vs. s2n-quic: Z V2
neqo-latest vs. xquic: S E V2

neqo-latest as server

aioquic vs. neqo-latest: U E
chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2
go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
kwik vs. neqo-latest: E
msquic vs. neqo-latest: 3 E
mvfst vs. neqo-latest: C20 S R U E V2
quic-go vs. neqo-latest: E V2
quiche vs. neqo-latest: C20 U E V2
s2n-quic vs. neqo-latest: C20 Z U V2
xquic vs. neqo-latest: E V2

Sep 28 '24 08:09 github-actions[bot]

Benchmark results

Performance differences relative to 55e3a9363c28632dfb29ce91c7712cab1f6a58da.

coalesce_acked_from_zero 1+1 entries: :green_heart: Performance has improved.

       time:   [98.803 ns 99.130 ns 99.472 ns]
       change: [-12.775% -12.368% -11.953%] (p = 0.00 < 0.05)
Found 15 outliers among 100 measurements (15.00%)
11 (11.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: :green_heart: Performance has improved.

       time:   [116.87 ns 117.20 ns 117.56 ns]
       change: [-33.461% -33.093% -32.696%] (p = 0.00 < 0.05)
Found 19 outliers among 100 measurements (19.00%)
2 (2.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
12 (12.00%) high severe

coalesce_acked_from_zero 10+1 entries: :green_heart: Performance has improved.

       time:   [116.24 ns 116.64 ns 117.13 ns]
       change: [-39.896% -35.657% -33.117%] (p = 0.00 < 0.05)
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) low severe
3 (3.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 1000+1 entries: :green_heart: Performance has improved.

       time:   [97.407 ns 97.529 ns 97.668 ns]
       change: [-31.875% -31.315% -30.596%] (p = 0.00 < 0.05)
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.

       time:   [111.62 ms 111.67 ms 111.72 ms]
       change: [+0.1861% +0.2531% +0.3192%] (p = 0.00 < 0.05)
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) low mild
1 (1.00%) high mild

transfer/pacing-false/varying-seeds: No change in performance detected.

       time:   [26.891 ms 27.987 ms 29.092 ms]
       change: [-7.3239% -2.0801% +3.3604%] (p = 0.45 > 0.05)

transfer/pacing-true/varying-seeds: No change in performance detected.

       time:   [36.651 ms 38.242 ms 39.826 ms]
       change: [-5.9905% -0.1635% +5.8478%] (p = 0.95 > 0.05)

transfer/pacing-false/same-seed: No change in performance detected.

       time:   [26.723 ms 27.507 ms 28.285 ms]
       change: [-3.3295% +0.5804% +4.8142%] (p = 0.78 > 0.05)

transfer/pacing-true/same-seed: No change in performance detected.

       time:   [41.198 ms 43.275 ms 45.393 ms]
       change: [-7.2694% -1.0401% +5.9147%] (p = 0.75 > 0.05)
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

1-conn/1-100mb-resp (aka. Download)/client: :green_heart: Performance has improved.

       time:   [106.99 ms 107.48 ms 108.11 ms]
       thrpt:  [924.95 MiB/s 930.44 MiB/s 934.71 MiB/s]
change:
       time:   [-7.7356% -7.2870% -6.6739%] (p = 0.00 < 0.05)
       thrpt:  [… +7.8598% +8.3842%]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.

       time:   [319.80 ms 323.00 ms 326.21 ms]
       thrpt:  [30.655 Kelem/s 30.960 Kelem/s 31.270 Kelem/s]
change:
       time:   [-0.7331% +0.8246% +2.4125%] (p = 0.31 > 0.05)
       thrpt:  [-2.3557% -0.8178% +0.7385%]

1-conn/1-1b-resp (aka. HPS)/client: :broken_heart: Performance has regressed.

       time:   [36.454 ms 36.648 ms 36.855 ms]
       thrpt:  [27.134  elem/s 27.286  elem/s 27.432  elem/s]
change:
       time:   [+7.7777% +8.6189% +9.3890%] (p = 0.00 < 0.05)
       thrpt:  [… -7.9350% -7.2165%]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
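
The `thrpt` lines in the client benchmarks above can be derived from the `time` lines. The sketch below assumes that the Download benchmark transfers 100 MiB, which is inferred from its name and is consistent with the reported numbers. It also assumes that a relative time change `t` maps to a throughput change of `1 / (1 + t) - 1`. The function names are hypothetical.

```rust
/// Throughput in MiB/s for `mib` mebibytes transferred in `millis` milliseconds.
fn throughput_mib_per_s(mib: f64, millis: f64) -> f64 {
    mib / (millis / 1000.0)
}

/// Relative throughput change implied by a relative time change, both as
/// fractions (for example, -0.07 means 7% less time). Throughput is
/// proportional to 1 / time, so the new/old ratio is 1 / (1 + time_change).
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

With the figures above, `throughput_mib_per_s(100.0, 107.48)` comes out at roughly 930 MiB/s. `throughput_change(-0.072870)` is about +0.0786 for Download, and `throughput_change(0.086189)` is about -0.0794 for HPS. These match the middle `thrpt` estimates.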

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client	Server	CC	Pacing	Mean [ms]	Min [ms]	Max [ms]	Relative
msquic	msquic			174.2 ± 91.5	100.2	414.4	1.00
neqo	msquic	reno	on	211.1 ± 10.9	192.1	224.9	1.00
neqo	msquic	reno		220.7 ± 20.5	200.7	265.3	1.00
neqo	msquic	cubic	on	208.0 ± 14.7	191.8	234.0	1.00
neqo	msquic	cubic		226.8 ± 44.1	192.9	363.9	1.00
msquic	neqo	reno	on	126.1 ± 73.5	83.7	328.4	1.00
msquic	neqo	reno		129.7 ± 91.4	84.0	455.9	1.00
msquic	neqo	cubic	on	131.8 ± 81.8	82.6	336.3	1.00
msquic	neqo	cubic		103.9 ± 48.7	81.9	321.9	1.00
neqo	neqo	reno	on	125.1 ± 11.7	107.7	146.4	1.00
neqo	neqo	reno		183.7 ± 137.4	106.9	687.8	1.00
neqo	neqo	cubic	on	185.2 ± 91.9	101.0	360.2	1.00
neqo	neqo	cubic		127.9 ± 20.0	103.5	172.4	1.00

Sep 28 '24 17:09 github-actions[bot]

recvmmsg is still worth doing, at least on macOS with https://github.com/quinn-rs/quinn/pull/1993. That said, it is likely easiest to start from scratch.

Work tracked in https://github.com/mozilla/neqo/issues/1693. Closing here.

Nov 16 '24 14:11 mxinden