neqo fix: Add some crate features for performance

Let's see if they do.

Also, @mxinden, I was wondering why we went with a multi-threaded tokio client and server. I'm wondering if the thread-management overheads are worth it compared to using just the rt scheduler?

Mar 06 '25 10:03 larseggert

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 94.91%. Comparing base (8c65240) to head (f01b7a3).
:warning: Report is 10 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2477      +/-   ##
==========================================
+ Coverage   94.88%   94.91%   +0.03%     
==========================================
  Files         115      115              
  Lines       34313    34313              
  Branches    34313    34313              
==========================================
+ Hits        32558    32569      +11     
+ Misses       1748     1737      -11     
  Partials        7        7
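
The coverage percentages in the diff follow from Hits divided by Lines. A minimal Rust sketch of that arithmetic, assuming the report truncates to two decimals rather than rounding (the helper names are made up for illustration):

```rust
/// Line coverage as a percentage: covered lines (hits) over total lines.
fn coverage_percent(hits: u64, lines: u64) -> f64 {
    hits as f64 / lines as f64 * 100.0
}

/// Truncate (not round) to two decimal places.
/// Assumption: this mirrors how the report displays its figures.
fn truncate_2dp(x: f64) -> f64 {
    (x * 100.0).floor() / 100.0
}

fn main() {
    let lines = 34_313;
    // Hits for base and head, taken from the diff above.
    let base = truncate_2dp(coverage_percent(32_558, lines));
    let head = truncate_2dp(coverage_percent(32_569, lines));
    println!("base {base:.2}%, head {head:.2}%, delta {:+.2}%", head - base);
}
```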

Components	Coverage Δ
neqo-common	`97.61% <ø> (ø)`
neqo-crypto	`89.64% <ø> (ø)`
neqo-http3	`93.71% <ø> (ø)`
neqo-qpack	`95.45% <ø> (ø)`
neqo-transport	`95.95% <ø> (+0.05%)`	:arrow_up:
neqo-udp	`89.85% <ø> (+0.48%)`	:arrow_up:

Mar 06 '25 10:03 codecov[bot]

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 8c6524015e07834a72d566804041544d746e93ef.

neqo-latest as client

neqo-latest vs. go-x-net: BP BA
neqo-latest vs. haproxy: :warning:C1 BP BA
neqo-latest vs. kwik: L1 C1 BP BA
neqo-latest vs. linuxquic: L1 C1
neqo-latest vs. lsquic: E L1 C1
neqo-latest vs. msquic: :warning:R Z A L1 C1
neqo-latest vs. mvfst: A :warning:L1 C1
neqo-latest vs. neqo: A
neqo-latest vs. neqo-latest: A
neqo-latest vs. nginx: BP BA
neqo-latest vs. ngtcp2: E CM
neqo-latest vs. picoquic: Z E A :rocket:~~L1~~ :warning:C1
neqo-latest vs. quic-go: A
neqo-latest vs. quiche: BP BA
neqo-latest vs. s2n-quic: E :warning:BP BA CM
neqo-latest vs. tquic: S :warning:A BP BA
neqo-latest vs. xquic: :warning:A L1 C1

neqo-latest as server

aioquic vs. neqo-latest: :rocket:~~BA~~ CM
go-x-net vs. neqo-latest: CM
kwik vs. neqo-latest: BP BA CM
msquic vs. neqo-latest: :rocket:~~BA~~ :warning:U CM
mvfst vs. neqo-latest: Z A L1 C1 CM
neqo vs. neqo-latest: A
openssl vs. neqo-latest: LR M A CM
quic-go vs. neqo-latest: CM
quiche vs. neqo-latest: :rocket:~~L1~~ CM
quinn vs. neqo-latest: V2 CM
s2n-quic vs. neqo-latest: CM
tquic vs. neqo-latest: CM
xquic vs. neqo-latest: M CM

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2 BP BA
neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L1 L2 :warning:C1 C2 6 V2
neqo-latest vs. kwik: H DC LR C20 M S R Z 3 B U A L2 C2 6 V2
neqo-latest vs. linuxquic: H DC LR C20 M S R Z 3 B U E A L2 C2 6 V2 BP BA CM
neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U A L2 C2 6 V2 BP BA CM
neqo-latest vs. msquic: H DC LR C20 M S :warning:R Z B U L2 C2 6 V2 BP BA
neqo-latest vs. mvfst: H DC LR M R Z 3 B U :warning:L1 L2 C2 6 BP BA
neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E L1 L2 C1 C2 6 V2 BP BA CM
neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E L1 L2 C1 C2 6 V2 BP BA CM
neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2 BP BA
neqo-latest vs. picoquic: H DC LR C20 M S R 3 B U :rocket:~~L1~~ L2 :warning:C1 C2 6 V2 BP BA
neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U L1 L2 C1 C2 6 BP BA
neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E :rocket:~~A~~ L1 L2 C1 C2 6 BP BA
neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 :warning:BP
neqo-latest vs. tquic: H DC LR C20 M R Z 3 B U :warning:A L1 L2 C1 C2 6
neqo-latest vs. xquic: :rocket:~~H DC LR C20 M R Z 3 B U L2 C2 6 BP BA~~

neqo-latest as server

aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C1 C2 6 V2 BP :rocket:~~BA~~
chrome vs. neqo-latest: 3
go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6 BP BA
kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
linuxquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
lsquic vs. neqo-latest: H DC LR C20 M S R 3 B E A L1 L2 :rocket:~~C1~~ C2 6 V2 BP BA CM
msquic vs. neqo-latest: H DC LR C20 M S R Z B :warning:U A L1 L2 C1 C2 6 V2 BP :rocket:~~BA~~
mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6 BP BA
neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E L1 L2 C1 C2 6 V2 BP BA CM
ngtcp2 vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
openssl vs. neqo-latest: H DC C20 S R 3 B L2 C2 6 BP BA
picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 BP BA
quiche vs. neqo-latest: H DC LR M S R Z 3 B A :rocket:~~L1~~ L2 C1 C2 6 BP BA
quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 BP BA
s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 BP BA
tquic vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6 BP BA
xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6 BP BA

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: E CM
neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2 CM
neqo-latest vs. haproxy: E CM
neqo-latest vs. kwik: E CM
neqo-latest vs. msquic: 3 E CM
neqo-latest vs. mvfst: C20 S E V2 CM
neqo-latest vs. nginx: E V2 CM
neqo-latest vs. picoquic: CM
neqo-latest vs. quic-go: E V2 CM
neqo-latest vs. quiche: E V2 CM
neqo-latest vs. quinn: V2 CM
neqo-latest vs. s2n-quic: Z V2
neqo-latest vs. tquic: E V2 CM
neqo-latest vs. xquic: S E V2 CM

neqo-latest as server

aioquic vs. neqo-latest: U E
chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2 BP BA CM
go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
kwik vs. neqo-latest: E
lsquic vs. neqo-latest: Z U
msquic vs. neqo-latest: 3 E
mvfst vs. neqo-latest: C20 S R U E V2
openssl vs. neqo-latest: Z U E L1 C1 V2
quic-go vs. neqo-latest: E V2
quiche vs. neqo-latest: C20 U E V2
s2n-quic vs. neqo-latest: C20 Z U V2
tquic vs. neqo-latest: C20 U E V2
xquic vs. neqo-latest: E V2

Mar 06 '25 10:03 github-actions[bot]

Benchmark results

Performance differences relative to a341259e7b317445bc9dee12172a160722819b9d.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: :green_heart: Performance has improved.

       time:   [198.58 ms 198.94 ms 199.30 ms]
       thrpt:  [501.76 MiB/s 502.67 MiB/s 503.58 MiB/s]
change:
       time:   [−2.0523% −1.7659% −1.4888%] (p = 0.00 < 0.05)
       thrpt:  [… +1.7976% +2.0953%]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.

       time:   [302.98 ms 304.36 ms 305.74 ms]
       thrpt:  [32.708 Kelem/s 32.856 Kelem/s 33.006 Kelem/s]
change:
       time:   [−0.0873% +0.5531% +1.2132%] (p = 0.09 > 0.05)
       thrpt:  [−1.1987% −0.5500% +0.0874%]
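
The `thrpt` change is not an independent measurement: throughput is inversely proportional to time, so each bound can be derived from the matching `time` bound. A small Rust sketch of that relation, using the RPS figures above (`thrpt_change` is a hypothetical helper, not part of the benchmark tooling):

```rust
/// Convert a relative change in time (as a fraction, e.g. 0.005531 for
/// +0.5531%) into the corresponding relative change in throughput.
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Time change bounds of the RPS benchmark, in percent.
    let time = [-0.0873, 0.5531, 1.2132];
    // A longer time means a lower throughput, so the bounds swap order.
    for t in time.iter().rev() {
        println!("{:+.4}%", thrpt_change(t / 100.0) * 100.0);
    }
}
```

Because the reported time bounds are themselves rounded, the derived values can differ from the reported ones in the last digit.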

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.

       time:   [27.349 ms 27.439 ms 27.558 ms]
       thrpt:  [36.287   B/s 36.444   B/s 36.564   B/s]
change:
       time:   [−0.7745% −0.2244% +0.3429%] (p = 0.45 > 0.05)
       thrpt:  [−0.3417% +0.2249% +0.7806%]
Found 22 outliers among 100 measurements (22.00%)
2 (2.00%) low severe
20 (20.00%) high severe

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: :green_heart: Performance has improved.

       time:   [622.90 ms 627.59 ms 632.27 ms]
       thrpt:  [158.16 MiB/s 159.34 MiB/s 160.54 MiB/s]
change:
       time:   [−5.0735% −4.1248% −3.2188%] (p = 0.00 < 0.05)
       thrpt:  [… +4.3022% +5.3446%]
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low severe
3 (3.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe

decode 4096 bytes, mask ff: Change within noise threshold.

       time:   [11.629 µs 11.672 µs 11.721 µs]
       change: [−1.4598% −1.0726% −0.5453%] (p = 0.00 < 0.05)
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low severe
3 (3.00%) low mild
11 (11.00%) high severe

decode 1048576 bytes, mask ff: Change within noise threshold.

       time:   [3.0583 ms 3.0679 ms 3.0805 ms]
       change: [+0.7403% +1.2257% +1.7377%] (p = 0.00 < 0.05)
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
7 (7.00%) high severe

decode 4096 bytes, mask 7f: :green_heart: Performance has improved.

       time:   [19.363 µs 19.446 µs 19.562 µs]
       change: [−4.0416% −3.0524% −2.3825%] (p = 0.00 < 0.05)
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) low severe
1 (1.00%) low mild
12 (12.00%) high severe

decode 1048576 bytes, mask 7f: Change within noise threshold.

       time:   [5.0845 ms 5.0972 ms 5.1105 ms]
       change: [+0.4872% +0.8734% +1.2405%] (p = 0.00 < 0.05)
Found 15 outliers among 100 measurements (15.00%)
15 (15.00%) high severe

decode 4096 bytes, mask 3f: :green_heart: Performance has improved.

       time:   [5.5305 µs 5.5588 µs 5.5930 µs]
       change: [−33.157% −32.863% −32.534%] (p = 0.00 < 0.05)
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe

decode 1048576 bytes, mask 3f: :broken_heart: Performance has regressed.

       time:   [1.7873 ms 1.7997 ms 1.8123 ms]
       change: [+12.244% +13.138% +13.972%] (p = 0.00 < 0.05)

1000 streams of 1 bytes/multistream: :broken_heart: Performance has regressed.

       time:   [47.764 ns 47.945 ns 48.127 ns]
       change: [+29.708% +31.271% +32.821%] (p = 0.00 < 0.05)
Found 1 outliers among 500 measurements (0.20%)
1 (0.20%) high mild

1000 streams of 1000 bytes/multistream: :broken_heart: Performance has regressed.

       time:   [47.002 ns 47.177 ns 47.353 ns]
       change: [+22.520% +23.830% +25.204%] (p = 0.00 < 0.05)

coalesce_acked_from_zero 1+1 entries: No change in performance detected.

       time:   [88.065 ns 88.418 ns 88.774 ns]
       change: [−0.4540% −0.0593% +0.3303%] (p = 0.77 > 0.05)
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.

       time:   [105.56 ns 106.12 ns 106.83 ns]
       change: [−0.4586% +0.0028% +0.4900%] (p = 0.99 > 0.05)
Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) high mild
15 (15.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.

       time:   [104.67 ns 105.03 ns 105.46 ns]
       change: [−0.3186% +0.1905% +0.9788%] (p = 0.63 > 0.05)
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low severe
1 (1.00%) low mild
5 (5.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.

       time:   [88.603 ns 88.775 ns 88.973 ns]
       change: [−1.1413% −0.3373% +0.4524%] (p = 0.44 > 0.05)
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe

RxStreamOrderer::inbound_frame(): No change in performance detected.

       time:   [107.89 ms 107.96 ms 108.03 ms]
       change: [−0.2639% −0.0098% +0.1676%] (p = 0.94 > 0.05)
Found 11 outliers among 100 measurements (11.00%)
10 (10.00%) low mild
1 (1.00%) high severe

sent::Packets::take_ranges: No change in performance detected.

       time:   [8.0893 µs 8.3207 µs 8.5443 µs]
       change: [−3.3996% +3.7280% +13.696%] (p = 0.52 > 0.05)
Found 22 outliers among 100 measurements (22.00%)
4 (4.00%) low severe
11 (11.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.

       time:   [37.216 ms 37.293 ms 37.371 ms]
       change: [+0.5064% +0.8545% +1.2038%] (p = 0.00 < 0.05)

transfer/pacing-true/varying-seeds: Change within noise threshold.

       time:   [37.913 ms 38.031 ms 38.155 ms]
       change: [+0.7331% +1.1975% +1.6586%] (p = 0.00 < 0.05)
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/same-seed: Change within noise threshold.

       time:   [36.575 ms 36.650 ms 36.735 ms]
       change: [−0.6683% −0.3588% −0.0384%] (p = 0.02 < 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/pacing-true/same-seed: Change within noise threshold.

       time:   [38.728 ms 38.817 ms 38.911 ms]
       change: [+1.6254% +1.9650% +2.2815%] (p = 0.00 < 0.05)
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe

Client/server transfer results

Performance differences relative to a341259e7b317445bc9dee12172a160722819b9d.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params)	Mean ± σ	Min	Max	MiB/s ± σ	Δ `main` (ms)	Δ `main` (%)
google vs. google	457.0 ± 3.9	451.4	467.3	70.0 ± 8.2
google vs. neqo (cubic, paced)	272.9 ± 4.2	266.3	285.4	117.3 ± 7.6	-0.0	-0.0%
msquic vs. msquic	127.6 ± 15.5	109.5	192.1	250.8 ± 2.1
msquic vs. neqo (cubic, paced)	141.1 ± 15.1	118.2	199.7	226.8 ± 2.1	:green_heart: -5.8	-4.0%
neqo vs. google (cubic, paced)	762.0 ± 4.0	755.8	773.1	42.0 ± 8.0	-0.9	-0.1%
neqo vs. msquic (cubic, paced)	155.3 ± 4.4	148.7	163.5	206.0 ± 7.3	:green_heart: -1.4	-0.9%
neqo vs. neqo (cubic)	89.0 ± 4.5	82.4	103.8	359.5 ± 7.1	:green_heart: -2.5	-2.8%
neqo vs. neqo (cubic, paced)	91.5 ± 5.0	83.9	109.1	349.6 ± 6.4	-0.1	-0.1%
neqo vs. neqo (reno)	89.1 ± 4.8	80.5	106.8	359.1 ± 6.7	:green_heart: -2.0	-2.2%
neqo vs. neqo (reno, paced)	90.6 ± 4.2	83.8	101.1	353.2 ± 7.6	:green_heart: -1.6	-1.7%
neqo vs. quiche (cubic, paced)	193.5 ± 4.5	185.6	204.4	165.4 ± 7.1	:broken_heart: 1.8	0.9%
neqo vs. s2n (cubic, paced)	218.9 ± 4.1	211.4	226.5	146.2 ± 7.8	:broken_heart: 1.3	0.6%
quiche vs. neqo (cubic, paced)	160.6 ± 5.1	150.8	171.4	199.2 ± 6.3	:broken_heart: 2.7	1.7%
quiche vs. quiche	147.6 ± 4.9	139.8	160.4	216.8 ± 6.5
s2n vs. neqo (cubic, paced)	170.8 ± 4.6	162.0	181.4	187.3 ± 7.0	-0.5	-0.3%
s2n vs. s2n	249.8 ± 27.6	231.2	352.9	128.1 ± 1.2
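
The MiB/s column can be cross-checked against the mean transfer time, since every run moves the same 33554432 bytes (32 MiB). A short Rust sketch of the conversion (`mib_per_sec` is a hypothetical helper, not part of the benchmark scripts):

```rust
const MIB: f64 = 1024.0 * 1024.0;

/// Throughput in MiB/s for `bytes` transferred in `millis` milliseconds.
fn mib_per_sec(bytes: u64, millis: f64) -> f64 {
    (bytes as f64 / MIB) / (millis / 1000.0)
}

fn main() {
    let bytes = 33_554_432; // 32 MiB, the transfer size stated above
    // Mean times in milliseconds, copied from two rows of the table.
    for (pair, mean_ms) in [("google vs. google", 457.0), ("neqo vs. neqo (cubic)", 89.0)] {
        println!("{pair}: {:.1} MiB/s", mib_per_sec(bytes, mean_ms));
    }
}
```

The table's means are rounded to one decimal, so the recomputed throughput can differ slightly from the reported value.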

Download data for profiler.firefox.com or download performance comparison data.

Mar 06 '25 11:03 github-actions[bot]

> Also, @mxinden, I was wondering why we went with a multi-threaded tokio client and server.

I chose multi-threaded as it is the de-facto default. No other reason.

> I'm wondering if the thread-management overheads are worth it compared to using just the rt scheduler?

:+1: worth experimenting. Intuitively, given that it is a single future only, there is no cross-thread communication and thus no significant overhead.
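
To illustrate the point about a single future: the sketch below drives one future to completion on the calling thread with a hand-rolled, std-only executor. It is a stand-in for illustration, not tokio's implementation. In tokio itself the equivalent experiment would presumably use `tokio::runtime::Builder::new_current_thread()` instead of `new_multi_thread()`. `ThreadWaker` and `block_on` are names chosen for this example.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drive one future to completion on the calling thread.
/// No worker threads are spawned and no work is handed across threads.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let caller = thread::current().id();
    // The future runs on the same thread that called `block_on`.
    let ran_on = block_on(async { thread::current().id() });
    assert_eq!(caller, ran_on);
}
```

A real runtime also drives I/O and timers, which this sketch leaves out, so it says nothing about the actual overhead difference; that still needs measuring.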

Mar 06 '25 14:03 mxinden

I am fine merging here. That said, I would prefer individual pull requests per feature, to ensure that each change, and not just all changes as a whole, has a positive performance impact. In addition, I don't think we should merge before we have reliable benchmarks, i.e. not before https://github.com/mozilla/neqo/issues/2657 is fixed.

May 23 '25 19:05 mxinden

Bencher Report