feat: Use `FxHasher` in places where we don't need DDoS resistance
I think this may be worthwhile. The cargo benches don't consistently show a benefit, but the loopback transfers on the bencher machine are faster. For example, for neqo vs. neqo (cubic, pacing on, MTU 1504):

| | Mean ± σ | Min | Max |
|---|---|---|---|
| without this PR | 495.9 ± 96.2 | 426.6 | 712.7 |
| with this PR | 429.1 ± 9.6 | 415.4 | 442.6 |
(I'll see if I can improve CI so that we also see the differences to main for the table results.)
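For reference, a minimal sketch of the kind of swap involved, assuming the rustc-hash crate; the `StreamMap`/`StreamSet` aliases and call sites here are illustrative, not the PR's actual code:

```rust
// Sketch only: replace std's SipHash-backed maps with FxHasher-backed ones
// where keys are not meaningfully attacker-controlled.
use std::collections::{HashMap, HashSet};
use std::hash::BuildHasherDefault;

use rustc_hash::{FxHashMap, FxHashSet, FxHasher};

// Roughly equivalent aliases spelled out with std types (rustc-hash also
// exports ready-made FxHashMap/FxHashSet).
type StreamMap<V> = HashMap<u64, V, BuildHasherDefault<FxHasher>>;
type StreamSet = HashSet<u64, BuildHasherDefault<FxHasher>>;

fn example() {
    // Note: `FxHashMap::new()` does not exist, because `HashMap::new()` is
    // only defined for the default RandomState hasher; use `default()`.
    let mut by_stream: FxHashMap<u64, Vec<u8>> = FxHashMap::default();
    by_stream.insert(0, b"data".to_vec());

    let mut open: FxHashSet<u64> = FxHashSet::default();
    open.insert(0);

    let _also_works: StreamMap<Vec<u8>> = StreamMap::default();
    let _and_this: StreamSet = StreamSet::default();
}
```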
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.56%. Comparing base (1a42c54) to head (4ef0501). Report is 8 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #2342 +/- ##
==========================================
- Coverage 95.57% 95.56% -0.01%
==========================================
Files 115 115
Lines 37448 37448
Branches 37448 37448
==========================================
- Hits 35790 35788 -2
Misses 1654 1654
- Partials 4 6 +2
| Components | Coverage Δ | |
|---|---|---|
| neqo-common | 97.56% <ø> (ø) | |
| neqo-crypto | 90.50% <ø> (ø) | |
| neqo-http3 | 94.49% <100.00%> (ø) | |
| neqo-qpack | 96.29% <100.00%> (ø) | |
| neqo-transport | 96.54% <ø> (-0.02%) | :arrow_down: |
| neqo-udp | 90.65% <ø> (ø) | |
Failed Interop Tests
QUIC Interop Runner, client vs. server, differences relative to 789b70cd96fbf9536909bc0ca0b87e62e30ac726.
neqo-latest as client
- neqo-latest vs. aioquic: H :rocket:~~DC~~ M :rocket:~~S R~~ :warning:3 L2 :rocket:~~C1 BA~~ :warning:BP
- neqo-latest vs. go-x-net: :rocket:~~DC 6~~ :warning:U A C2 BP BA
- neqo-latest vs. haproxy: :warning:H DC LR C20 M S :warning:R Z 3 :warning:B U A :warning:L1 L2 C1 C2 6 V2 BP BA
- neqo-latest vs. kwik: :warning:H Z 3 B U L1 C1 6 :rocket:~~V2~~ BP BA
- neqo-latest vs. linuxquic: :rocket:~~DC M~~ :warning:H R :rocket:~~A~~ :warning:E L1 L2 C1 :rocket:~~BP CM~~ :warning:C2 6 V2
- neqo-latest vs. lsquic: :rocket:~~R~~ :warning:H DC A L1 C1 C2 :warning:6 V2
- neqo-latest vs. msquic: :rocket:~~H DC LR~~ M R Z B A L1 :warning:L2 C1 :rocket:~~6 BP~~ :warning:C2
- neqo-latest vs. mvfst: :rocket:~~LR~~ M :warning:R Z 3 B A L1 :rocket:~~C2~~
- neqo-latest vs. neqo: :rocket:~~R~~ :warning:S Z :warning:3 U A :rocket:~~C1 V2 BP BA~~
- neqo-latest vs. neqo-latest: :rocket:~~DC~~ LR :warning:Z 3 :warning:U E A :rocket:~~L2 C1 BP~~ :warning:6 V2 BA
- neqo-latest vs. nginx: :rocket:~~DC C20 S Z 3~~ :warning:M A L1 :rocket:~~L2 C2~~ :warning:C1 BP BA
- neqo-latest vs. ngtcp2: H DC :rocket:~~C20 U A~~ :warning:LR S R Z E L1 6 V2 :rocket:~~BA~~ CM
- neqo-latest vs. picoquic: :rocket:~~H M S R Z 3~~ B A L1 :warning:L2 C1 V2 :rocket:~~BP~~ :warning:BA
- neqo-latest vs. quic-go: DC :rocket:~~R Z~~ A L1 :rocket:~~6~~ :warning:C1 BP
- neqo-latest vs. quiche: :rocket:~~LR~~ :warning:H M R Z :rocket:~~C1~~ :warning:B A C2 6 BP BA
- neqo-latest vs. quinn: :rocket:~~M R B~~ :warning:H E A :warning:BP BA
- neqo-latest vs. s2n-quic: :rocket:~~S~~ :warning:LR C20 M 3 :rocket:~~E~~ :warning:B A :rocket:~~C1~~ :warning:6 BP BA CM
- neqo-latest vs. tquic: S :rocket:~~B 6~~ :warning:R Z 3 A L1 BP BA
- neqo-latest vs. xquic: :rocket:~~LR Z~~ :warning:M A L1 :warning:L2 C1 :rocket:~~C2 BA~~ :warning:6
neqo-latest as server
- aioquic vs. neqo-latest: :warning:C2 6 V2 CM
- kwik vs. neqo-latest: :rocket:~~LR~~ :warning:H DC S 3 L2 C2 :warning:V2 BP BA CM
- linuxquic vs. neqo-latest: run cancelled after 20 min
- lsquic vs. neqo-latest: :rocket:~~C2 6~~ :warning:H DC L1
- msquic vs. neqo-latest: :rocket:~~H~~ C20 :rocket:~~S L2~~ :warning:Z L1 C2 V2 CM
- mvfst vs. neqo-latest: :rocket:~~H LR~~ Z :warning:3 B A L1 :warning:L2 C1 C2 :warning:6 CM
- neqo vs. neqo-latest: :rocket:~~M R E~~ :warning:LR A :rocket:~~C2~~ :warning:BP BA CM
- ngtcp2 vs. neqo-latest: :rocket:~~H DC~~ :warning:LR C20 S :warning:R Z 3 B
- openssl vs. neqo-latest: :warning:H DC LR :rocket:~~C20~~ M :rocket:~~S R 3~~ B A :warning:6 CM
- picoquic vs. neqo-latest: run cancelled after 20 min
- quic-go vs. neqo-latest: :rocket:~~A~~ :warning:DC B C2 6 CM
- quiche vs. neqo-latest: :rocket:~~H DC~~ :warning:S 3 :warning:A L1 :rocket:~~L2~~ 6 BP :rocket:~~BA~~ CM
- quinn vs. neqo-latest: :rocket:~~B L1 L2~~ :warning:Z U A C2 6 V2 CM
- s2n-quic vs. neqo-latest: :rocket:~~LR A C1~~ :warning:M S R E BA CM
- tquic vs. neqo-latest: run cancelled after 20 min
- xquic vs. neqo-latest: run cancelled after 20 min
All results
Succeeded Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: :rocket:~~DC~~ LR C20 :rocket:~~S R~~ Z :warning:3 B U A L1 :rocket:~~C1~~ C2 6 V2 :warning:BP :rocket:~~BA~~
- neqo-latest vs. go-x-net: H :rocket:~~DC~~ LR M B :warning:U A L2 :warning:C2 :rocket:~~6~~
- neqo-latest vs. kwik: :warning:H DC LR C20 M S R :warning:Z 3 B U A L2 C2 :rocket:~~V2~~
- neqo-latest vs. linuxquic: :warning:H :rocket:~~DC~~ LR C20 :rocket:~~M~~ S Z 3 B U :warning:E C2 6 V2 :rocket:~~A BP~~ BA :rocket:~~CM~~
- neqo-latest vs. lsquic: :warning:H DC LR C20 M S :rocket:~~R~~ Z 3 B U E :warning:A L2 :warning:6 BP BA CM
- neqo-latest vs. msquic: :rocket:~~H DC LR~~ C20 S U :warning:L2 C2 :rocket:~~6~~ V2 :rocket:~~BP~~ BA
- neqo-latest vs. mvfst: H DC :warning:R Z 3 B :rocket:~~LR~~ U L2 C1 :rocket:~~C2~~ 6 BP BA
- neqo-latest vs. neqo: H DC LR C20 M :warning:S 3 :rocket:~~R~~ B E L1 L2 :rocket:~~C1~~ C2 6 :rocket:~~V2 BP BA~~ CM
- neqo-latest vs. neqo-latest: H :rocket:~~DC~~ C20 M S R :warning:Z B :warning:U E L1 :rocket:~~L2 C1~~ C2 :warning:6 V2 BA :rocket:~~BP~~ CM
- neqo-latest vs. nginx: H :rocket:~~DC~~ LR :warning:M :rocket:~~C20 S~~ R :rocket:~~Z 3~~ B U :warning:A C1 :rocket:~~L2 C2~~ 6
- neqo-latest vs. ngtcp2: :warning:LR :rocket:~~C20~~ M :warning:S R Z 3 B :warning:E :rocket:~~U A~~ L2 C1 C2 BP :rocket:~~BA~~
- neqo-latest vs. picoquic: :rocket:~~H~~ DC LR C20 :rocket:~~M S R Z 3~~ U E :warning:L2 C2 6 :warning:BA :rocket:~~BP~~
- neqo-latest vs. quic-go: H LR C20 M S :rocket:~~R Z~~ 3 B U L2 :warning:C1 C2 :warning:BP :rocket:~~6~~ BA
- neqo-latest vs. quiche: :warning:H DC :rocket:~~LR~~ C20 :warning:M S :warning:R 3 :warning:B U :warning:A L1 L2 :warning:C2 6 :rocket:~~C1~~
- neqo-latest vs. quinn: :warning:H DC LR C20 :rocket:~~M~~ S :rocket:~~R~~ Z 3 :rocket:~~B~~ U :warning:E L1 L2 C1 C2 6 :warning:BP
- neqo-latest vs. s2n-quic: H DC :warning:LR C20 M :rocket:~~S~~ R :warning:B U :rocket:~~E~~ L1 L2 :rocket:~~C1~~ C2 :warning:6
- neqo-latest vs. tquic: H DC LR C20 M :warning:R Z 3 :rocket:~~B~~ U :warning:A L1 L2 C1 C2 :rocket:~~6~~
- neqo-latest vs. xquic: H DC :rocket:~~LR~~ C20 :warning:M R :rocket:~~Z~~ 3 B U :warning:L2 6 :rocket:~~C2~~ BP :rocket:~~BA~~
neqo-latest as server
- aioquic vs. neqo-latest: :rocket:~~H DC LR C20 M S R Z 3 B A L1 L2 C1 BP BA~~
- chrome vs. neqo-latest: 3
- go-x-net vs. neqo-latest: :rocket:~~H DC~~ LR B U A L2 :rocket:~~C2 6~~ BP :warning:BA
- kwik vs. neqo-latest: :warning:H DC :rocket:~~LR~~ C20 M :warning:S R Z :warning:3 B U A L1 :warning:L2 C1 6 :warning:V2
- lsquic vs. neqo-latest: :warning:H DC LR M S R 3 B :rocket:~~E~~ A :warning:L1 L2 C1 :rocket:~~C2 6~~ V2 :rocket:~~BP~~ BA CM
- msquic vs. neqo-latest: :rocket:~~H~~ DC LR :warning:Z :rocket:~~M S R~~ B :warning:U A L1 :rocket:~~L2~~ C1 :warning:C2 6 BP
- mvfst vs. neqo-latest: :rocket:~~H~~ DC :rocket:~~LR~~ M :warning:3 B L2 6 BA :rocket:~~BP~~
- neqo vs. neqo-latest: H DC :warning:LR C20 :rocket:~~M~~ S :rocket:~~R~~ Z 3 B U :rocket:~~E~~ L1 L2 C1 :rocket:~~C2~~ 6 V2 :warning:BP BA
- ngtcp2 vs. neqo-latest: :warning:LR C20 :rocket:~~H DC~~ M :warning:R Z 3 B :rocket:~~U E~~ A L1 L2 C1 C2 6 V2 :rocket:~~BP~~ BA :rocket:~~CM~~
- openssl vs. neqo-latest: :warning:H DC :rocket:~~C20 S R 3~~ L2 C2 :warning:6 BP BA
- quic-go vs. neqo-latest: H :warning:DC LR C20 M S R Z 3 :warning:B U :rocket:~~A~~ L1 L2 C1 :warning:C2 6 BP BA
- quiche vs. neqo-latest: :rocket:~~H DC~~ LR M :warning:S R Z B :warning:A :rocket:~~L2~~ C1 C2 :rocket:~~BA~~
- quinn vs. neqo-latest: H DC LR C20 M S R :warning:Z 3 :warning:U :rocket:~~B~~ E :warning:A :rocket:~~L1 L2~~ C1 :warning:C2 BP BA
- s2n-quic vs. neqo-latest: H DC :warning:M S R :rocket:~~LR~~ 3 B :warning:E :rocket:~~A~~ L1 L2 :rocket:~~C1~~ C2 6 BP :warning:BA
Unsupported Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: E CM
- neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2 CM
- neqo-latest vs. haproxy: E CM
- neqo-latest vs. kwik: E CM
- neqo-latest vs. msquic: 3 E CM
- neqo-latest vs. mvfst: C20 S E V2 CM
- neqo-latest vs. nginx: E V2 CM
- neqo-latest vs. picoquic: CM
- neqo-latest vs. quic-go: E V2 CM
- neqo-latest vs. quiche: E V2 CM
- neqo-latest vs. quinn: V2 CM
- neqo-latest vs. s2n-quic: Z V2
- neqo-latest vs. tquic: E V2 CM
- neqo-latest vs. xquic: S E V2 CM
neqo-latest as server
- aioquic vs. neqo-latest: U E
- chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2 BP BA CM
- go-x-net vs. neqo-latest: C20 M S R Z 3 E L1 C1 V2 BA CM
- kwik vs. neqo-latest: R E
- lsquic vs. neqo-latest: C20 Z U E BP
- msquic vs. neqo-latest: M R 3 U E A BA
- mvfst vs. neqo-latest: C20 S R U E V2 BP BA
- openssl vs. neqo-latest: Z U E L1 C1 V2
- quic-go vs. neqo-latest: E V2
- quiche vs. neqo-latest: C20 U E V2
- s2n-quic vs. neqo-latest: C20 Z U V2
Benchmark results
Performance differences relative to 1c28b59c3382cbbe739ddfec3fee2b76d6ff0ded.
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold.
time: [652.03 ms 652.96 ms 653.94 ms]
thrpt: [152.92 MiB/s 153.15 MiB/s 153.37 MiB/s]
change:
time: [+0.1719% +0.3725% +0.5806%] (p = 0.00 < 0.05)
thrpt: [… −0.3711% −0.1716%]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
time: [295.78 ms 297.35 ms 298.90 ms]
thrpt: [33.456 Kelem/s 33.630 Kelem/s 33.809 Kelem/s]
change:
time: [−0.3514% +0.4843% +1.2622%] (p = 0.23 > 0.05)
thrpt: [−1.2465% −0.4820% +0.3526%]
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low mild
1 (1.00%) high mild
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
time: [27.105 ms 27.203 ms 27.319 ms]
thrpt: [36.605 elem/s 36.761 elem/s 36.893 elem/s]
change:
time: [−0.4290% +0.1715% +0.7553%] (p = 0.59 > 0.05)
thrpt: [−0.7496% −0.1712% +0.4309%]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: Change within noise threshold.
time: [669.54 ms 670.80 ms 672.04 ms]
thrpt: [148.80 MiB/s 149.08 MiB/s 149.36 MiB/s]
change:
time: [+0.3592% +0.6086% +0.8549%] (p = 0.00 < 0.05)
thrpt: [… −0.6049% −0.3579%]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
decode 4096 bytes, mask ff: No change in performance detected.
time: [11.803 µs 11.889 µs 12.020 µs]
change: [−1.4507% −0.3900% +0.7543%] (p = 0.52 > 0.05)
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low severe
3 (3.00%) low mild
1 (1.00%) high mild
8 (8.00%) high severe
decode 1048576 bytes, mask ff: No change in performance detected.
time: [3.0197 ms 3.0289 ms 3.0398 ms]
change: [−0.5513% −0.0749% +0.4052%] (p = 0.74 > 0.05)
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
1 (1.00%) high mild
8 (8.00%) high severe
decode 4096 bytes, mask 7f: No change in performance detected.
time: [19.988 µs 20.050 µs 20.113 µs]
change: [−0.1982% +0.1718% +0.5730%] (p = 0.40 > 0.05)
Found 22 outliers among 100 measurements (22.00%)
1 (1.00%) low severe
3 (3.00%) low mild
1 (1.00%) high mild
17 (17.00%) high severe
decode 1048576 bytes, mask 7f: No change in performance detected.
time: [5.0495 ms 5.0611 ms 5.0743 ms]
change: [−0.2941% +0.0699% +0.4411%] (p = 0.71 > 0.05)
Found 14 outliers among 100 measurements (14.00%)
14 (14.00%) high severe
decode 4096 bytes, mask 3f: Change within noise threshold.
time: [8.2766 µs 8.3503 µs 8.4680 µs]
change: [+0.1831% +0.8687% +1.8006%] (p = 0.02 < 0.05)
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
2 (2.00%) high mild
11 (11.00%) high severe
decode 1048576 bytes, mask 3f: No change in performance detected.
time: [1.5869 ms 1.5937 ms 1.6008 ms]
change: [−0.6152% −0.0011% +0.6203%] (p = 0.99 > 0.05)
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
1000 streams of 1 bytes/multistream: No change in performance detected.
time: [34.076 ns 34.388 ns 34.707 ns]
change: [−35.376% −15.831% −0.3222%] (p = 0.22 > 0.05)
Found 7 outliers among 500 measurements (1.40%)
7 (1.40%) high mild
1000 streams of 1000 bytes/multistream: :green_heart: Performance has improved.
time: [33.148 ns 33.416 ns 33.685 ns]
change: [−6.1930% −4.9223% −3.6724%] (p = 0.00 < 0.05)
Found 5 outliers among 500 measurements (1.00%)
5 (1.00%) high mild
coalesce_acked_from_zero 1+1 entries: No change in performance detected.
time: [88.723 ns 89.078 ns 89.422 ns]
change: [−0.5142% −0.0657% +0.3886%] (p = 0.77 > 0.05)
Found 10 outliers among 100 measurements (10.00%)
9 (9.00%) high mild
1 (1.00%) high severe
coalesce_acked_from_zero 3+1 entries: No change in performance detected.
time: [106.97 ns 107.36 ns 107.77 ns]
change: [−0.1424% +0.2730% +0.6548%] (p = 0.20 > 0.05)
Found 15 outliers among 100 measurements (15.00%)
15 (15.00%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
time: [105.98 ns 106.31 ns 106.71 ns]
change: [−1.4608% −0.6281% −0.0074%] (p = 0.09 > 0.05)
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low severe
3 (3.00%) low mild
3 (3.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
time: [89.526 ns 89.779 ns 90.079 ns]
change: [−0.5313% +0.1981% +1.0177%] (p = 0.64 > 0.05)
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
RxStreamOrderer::inbound_frame(): Change within noise threshold.
time: [110.14 ms 110.28 ms 110.51 ms]
change: [−0.9040% −0.6570% −0.4252%] (p = 0.00 < 0.05)
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
SentPackets::take_ranges: No change in performance detected.
time: [7.6675 µs 7.9624 µs 8.2094 µs]
change: [−5.1129% +6.5359% +21.594%] (p = 0.39 > 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
transfer/pacing-false/varying-seeds: Change within noise threshold.
time: [33.863 ms 33.930 ms 33.999 ms]
change: [−0.8679% −0.6112% −0.3238%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/pacing-true/varying-seeds: Change within noise threshold.
time: [35.162 ms 35.257 ms 35.353 ms]
change: [+0.1158% +0.5287% +0.9337%] (p = 0.01 < 0.05)
transfer/pacing-false/same-seed: Change within noise threshold.
time: [33.938 ms 33.986 ms 34.036 ms]
change: [+0.1687% +0.3617% +0.5469%] (p = 0.00 < 0.05)
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
transfer/pacing-true/same-seed: Change within noise threshold.
time: [35.737 ms 35.804 ms 35.880 ms]
change: [+0.7647% +1.0220% +1.2943%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
Client/server transfer results
Performance differences relative to 1c28b59c3382cbbe739ddfec3fee2b76d6ff0ded.
Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main | Δ main |
|---|---|---|---|---|---|---|
| google vs. google | 519.7 ± 35.4 | 492.5 | 710.0 | 61.6 ± 0.9 | ||
| google vs. neqo (cubic, paced) | 384.7 ± 38.4 | 360.3 | 657.4 | 83.2 ± 0.8 | -1.4 | -0.4% |
| msquic vs. msquic | 156.2 ± 30.6 | 121.7 | 311.5 | 204.8 ± 1.0 | ||
| msquic vs. neqo (cubic, paced) | 292.3 ± 48.7 | 266.4 | 562.5 | 109.5 ± 0.7 | -2.8 | -0.9% |
| neqo vs. google (cubic, paced) | 806.8 ± 27.2 | 738.4 | 1046.4 | 39.7 ± 1.2 | -3.0 | -0.4% |
| neqo vs. msquic (cubic, paced) | 218.9 ± 57.7 | 195.4 | 693.7 | 146.2 ± 0.6 | 4.8 | 2.2% |
| neqo vs. neqo (cubic) | 260.1 ± 43.1 | 225.7 | 549.6 | 123.0 ± 0.7 | :broken_heart: 9.5 | 3.8% |
| neqo vs. neqo (cubic, paced) | 264.3 ± 42.2 | 238.8 | 532.2 | 121.1 ± 0.8 | 7.8 | 3.0% |
| neqo vs. neqo (reno) | 254.3 ± 34.6 | 220.1 | 468.4 | 125.8 ± 0.9 | -5.9 | -2.3% |
| neqo vs. neqo (reno, paced) | 257.9 ± 36.9 | 235.4 | 523.7 | 124.1 ± 0.9 | 2.4 | 0.9% |
| neqo vs. quiche (cubic, paced) | 246.9 ± 35.5 | 230.9 | 491.1 | 129.6 ± 0.9 | -6.9 | -2.7% |
| neqo vs. s2n (cubic, paced) | 262.4 ± 30.5 | 246.0 | 458.7 | 121.9 ± 1.0 | -2.2 | -0.8% |
| quiche vs. neqo (cubic, paced) | 418.0 ± 32.6 | 383.6 | 604.6 | 76.5 ± 1.0 | -8.7 | -2.0% |
| quiche vs. quiche | 193.5 ± 32.6 | 177.0 | 406.2 | 165.3 ± 1.0 | ||
| s2n vs. neqo (cubic, paced) | 304.2 ± 15.7 | 290.7 | 374.5 | 105.2 ± 2.0 | -6.8 | -2.2% |
| s2n vs. s2n | 264.3 ± 54.6 | 237.0 | 588.2 | 121.1 ± 0.6 | | |
Download data for profiler.firefox.com or download performance comparison data.
@martinthomson thanks for the analysis. My plan is to add some benches first in another PR. I'll add some for the instances where you suggest looking into EnumMap as well.
Even if some of the macro-level benefits come from speeding up the demo client and server code, it's IMO still worth doing, since eliminating those overheads makes it easier to spot other bottlenecks.
About security, I didn't do much of an analysis, but I think the main uses of this insecure hasher are lookups of items (streams, unacked chunks) that, while under attacker control, are quite limited in the range of valid values that wouldn't immediately cause a connection closure.
I definitely agree with the point about removing the overheads from our toy code as much as possible. This seems like a pretty substantial win there, so it's worth doing. I doubt that my EnumMap suggestions will have a major impact, but the change did highlight the possibility (and it's not that much typing to switch over).
I think the EnumMap work should be factored out into another PR; it will cause a bunch of changes throughout.
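For illustration, here is a rough sketch of what that EnumMap-style change could look like, assuming the enum_map crate; the `Space` enum and the counter are made up for the example, not neqo's actual types:

```rust
use enum_map::{Enum, EnumMap};

// `Space` is illustrative only; neqo's real fixed-size enums would be the
// actual candidates for this kind of map.
#[derive(Enum, Clone, Copy)]
enum Space {
    Initial,
    Handshake,
    ApplicationData,
}

fn example() {
    // An EnumMap is a fixed-size array indexed by the enum discriminant,
    // so lookups involve no hashing at all.
    let mut acked_per_space: EnumMap<Space, usize> = EnumMap::default();
    acked_per_space[Space::Initial] += 1;
    acked_per_space[Space::ApplicationData] += 3;
    assert_eq!(acked_per_space[Space::Handshake], 0);
}
```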
I've checked the usage in our server code, which is fine because attackers don't get to control memory allocations (we use pointer values for the hash). Still, that makes me wonder whether we should be using Pin.
Good point. Though before we introduce the complexity of Pin, we might find a simple way around hashing the pointer values in the first place.
Though before we introduce the complexity of Pin, we might find a simple way around hashing the pointer values in the first place.
Definitely the right question to be asking. I think that it might be possible to use the first connection ID as a key for this sort of thing, but we don't tend to keep that around today, once we stop using it. Everything else -- as far as I know -- is ephemeral and therefore not suitable.
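To make the trade-off concrete, here is an illustrative sketch (not neqo's actual server code; `Connection` and the `original_dcid` field are hypothetical) of the two keying strategies being discussed:

```rust
// The concern with the pointer-keyed variant is that the address is only a
// stable key while the value is never moved out of its Rc allocation, which
// Pin would make explicit. Keying by a value the connection already owns
// (e.g. its original connection ID) sidesteps the question entirely.
use std::collections::HashMap;
use std::rc::Rc;

struct Connection {
    original_dcid: Vec<u8>, // hypothetical field
}

fn pointer_keyed(conns: &[Rc<Connection>]) -> HashMap<*const Connection, Rc<Connection>> {
    conns.iter().map(|c| (Rc::as_ptr(c), Rc::clone(c))).collect()
}

fn id_keyed(conns: &[Rc<Connection>]) -> HashMap<Vec<u8>, Rc<Connection>> {
    conns
        .iter()
        .map(|c| (c.original_dcid.clone(), Rc::clone(c)))
        .collect()
}

fn main() {
    let conns = vec![Rc::new(Connection { original_dcid: vec![0x0a, 0x0b] })];
    assert_eq!(pointer_keyed(&conns).len(), 1);
    assert_eq!(id_keyed(&conns).len(), 1);
}
```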
I'm doing a benchmark in #2444 to quantify the benefits first. (It's not going well, a lot of variation run-to-run for some reason.)
a lot of variation run-to-run for some reason
That is counterintuitive to me, given that it uses test-fixtures and thus does no I/O via the OS. Let me know if you want me to look into it.
I wonder if it's the CPU scheduler and frequency control on my Mac. Bencher seems much more stable.
For what it is worth, here is #2444 on my machine:
➜ neqo-http3 git:(test-streams-bench) ✗ cargo bench --features bench
Benchmarking 1 streams of 1 bytes/multistream: Collecting 100 samples in estimated 5.1966 s (2400
1 streams of 1 bytes/multistream
time: [31.399 µs 31.555 µs 31.731 µs]
change: [-12.172% -10.263% -8.3468%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
Benchmarking 1000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 6.2030 s (40
1000 streams of 1 bytes/multistream
time: [13.088 ms 13.117 ms 13.151 ms]
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
Benchmarking 10000 streams of 1 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 87.1s, or reduce sample count to 10.
Benchmarking 10000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 87.111 s (1
10000 streams of 1 bytes/multistream
time: [876.43 ms 882.16 ms 888.29 ms]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
Benchmarking 1 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.0982 s (22
1 streams of 1000 bytes/multistream
time: [33.435 µs 33.884 µs 34.409 µs]
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
Benchmarking 100 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.1397 s (
100 streams of 1000 bytes/multistream
time: [1.5683 ms 1.5823 ms 1.5968 ms]
Benchmarking 1000 streams of 1000 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, or reduce sample count to 60.
Benchmarking 1000 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 7.2433 s
1000 streams of 1000 bytes/multistream
time: [66.837 ms 67.100 ms 67.391 ms]
Found 20 outliers among 100 measurements (20.00%)
5 (5.00%) high mild
15 (15.00%) high severe
➜ neqo-http3 git:(test-streams-bench) ✗ cat /proc/cpuinfo
model name : AMD Ryzen 7 7840U w/ Radeon 780M Graphics
I don't see much deviation. Am I running the wrong version @larseggert?
Can you run it again and see if there are changes from run to run? That is where I see random improvements or regressions.
Here are two more runs with vanilla https://github.com/mozilla/neqo/pull/2444. No significant deviations. Note that I am not running your optimizations in this pull request.
➜ neqo-http3 git:(test-streams-bench) ✗ cargo bench --features bench
Benchmarking 1 streams of 1 bytes/multistream: Collecting 100 samples in estimated 5.0811 s
1 streams of 1 bytes/multistream
time: [31.514 µs 31.727 µs 32.013 µs]
change: [-0.3795% +0.5461% +1.6585%] (p = 0.28 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
Benchmarking 1000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 6.2106
1000 streams of 1 bytes/multistream
time: [13.032 ms 13.066 ms 13.104 ms]
change: [-0.7614% -0.3884% -0.0397%] (p = 0.04 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
1 (1.00%) high mild
9 (9.00%) high severe
Benchmarking 10000 streams of 1 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 87.3s, or reduce sample count to 10.
Benchmarking 10000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 87.26
10000 streams of 1 bytes/multistream
time: [850.20 ms 852.13 ms 853.94 ms]
change: [-4.1112% -3.4050% -2.7470%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) low severe
5 (5.00%) low mild
3 (3.00%) high mild
Benchmarking 1 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.1258
1 streams of 1000 bytes/multistream
time: [32.380 µs 32.615 µs 32.914 µs]
change: [-5.3650% -3.7472% -2.1786%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
Benchmarking 100 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.16
100 streams of 1000 bytes/multistream
time: [1.4970 ms 1.5041 ms 1.5121 ms]
change: [-5.9242% -4.9438% -3.9764%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
Benchmarking 1000 streams of 1000 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, or reduce sample count to 70.
Benchmarking 1000 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 6.9
1000 streams of 1000 bytes/multistream
time: [66.039 ms 66.255 ms 66.489 ms]
change: [-1.7872% -1.2586% -0.7446%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
18 (18.00%) high mild
3 (3.00%) high severe
➜ neqo-http3 git:(test-streams-bench) ✗ cargo bench --features bench
Benchmarking 1 streams of 1 bytes/multistream: Collecting 100 samples in estimated 5.0222 s
1 streams of 1 bytes/multistream
time: [31.196 µs 31.566 µs 32.008 µs]
change: [-1.9923% -0.5099% +1.0965%] (p = 0.52 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
Benchmarking 1000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 6.2479
1000 streams of 1 bytes/multistream
time: [12.863 ms 12.919 ms 12.980 ms]
change: [-1.6309% -1.1270% -0.5695%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
10 (10.00%) high mild
9 (9.00%) high severe
Benchmarking 10000 streams of 1 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 87.5s, or reduce sample count to 10.
Benchmarking 10000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 87.46
10000 streams of 1 bytes/multistream
time: [862.91 ms 864.67 ms 866.53 ms]
change: [+1.1571% +1.4717% +1.7784%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking 1 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.1478
1 streams of 1000 bytes/multistream
time: [32.551 µs 32.892 µs 33.283 µs]
change: [-0.5530% +0.8511% +2.2636%] (p = 0.24 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
Benchmarking 100 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.22
100 streams of 1000 bytes/multistream
time: [1.5114 ms 1.5174 ms 1.5245 ms]
change: [+0.2271% +0.8880% +1.5818%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
Benchmarking 1000 streams of 1000 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.0s, or reduce sample count to 70.
Benchmarking 1000 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 7.0
1000 streams of 1000 bytes/multistream
time: [66.765 ms 66.997 ms 67.247 ms]
change: [+0.6277% +1.1200% +1.6225%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
Oh good. I think it is core pinning being awkward on macOS then.
BTW, I came across https://manuel.bernhardt.io/posts/2023-11-16-core-pinning/ today, and we should change the bencher accordingly.
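As a rough sketch of what that change to the bench setup could look like, assuming the core_affinity crate (the bencher might equally use taskset or cgroup pinning externally):

```rust
// Sketch only: pin the benchmark process to a single core before measuring,
// which reduces run-to-run variance caused by the scheduler migrating the
// thread between cores.
fn pin_to_one_core() {
    if let Some(core_ids) = core_affinity::get_core_ids() {
        // Picking the last core is arbitrary; on hybrid CPUs you would want
        // to pick a specific performance core instead.
        if let Some(core) = core_ids.last().copied() {
            let pinned = core_affinity::set_for_current(core);
            eprintln!("pinned to {core:?}: {pinned}");
        }
    }
}

fn main() {
    pin_to_one_core();
    // ... run the actual benchmark here ...
}
```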
I'm redoing this PR in stages, to check if the new bench actually shows any improvements. The first push changes only the existing (non-test) uses of HashMap and HashSet to FxHasher.
Hm. The benches all show small improvements, while the client/server tests all show small regressions...
This shows enough benefit for me. @mxinden?
I think FxHasher is strictly faster than the built-in one.
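A quick way to sanity-check that claim in isolation might look like the sketch below (illustrative only; the criterion and Bencher results above are the numbers that matter), assuming rustc-hash is available:

```rust
// Hash 10M u64 keys through std's default hasher (SipHash-1-3, DDoS-resistant)
// and through FxHasher (fast, not DDoS-resistant), and print the wall time.
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, BuildHasherDefault, Hasher};
use std::time::Instant;

use rustc_hash::FxHasher;

fn time_hashes<B: BuildHasher>(build: &B, label: &str) {
    let start = Instant::now();
    let mut acc = 0u64;
    for i in 0..10_000_000u64 {
        let mut h = build.build_hasher();
        h.write_u64(i);
        acc ^= h.finish(); // keep the result live so the loop isn't optimized away
    }
    println!("{label}: {:?} (acc: {acc})", start.elapsed());
}

fn main() {
    time_hashes(&RandomState::new(), "std default (SipHash)");
    time_hashes(&BuildHasherDefault::<FxHasher>::default(), "FxHasher");
}
```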
I don't feel strongly about it, i.e. I am fine merging here. Just missing the version change suggested above.
Bencher Report
| Branch | feat-fxhasher |
| Testbed | t-linux64-ms-280 |
| Benchmark | Benchmark Result nanoseconds (ns) (Result Δ%) | Upper Boundary nanoseconds (ns) (Limit %) |
|---|---|---|
| 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 670,800,000.00 ns (+2.12%) Baseline: 656,858,518.52 ns | 681,621,307.92 ns (98.41%) |
| 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 652,960,000.00 ns (+1.67%) Baseline: 642,227,777.78 ns | 673,152,042.44 ns (97.00%) |
| 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 27,203,000.00 ns (+0.02%) Baseline: 27,198,222.22 ns | 27,431,157.10 ns (99.17%) |
| 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 297,350,000.00 ns (-3.74%) Baseline: 308,906,296.30 ns | 315,303,598.66 ns (94.31%) |
| 1000 streams of 1 bytes/multistream | 34.39 ns (-18.51%) Baseline: 42.20 ns | 61.47 ns (55.95%) |
| 1000 streams of 1000 bytes/multistream | 33.42 ns (-19.30%) Baseline: 41.41 ns | 61.42 ns (54.41%) |
| RxStreamOrderer::inbound_frame() | 110,280,000.00 ns (-0.36%) Baseline: 110,683,333.33 ns | 111,315,010.32 ns (99.07%) |
| SentPackets::take_ranges | 7,962.40 ns (+0.53%) Baseline: 7,920.81 ns | 8,016.91 ns (99.32%) |
| coalesce_acked_from_zero 1+1 entries | 89.08 ns (+0.47%) Baseline: 88.66 ns | 89.22 ns (99.84%) |
| coalesce_acked_from_zero 10+1 entries | 106.31 ns (+0.32%) Baseline: 105.97 ns | 107.04 ns (99.32%) |
| coalesce_acked_from_zero 1000+1 entries | 89.78 ns (+0.49%) Baseline: 89.34 ns | 91.67 ns (97.94%) |
| coalesce_acked_from_zero 3+1 entries | 107.36 ns (+0.78%) Baseline: 106.53 ns | 107.64 ns (99.74%) |
| decode 1048576 bytes, mask 3f | 1,593,700.00 ns (-3.94%) Baseline: 1,659,103.70 ns | 1,876,499.31 ns (84.93%) |
| decode 1048576 bytes, mask 7f | 5,061,100.00 ns (-0.19%) Baseline: 5,070,792.59 ns | 5,110,870.39 ns (99.03%) |
| decode 1048576 bytes, mask ff | 3,028,900.00 ns (-0.53%) Baseline: 3,044,948.15 ns | 3,091,434.89 ns (97.98%) |
| decode 4096 bytes, mask 3f | 8,350.30 ns (+14.72%) Baseline: 7,278.86 ns | 10,577.40 ns (78.94%) |
| decode 4096 bytes, mask 7f | 20,050.00 ns (+1.40%) Baseline: 19,774.07 ns | 20,475.80 ns (97.92%) |
| decode 4096 bytes, mask ff | 11,889.00 ns (+0.97%) Baseline: 11,774.70 ns | 12,003.24 ns (99.05%) |
| transfer/pacing-false/same-seed | 33,986,000.00 ns (-1.07%) Baseline: 34,354,481.48 ns | 35,446,879.00 ns (95.88%) |
| transfer/pacing-false/varying-seeds | 33,930,000.00 ns (-1.62%) Baseline: 34,490,111.11 ns | 35,574,608.59 ns (95.38%) |
| transfer/pacing-true/same-seed | 35,804,000.00 ns (-0.72%) Baseline: 36,064,037.04 ns | 37,202,680.14 ns (96.24%) |
| transfer/pacing-true/varying-seeds | 35,257,000.00 ns (-0.55%) Baseline: 35,452,333.33 ns | 36,574,014.89 ns (96.40%) |
Bencher Report
| Branch | feat-fxhasher |
| Testbed | t-linux64-ms-280 |
| Benchmark | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|
| s2n vs. neqo (cubic, paced) | 304.16 ms (-1.71%) Baseline: 309.45 ms | 326.45 ms (93.17%) |