neqo icon indicating copy to clipboard operation
neqo copied to clipboard

perf(qlog): introduce outer `Option`

Open mxinden opened this issue 1 month ago • 6 comments

Currently qlog is defined as:

pub struct Qlog {
    inner: Rc<RefCell<Option<SharedStreamer>>>,
}

https://github.com/mozilla/neqo/issues/1894 documents our finding, that dereferencing the Rc<RefCell<_>> is costly, i.e. showing up in our CPU profiles.

When writing to the qlog fails, we set the (inner) Option to None. Thus on each consecutive logging atttempt we dereference the Rc<RefCell<_>> only to find out that logging is disabled.

This patch still sets the inner Option to None to inform all other clones of the Rc that logging is disabled. In addition, it introduces an outer Option, which as well is set to None, thus avoiding the dereferencing of the Rc<RefCell<_>> on each logging attempt once logging has failed.

pub struct Qlog {
    inner: Option<Rc<RefCell<Option<SharedStreamer>>>>,
}

Fixes https://github.com/mozilla/neqo/issues/1894. Alternative for https://github.com/mozilla/neqo/pull/3005.

@martinthomson's idea, shared at IETF 124. Let's see what our benchmarks say. Not ready for actual review.

mxinden avatar Nov 12 '25 14:11 mxinden

CodSpeed Performance Report

Merging #3129 will degrade performances by 4.88%

Comparing mxinden:qlog-option (032cd27) with main (b9c32c7)

Summary

⚡ 1 improvement
❌ 2 regressions
✅ 20 untouched

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Mode Benchmark BASE HEAD Change
Simulation client 852.3 ms 769 ms +10.83%
Simulation decode 4096 bytes, mask ff 17.8 µs 18.4 µs -3.3%
Simulation wallclock-time 32 ms 33.6 ms -4.88%

codspeed-hq[bot] avatar Nov 12 '25 14:11 codspeed-hq[bot]

Do we need both Options? Otherwise, I prefer this to #3005!

larseggert avatar Nov 14 '25 06:11 larseggert

I don't see a way around the two Options. The inner Option will notify all other owners of the qlog, that the log has errored, the outer Option prevents the current owner from checking the shared reference again.

mxinden avatar Nov 14 '25 09:11 mxinden

@mxinden given that you are busy, would you like me to apply @martinthomson's suggestion and merge this? It'd unblock #2989.

larseggert avatar Nov 28 '25 06:11 larseggert

Codecov Report

:x: Patch coverage is 85.18519% with 8 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 93.47%. Comparing base (a94d62f) to head (2b8ca27). :warning: Report is 37 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3129      +/-   ##
==========================================
- Coverage   93.50%   93.47%   -0.03%     
==========================================
  Files         125      125              
  Lines       36659    36683      +24     
  Branches    36659    36683      +24     
==========================================
+ Hits        34277    34291      +14     
- Misses       1537     1546       +9     
- Partials      845      846       +1     
Components Coverage Δ
neqo-common 97.15% <58.82%> (-0.21%) :arrow_down:
neqo-crypto 83.17% <ø> (-0.49%) :arrow_down:
neqo-http3 93.30% <100.00%> (+<0.01%) :arrow_up:
neqo-qpack 94.40% <100.00%> (ø)
neqo-transport 94.64% <96.96%> (+0.02%) :arrow_up:
neqo-udp 79.42% <ø> (ø)
mtu 85.44% <ø> (ø)

codecov[bot] avatar Dec 02 '25 15:12 codecov[bot]

Yes please @larseggert. Thanks.

mxinden avatar Dec 04 '25 13:12 mxinden

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 18306622d6f446d19736f336e8e707656b63e7fe.

neqo-latest as clientneqo-latest as server
neqo-latest vs. aioquic: A :rocket:L1
neqo-latest vs. go-x-net: A BP BA
neqo-latest vs. haproxy: A BP BA
neqo-latest vs. kwik: BP BA
neqo-latest vs. linuxquic: A :warning:L1 C1
neqo-latest vs. lsquic: :warning:L1 C1
neqo-latest vs. msquic: :warning:R Z A L1 C1
neqo-latest vs. mvfst: A L1 :rocket:C1 BA
neqo-latest vs. nginx: A L1 C1 BP BA
neqo-latest vs. ngtcp2: A CM
neqo-latest vs. picoquic: :rocket:Z A L1 C1
neqo-latest vs. quic-go: A :rocket:L1 C1
neqo-latest vs. quiche: A :warning:L1 C1 BP BA
neqo-latest vs. quinn: A :rocket:C1
neqo-latest vs. s2n-quic: A :rocket:BP :warning:L1 C1 BA CM
neqo-latest vs. tquic: S A BP BA
neqo-latest vs. xquic: A :rocket:L1
aioquic vs. neqo-latest: :rocket:L1 BA CM
chrome vs. neqo-latest: 3
go-x-net vs. neqo-latest: CM
kwik vs. neqo-latest: BP BA CM
msquic vs. neqo-latest: CM
mvfst vs. neqo-latest: Z A L1 C1 CM
openssl vs. neqo-latest: LR M A CM
quic-go vs. neqo-latest: CM
quiche vs. neqo-latest: :rocket:L1 C1 CM
quinn vs. neqo-latest: V2 CM
s2n-quic vs. neqo-latest: CM
tquic vs. neqo-latest: :warning:BA CM
xquic vs. neqo-latest: M CM
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

github-actions[bot] avatar Dec 08 '25 10:12 github-actions[bot]

Client/server transfer results

Performance differences relative to 8777585d02787dbf327a79b5f81212575ef5fef4.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ main Δ main
msquic-neqo-cubic 197.6 ± 60.2 152.1 686.2 161.9 ± 0.5 :broken_heart: 13.1 7.1%
neqo-neqo-cubic 96.7 ± 4.3 87.7 105.4 330.9 ± 7.4 :green_heart: -1.3 -1.4%
neqo-neqo-reno-nopacing 99.2 ± 5.0 91.7 126.3 322.5 ± 6.4 :broken_heart: 3.3 3.5%

Table above only shows statistically significant changes. See all results below.

All results

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ main Δ main
google-google-nopacing 477.1 ± 3.4 470.8 487.2 67.1 ± 9.4
google-neqo-cubic 280.9 ± 4.1 272.7 289.4 113.9 ± 7.8 -0.1 -0.0%
msquic-msquic-nopacing 165.1 ± 39.7 136.7 463.0 193.8 ± 0.8
msquic-neqo-cubic 197.6 ± 60.2 152.1 686.2 161.9 ± 0.5 :broken_heart: 13.1 7.1%
neqo-google-cubic 779.6 ± 4.5 772.1 794.4 41.0 ± 7.1 -0.0 -0.0%
neqo-msquic-cubic 156.8 ± 4.6 148.6 166.7 204.0 ± 7.0 0.5 0.3%
neqo-neqo-cubic 96.7 ± 4.3 87.7 105.4 330.9 ± 7.4 :green_heart: -1.3 -1.4%
neqo-neqo-cubic-nopacing 97.2 ± 4.6 89.7 106.7 329.1 ± 7.0 1.2 1.2%
neqo-neqo-reno 98.0 ± 5.4 88.8 121.8 326.4 ± 5.9 -0.0 -0.0%
neqo-neqo-reno-nopacing 99.2 ± 5.0 91.7 126.3 322.5 ± 6.4 :broken_heart: 3.3 3.5%
neqo-quiche-cubic 193.2 ± 4.7 186.6 207.9 165.7 ± 6.8 -0.2 -0.1%
neqo-s2n-cubic 220.8 ± 4.0 213.5 230.0 144.9 ± 8.0 -1.1 -0.5%
quiche-neqo-cubic 152.5 ± 5.7 140.9 179.8 209.8 ± 5.6 -1.3 -0.8%
quiche-quiche-nopacing 142.6 ± 4.5 134.7 155.1 224.3 ± 7.1
s2n-neqo-cubic 173.9 ± 4.9 162.9 185.7 184.1 ± 6.5 0.6 0.4%
s2n-s2n-nopacing 243.1 ± 14.0 231.2 335.4 131.7 ± 2.3

Download data for profiler.firefox.com or download performance comparison data.

github-actions[bot] avatar Dec 08 '25 10:12 github-actions[bot]

Benchmark results

Significant performance differences relative to c6295a164f94b439aa0c8d98681732ec2befabae.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: :green_heart: Performance has improved by -2.6407%.
       time:   [198.13 ms 198.51 ms 198.91 ms]
       thrpt:  [502.74 MiB/s 503.75 MiB/s 504.72 MiB/s]
change:
       time:   [-2.9618% -2.6407% -2.3569] (p = 0.00  +2.7124% +3.0522]
       Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: :green_heart: Performance has improved by -1.7518%.
       time:   [205.30 ms 205.62 ms 205.98 ms]
       thrpt:  [485.50 MiB/s 486.34 MiB/s 487.08 MiB/s]
change:
       time:   [-1.9484% -1.7518% -1.5254] (p = 0.00  +1.7830% +1.9871]
       Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
decode 4096 bytes, mask ff: :broken_heart: Performance has regressed by +64.805%.
       time:   [11.326 µs 11.364 µs 11.410 µs]
       change: [+64.136% +64.805% +65.631] (p = 0.00 
decode 1048576 bytes, mask ff: :broken_heart: Performance has regressed by +69.709%.
       time:   [2.9911 ms 3.0004 ms 3.0114 ms]
       change: [+69.115% +69.709% +70.417] (p = 0.00 
decode 4096 bytes, mask 7f: :broken_heart: Performance has regressed by +105.67%.
       time:   [19.551 µs 19.598 µs 19.655 µs]
       change: [+104.86% +105.67% +106.49] (p = 0.00 
decode 1048576 bytes, mask 7f: :broken_heart: Performance has regressed by +107.81%.
       time:   [5.0316 ms 5.0435 ms 5.0568 ms]
       change: [+107.25% +107.81% +108.45] (p = 0.00 
decode 4096 bytes, mask 3f: :green_heart: Performance has improved by -44.155%.
       time:   [6.2072 µs 6.2377 µs 6.2746 µs]
       change: [-44.450% -44.155% -43.800] (p = 0.00 
decode 1048576 bytes, mask 3f: :green_heart: Performance has improved by -38.179%.
       time:   [1.7577 ms 1.7609 ms 1.7670 ms]
       change: [-38.371% -38.179% -37.898] (p = 0.00 
transfer/pacing-false/varying-seeds/wallclock-time/run: :broken_heart: Performance has regressed by +3.8304%.
       time:   [24.715 ms 24.759 ms 24.808 ms]
       change: [+3.6202% +3.8304% +4.0379] (p = 0.00 
transfer/pacing-false/varying-seeds/simulated-time/run: :broken_heart: Performance has regressed by +5.2789%.
       time:   [25.171 s 25.204 s 25.238 s]
       thrpt:  [162.30 KiB/s 162.51 KiB/s 162.73 KiB/s]
change:
       time:   [+5.1245% +5.2789% +5.4230] (p = 0.00  -5.0142% -4.8747]
       Performance has regressed.
transfer/pacing-true/varying-seeds/wallclock-time/run: :broken_heart: Performance has regressed by +3.3139%.
       time:   [24.773 ms 24.821 ms 24.875 ms]
       change: [+3.0168% +3.3139% +3.6069] (p = 0.00 
transfer/pacing-true/varying-seeds/simulated-time/run: :broken_heart: Performance has regressed by +5.1632%.
       time:   [24.862 s 24.898 s 24.934 s]
       thrpt:  [164.27 KiB/s 164.51 KiB/s 164.75 KiB/s]
change:
       time:   [+5.0161% +5.1632% +5.3267] (p = 0.00  -4.9097% -4.7765]
       Performance has regressed.
transfer/pacing-false/same-seed/simulated-time/run: :broken_heart: Performance has regressed by +5.4046%.
       time:   [25.234 s 25.234 s 25.234 s]
       thrpt:  [162.32 KiB/s 162.32 KiB/s 162.32 KiB/s]
change:
       time:   [+5.4046% +5.4046% +5.4046] (p = 0.00  -5.1275% -5.1275]
       Performance has regressed.
transfer/pacing-true/same-seed/simulated-time/run: :broken_heart: Performance has regressed by +6.4681%.
       time:   [25.207 s 25.207 s 25.207 s]
       thrpt:  [162.50 KiB/s 162.50 KiB/s 162.50 KiB/s]
change:
       time:   [+6.4681% +6.4681% +6.4681] (p = 0.00  -6.0752% -6.0752]
       Performance has regressed.
All results
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: :green_heart: Performance has improved by -2.6407%.
       time:   [198.13 ms 198.51 ms 198.91 ms]
       thrpt:  [502.74 MiB/s 503.75 MiB/s 504.72 MiB/s]
change:
       time:   [-2.9618% -2.6407% -2.3569] (p = 0.00  +2.7124% +3.0522]
       Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
       time:   [279.85 ms 281.46 ms 283.09 ms]
       thrpt:  [35.324 Kelem/s 35.529 Kelem/s 35.734 Kelem/s]
change:
       time:   [-0.8108% +0.0425% +0.8851] (p = 0.93 > 0.05)
       thrpt:  [-0.8773% -0.0425% +0.8174]
       No change in performance detected.
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [38.582 ms 38.736 ms 38.908 ms]
       thrpt:  [25.702   B/s 25.816   B/s 25.919   B/s]
change:
       time:   [-0.4810% +0.1594% +0.8116] (p = 0.64 > 0.05)
       thrpt:  [-0.8050% -0.1591% +0.4833]
       No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: :green_heart: Performance has improved by -1.7518%.
       time:   [205.30 ms 205.62 ms 205.98 ms]
       thrpt:  [485.50 MiB/s 486.34 MiB/s 487.08 MiB/s]
change:
       time:   [-1.9484% -1.7518% -1.5254] (p = 0.00  +1.7830% +1.9871]
       Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
decode 4096 bytes, mask ff: :broken_heart: Performance has regressed by +64.805%.
       time:   [11.326 µs 11.364 µs 11.410 µs]
       change: [+64.136% +64.805% +65.631] (p = 0.00 
decode 1048576 bytes, mask ff: :broken_heart: Performance has regressed by +69.709%.
       time:   [2.9911 ms 3.0004 ms 3.0114 ms]
       change: [+69.115% +69.709% +70.417] (p = 0.00 
decode 4096 bytes, mask 7f: :broken_heart: Performance has regressed by +105.67%.
       time:   [19.551 µs 19.598 µs 19.655 µs]
       change: [+104.86% +105.67% +106.49] (p = 0.00 
decode 1048576 bytes, mask 7f: :broken_heart: Performance has regressed by +107.81%.
       time:   [5.0316 ms 5.0435 ms 5.0568 ms]
       change: [+107.25% +107.81% +108.45] (p = 0.00 
decode 4096 bytes, mask 3f: :green_heart: Performance has improved by -44.155%.
       time:   [6.2072 µs 6.2377 µs 6.2746 µs]
       change: [-44.450% -44.155% -43.800] (p = 0.00 
decode 1048576 bytes, mask 3f: :green_heart: Performance has improved by -38.179%.
       time:   [1.7577 ms 1.7609 ms 1.7670 ms]
       change: [-38.371% -38.179% -37.898] (p = 0.00 
1-streams/each-1000-bytes/wallclock-time: Change within noise threshold.
       time:   [581.10 µs 583.43 µs 586.63 µs]
       change: [-1.3483% -0.7641% -0.0743] (p = 0.01 
1-streams/each-1000-bytes/simulated-time: No change in performance detected.
       time:   [118.88 ms 119.10 ms 119.32 ms]
       thrpt:  [8.1847 KiB/s 8.1996 KiB/s 8.2147 KiB/s]
change:
       time:   [-0.2115% +0.0698% +0.3549] (p = 0.62 > 0.05)
       thrpt:  [-0.3537% -0.0698% +0.2119]
       No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
1000-streams/each-1-bytes/wallclock-time: No change in performance detected.
       time:   [12.361 ms 12.401 ms 12.442 ms]
       change: [-0.7200% -0.2601% +0.2186] (p = 0.29 > 0.05)
       No change in performance detected.
1000-streams/each-1-bytes/simulated-time: No change in performance detected.
       time:   [2.3316 s 2.3352 s 2.3389 s]
       thrpt:  [427.55   B/s 428.22   B/s 428.89   B/s]
change:
       time:   [-0.0636% +0.1592% +0.3794] (p = 0.16 > 0.05)
       thrpt:  [-0.3779% -0.1589% +0.0636]
       No change in performance detected.
1000-streams/each-1000-bytes/wallclock-time: Change within noise threshold.
       time:   [49.623 ms 49.733 ms 49.845 ms]
       change: [+0.2022% +0.5497% +0.8751] (p = 0.00 
1000-streams/each-1000-bytes/simulated-time: No change in performance detected.
       time:   [16.116 s 16.382 s 16.645 s]
       thrpt:  [58.669 KiB/s 59.614 KiB/s 60.595 KiB/s]
change:
       time:   [-2.7425% -0.2994% +2.0777] (p = 0.80 > 0.05)
       thrpt:  [-2.0354% +0.3003% +2.8198]
       No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low mild
coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [89.142 ns 89.457 ns 89.780 ns]
       change: [-0.5618% -0.0662% +0.4260] (p = 0.78 > 0.05)
       No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [106.08 ns 106.41 ns 106.75 ns]
       change: [-0.9632% -0.2582% +0.3414] (p = 0.47 > 0.05)
       No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
3 (3.00%) high mild
11 (11.00%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [105.86 ns 106.35 ns 106.91 ns]
       change: [-0.6712% +0.0882% +0.8631] (p = 0.82 > 0.05)
       No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) low mild
1 (1.00%) high mild
10 (10.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [91.805 ns 91.949 ns 92.101 ns]
       change: [-0.3164% +0.3368% +0.9903] (p = 0.32 > 0.05)
       No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [110.31 ms 110.39 ms 110.47 ms]
       change: [+0.5394% +0.6409% +0.7384] (p = 0.00 
sent::Packets::take_ranges: No change in performance detected.
       time:   [4.4730 µs 4.5729 µs 4.6651 µs]
       change: [-4.7106% -1.4521% +2.0471] (p = 0.41 > 0.05)
       No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/pacing-false/varying-seeds/wallclock-time/run: :broken_heart: Performance has regressed by +3.8304%.
       time:   [24.715 ms 24.759 ms 24.808 ms]
       change: [+3.6202% +3.8304% +4.0379] (p = 0.00 
transfer/pacing-false/varying-seeds/simulated-time/run: :broken_heart: Performance has regressed by +5.2789%.
       time:   [25.171 s 25.204 s 25.238 s]
       thrpt:  [162.30 KiB/s 162.51 KiB/s 162.73 KiB/s]
change:
       time:   [+5.1245% +5.2789% +5.4230] (p = 0.00  -5.0142% -4.8747]
       Performance has regressed.
transfer/pacing-true/varying-seeds/wallclock-time/run: :broken_heart: Performance has regressed by +3.3139%.
       time:   [24.773 ms 24.821 ms 24.875 ms]
       change: [+3.0168% +3.3139% +3.6069] (p = 0.00 
transfer/pacing-true/varying-seeds/simulated-time/run: :broken_heart: Performance has regressed by +5.1632%.
       time:   [24.862 s 24.898 s 24.934 s]
       thrpt:  [164.27 KiB/s 164.51 KiB/s 164.75 KiB/s]
change:
       time:   [+5.0161% +5.1632% +5.3267] (p = 0.00  -4.9097% -4.7765]
       Performance has regressed.
transfer/pacing-false/same-seed/wallclock-time/run: Change within noise threshold.
       time:   [24.678 ms 24.700 ms 24.727 ms]
       change: [+2.0985% +2.2314% +2.3763] (p = 0.00 
transfer/pacing-false/same-seed/simulated-time/run: :broken_heart: Performance has regressed by +5.4046%.
       time:   [25.234 s 25.234 s 25.234 s]
       thrpt:  [162.32 KiB/s 162.32 KiB/s 162.32 KiB/s]
change:
       time:   [+5.4046% +5.4046% +5.4046] (p = 0.00  -5.1275% -5.1275]
       Performance has regressed.
transfer/pacing-true/same-seed/wallclock-time/run: Change within noise threshold.
       time:   [25.236 ms 25.257 ms 25.279 ms]
       change: [+2.7714% +2.9568% +3.1078] (p = 0.00 
transfer/pacing-true/same-seed/simulated-time/run: :broken_heart: Performance has regressed by +6.4681%.
       time:   [25.207 s 25.207 s 25.207 s]
       thrpt:  [162.50 KiB/s 162.50 KiB/s 162.50 KiB/s]
change:
       time:   [+6.4681% +6.4681% +6.4681] (p = 0.00  -6.0752% -6.0752]
       Performance has regressed.

Download data for profiler.firefox.com or download performance comparison data.

github-actions[bot] avatar Dec 08 '25 16:12 github-actions[bot]