neqo icon indicating copy to clipboard operation
neqo copied to clipboard

refactor(bin): use 32k stream IO buffer

Open mxinden opened this issue 1 year ago • 8 comments

Firefox by default uses a 32k IO buffer for streams.

https://searchfox.org/mozilla-central/rev/f6e3b81aac49e602f06c204f9278da30993cdc8a/modules/libpref/init/all.js#3212

This commit makes neqo-bin use the same buffer size across http09/3 and client/server.

Along the way it consolidates various buffer logic and reuses buffers whereever feasible.

mxinden avatar Jul 26 '24 14:07 mxinden

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

github-actions[bot] avatar Jul 26 '24 15:07 github-actions[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.36%. Comparing base (f801c29) to head (bad3673).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2008      +/-   ##
==========================================
- Coverage   95.36%   95.36%   -0.01%     
==========================================
  Files         112      112              
  Lines       36475    36475              
==========================================
- Hits        34784    34783       -1     
- Misses       1691     1692       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jul 26 '24 15:07 codecov[bot]

Benchmark results

Performance differences relative to f801c295df38b3aebe0b481ed5702218663b4308.

coalesce_acked_from_zero 1+1 entries: :broken_heart: Performance has regressed.
       time:   [196.61 ns 197.05 ns 197.53 ns]
       change: [+1.4739% +1.8071% +2.1494%] (p = 0.00 Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low mild
6 (6.00%) high mild
9 (9.00%) high severe
coalesce_acked_from_zero 3+1 entries: Change within noise threshold.
       time:   [237.21 ns 237.92 ns 238.71 ns]
       change: [+0.0346% +0.3689% +0.7299%] (p = 0.04 Found 16 outliers among 100 measurements (16.00%)
16 (16.00%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [236.98 ns 238.05 ns 239.36 ns]
       change: [-0.1969% +0.3499% +0.9639%] (p = 0.23 > 0.05)

Found 9 outliers among 100 measurements (9.00%) 9 (9.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [215.66 ns 215.88 ns 216.12 ns]
       change: [-0.0640% +0.7259% +1.4315%] (p = 0.06 > 0.05)

Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [119.60 ms 119.68 ms 119.76 ms]
       change: [+0.8078% +0.9063% +1.0011%] (p = 0.00 Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [40.423 ms 42.503 ms 44.609 ms]
       change: [-5.6161% +0.7236% +7.8640%] (p = 0.83 > 0.05)

Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [57.763 ms 60.818 ms 63.932 ms]
       change: [+2.2889% +10.117% +18.409%] (p = 0.01 Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/pacing-false/same-seed: No change in performance detected.
       time:   [47.965 ms 49.768 ms 51.492 ms]
       change: [-6.1545% -1.5034% +3.4935%] (p = 0.55 > 0.05)

Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [60.923 ms 67.805 ms 74.674 ms]
       change: [-18.847% -6.3929% +7.1145%] (p = 0.35 > 0.05)
1-conn/1-100mb-resp (aka. Download)/client: :green_heart: Performance has improved.
       time:   [126.55 ms 127.76 ms 129.52 ms]
       thrpt:  [772.10 MiB/s 782.71 MiB/s 790.18 MiB/s]
change:
       time:   [-27.403% -26.268% -24.915%] (p = 0.00 +35.626% +37.747%]

Found 8 outliers among 100 measurements (8.00%) 7 (7.00%) low mild 1 (1.00%) high severe

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
       time:   [408.49 ms 411.58 ms 414.65 ms]
       thrpt:  [24.117 Kelem/s 24.296 Kelem/s 24.481 Kelem/s]
change:
       time:   [-0.9293% +0.2571% +1.4515%] (p = 0.66 > 0.05)
       thrpt:  [-1.4308% -0.2564% +0.9380%]

Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild

1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [46.648 ms 47.456 ms 48.297 ms]
       thrpt:  [20.705  elem/s 21.072  elem/s 21.437  elem/s]
change:
       time:   [-1.6077% +0.7764% +3.0788%] (p = 0.52 > 0.05)
       thrpt:  [-2.9869% -0.7704% +1.6340%]

Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 198.2 ± 84.1 98.9 322.2 1.00
neqo msquic reno on 298.3 ± 71.2 259.3 466.2 1.00
neqo msquic reno 279.3 ± 25.0 247.5 332.7 1.00
neqo msquic cubic on 315.0 ± 106.3 255.0 597.3 1.00
neqo msquic cubic 281.3 ± 42.2 250.4 398.3 1.00
msquic neqo reno on 167.5 ± 108.4 85.6 530.2 1.00
msquic neqo reno 157.2 ± 91.3 84.4 359.1 1.00
msquic neqo cubic on 135.0 ± 96.7 84.5 458.3 1.00
msquic neqo cubic 167.8 ± 96.0 85.4 346.6 1.00
neqo neqo reno on 192.0 ± 66.5 151.9 363.2 1.00
neqo neqo reno 190.8 ± 68.6 151.6 407.3 1.00
neqo neqo cubic on 258.4 ± 135.3 159.7 596.3 1.00
neqo neqo cubic 311.5 ± 175.9 173.7 664.5 1.00

:arrow_down: Download logs

github-actions[bot] avatar Jul 26 '24 15:07 github-actions[bot]

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

  • Linux: Debug Release
  • macOS: Debug Release
  • Windows: Debug Release

github-actions[bot] avatar Jul 26 '24 17:07 github-actions[bot]

Strange. "No change in performance detected", but a 97% increase in runtime. Will look into it now.

1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.

       time:   [358.86 ms 589.33 ms 967.64 ms]
       thrpt:  [103.34 MiB/s 169.68 MiB/s 278.66 MiB/s]
change:
       time:   [+21.333% +97.949% +256.03%] (p = 0.08 > 0.05)
       thrpt:  [-71.912% -49.482% -17.582%]
       No change
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

mxinden avatar Jul 29 '24 09:07 mxinden

Still debugging the performance regression. I expected a performance improvement.

Interesting difference in --stats of a ~1GB download with rx: 819060 on main and rx: 25662 on 32k.

branch main

9s627ms INFO stats for Client ...
  rx: 819060 drop 1 dup 0 saved 1
  tx: 119448 lost 14789 lateack 4 ptoack 28
  pmtud: 8 sent 6 acked 0 lost 0 change
  resumed: false
  frames rx:
    crypto 6 done 5 token 4 close 0
    ack 2449 (max 119418) ping 4 padding 2666
    stream 817514 reset 0 stop 0
    max: stream 0 data 0 stream_data 0
    blocked: stream 0 data 0 stream_data 1299
    datagram 0
    ncid 35 rcid 0 pchallenge 0 presponse 0
    ack_frequency 10
  frames tx:
    crypto 3 done 0 token 0 close 3
    ack 117360 (max 820038) ping 46 padding 8
    stream 8 reset 0 stop 0
    max: stream 0 data 0 stream_data 2521
    blocked: stream 0 data 0 stream_data 0
    datagram 0
    ncid 0 rcid 0 pchallenge 0 presponse 0
    ack_frequency 2

branch 32k

3s488ms INFO stats for Client ...
  rx: 25662 drop 1 dup 0 saved 1
  tx: 15781 lost 0 lateack 0 ptoack 187
  pmtud: 10 sent 10 acked 0 lost 0 change
  resumed: false
  frames rx:
    crypto 3 done 1 token 1 close 0
    ack 2811 (max 15765) ping 78 padding 375850
    stream 24043 reset 0 stop 0
    max: stream 0 data 0 stream_data 0
    blocked: stream 0 data 0 stream_data 2670
    datagram 0
    ncid 7 rcid 0 pchallenge 0 presponse 0
    ack_frequency 3
  frames tx:
    crypto 2 done 0 token 0 close 3
    ack 13998 (max 26763) ping 149 padding 10
    stream 5 reset 0 stop 0
    max: stream 0 data 0 stream_data 2764
    blocked: stream 0 data 0 stream_data 0
    datagram 0
    ncid 0 rcid 0 pchallenge 0 presponse 0
    ack_frequency 1

mxinden avatar Aug 04 '24 09:08 mxinden

Without pacing, this pull request outperforms main:

➜  neqo-bin git:(main) ✗ critcmp main main-no-pacing 32k 32k-no-pacing -f "Download" --list
1-conn/1-100mb-resp (aka. Download)/client
------------------------------------------
32k-no-pacing      1.00    114.8±49.78ms   871.3 MB/sec
main-no-pacing     1.05    121.0±83.97ms   826.4 MB/sec
main               1.45    165.9±56.69ms   602.8 MB/sec
32k               11.39  1307.8±1038.97ms   76.5 MB/sec

mxinden avatar Aug 04 '24 09:08 mxinden

This is ready for a review. https://github.com/mozilla/neqo/pull/2035 resolved the performance regressions. In addition, as expected, this now improves the Download benchmark:

1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.

   time:   [126.55 ms 127.76 ms 129.52 ms]
   thrpt:  [772.10 MiB/s 782.71 MiB/s 790.18 MiB/s]

change: time: [-27.403% -26.268% -24.915%] (p = 0.00 < 0.05) thrpt: [+33.182% +35.626% +37.747%]

Note: The benefit of this pull request is not the performance improvement above, but the fact that the benchmark now executes closer to how Firefox uses neqo-*.

mxinden avatar Aug 11 '24 12:08 mxinden