neqo icon indicating copy to clipboard operation
neqo copied to clipboard

fix(client): remove needless process_output in process_input

Open mxinden opened this issue 1 year ago • 12 comments

There is no need to call process_output within process_multiple_input after each GRO datagram batch. Instead, process all available incoming datagrams in process_multiple_input and then move on to the top of the loop, calling handler.handle and then process_output` as usual.

mxinden avatar Jun 13 '24 07:06 mxinden

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

github-actions[bot] avatar Jun 13 '24 07:06 github-actions[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.34%. Comparing base (f5125c6) to head (a6803e5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1922      +/-   ##
==========================================
- Coverage   95.37%   95.34%   -0.04%     
==========================================
  Files         112      112              
  Lines       36570    36570              
==========================================
- Hits        34880    34868      -12     
- Misses       1690     1702      +12     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jun 13 '24 07:06 codecov[bot]

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

  • Linux: Debug Release
  • macOS: Debug Release
  • Windows: Debug Release

github-actions[bot] avatar Jun 13 '24 09:06 github-actions[bot]

Let's wait for a benchmark run before merging here. Just to make sure nothing depends on the additional calls to process_output.

mxinden avatar Jun 18 '24 09:06 mxinden

@mxinden the RPS bench seems to just sit there idle.

larseggert avatar Jun 18 '24 10:06 larseggert

@mxinden anything I can do to help land this?

larseggert avatar Jul 01 '24 08:07 larseggert

I suggest we land https://github.com/mozilla/neqo/pull/1929 first. Might resolve the RPS benchmark stall.

mxinden avatar Jul 01 '24 10:07 mxinden

Pulled in from main w/#1929. Let's see.

larseggert avatar Jul 02 '24 12:07 larseggert

Benchmark results

Performance differences relative to f5125c6d522322a0730b69a7e860e1339746b161.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [90.366 ns 90.660 ns 90.967 ns]
       change: [+0.2168% +0.5769% +0.9580%] (p = 0.00 Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
coalesce_acked_from_zero 3+1 entries: :green_heart: Performance has improved.
       time:   [100.81 ns 101.09 ns 101.40 ns]
       change: [-2.0460% -1.5509% -1.0930%] (p = 0.00 Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
2 (2.00%) high mild
12 (12.00%) high severe
coalesce_acked_from_zero 10+1 entries: :green_heart: Performance has improved.
       time:   [100.56 ns 101.08 ns 101.67 ns]
       change: [-2.3644% -1.7060% -1.0685%] (p = 0.00 Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low mild
1 (1.00%) high mild
9 (9.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [79.168 ns 79.289 ns 79.433 ns]
       change: [-1.9308% -0.5169% +0.9605%] (p = 0.51 > 0.05)

Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.82 ms 111.87 ms 111.92 ms]
       change: [-0.4649% -0.2528% -0.1101%] (p = 0.00 Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low mild
1 (1.00%) high mild
transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [25.919 ms 26.940 ms 27.954 ms]
       change: [-5.5319% -0.1177% +5.0978%] (p = 0.96 > 0.05)

Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild

transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [33.663 ms 35.313 ms 36.966 ms]
       change: [-7.8338% -1.7466% +5.3688%] (p = 0.59 > 0.05)
transfer/pacing-false/same-seed: No change in performance detected.
       time:   [25.733 ms 26.417 ms 27.096 ms]
       change: [-3.2789% +0.6936% +4.8374%] (p = 0.75 > 0.05)

Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [40.229 ms 42.197 ms 44.216 ms]
       change: [-10.502% -4.1274% +3.0575%] (p = 0.24 > 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: :broken_heart: Performance has regressed.
       time:   [1.4041 s 1.4208 s 1.4345 s]
       thrpt:  [69.712 MiB/s 70.382 MiB/s 71.221 MiB/s]
change:
       time:   [+53.429% +55.980% +58.222%] (p = 0.00 -35.889% -34.823%]

Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) low severe

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: :broken_heart: Performance has regressed.
       time:   [339.28 ms 342.87 ms 346.50 ms]
       thrpt:  [28.860 Kelem/s 29.165 Kelem/s 29.475 Kelem/s]
change:
       time:   [+6.1452% +7.7041% +9.2113%] (p = 0.00 -7.1530% -5.7895%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [33.728 ms 33.902 ms 34.093 ms]
       thrpt:  [29.332  elem/s 29.497  elem/s 29.649  elem/s]
change:
       time:   [-0.7151% +0.1566% +1.0154%] (p = 0.73 > 0.05)
       thrpt:  [-1.0052% -0.1563% +0.7203%]

Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low mild 5 (5.00%) high severe

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.
       time:   [1.6483 s 1.6692 s 1.6908 s]
       thrpt:  [59.145 MiB/s 59.910 MiB/s 60.668 MiB/s]
change:
       time:   [-1.1292% +0.6464% +2.4030%] (p = 0.47 > 0.05)
       thrpt:  [-2.3466% -0.6422% +1.1421%]

Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing MTU Mean [ms] Min [ms] Max [ms]
gquiche gquiche 1504 552.3 ± 57.9 503.7 679.2
neqo gquiche reno on 1504 547.3 ± 71.2 488.6 689.1
neqo gquiche reno 1504 513.4 ± 38.8 487.5 619.8
neqo gquiche cubic on 1504 541.3 ± 51.3 504.8 640.3
neqo gquiche cubic 1504 531.1 ± 90.2 489.1 786.7
msquic msquic 1504 124.4 ± 52.8 92.0 307.2
neqo msquic reno on 1504 330.9 ± 73.8 254.8 492.9
neqo msquic reno 1504 382.3 ± 137.6 276.7 624.9
neqo msquic cubic on 1504 286.0 ± 15.0 253.5 305.6
neqo msquic cubic 1504 300.7 ± 69.3 247.6 470.9
gquiche neqo reno on 1504 784.3 ± 124.2 605.3 977.6
gquiche neqo reno 1504 748.5 ± 105.5 608.7 894.1
gquiche neqo cubic on 1504 769.2 ± 99.9 579.8 955.7
gquiche neqo cubic 1504 790.7 ± 137.2 615.6 1020.1
msquic neqo reno on 1504 738.5 ± 46.1 701.1 857.9
msquic neqo reno 1504 739.8 ± 49.6 702.4 850.6
msquic neqo cubic on 1504 730.1 ± 19.1 703.2 775.4
msquic neqo cubic 1504 714.9 ± 17.0 693.1 755.3
neqo neqo reno on 1504 609.3 ± 110.2 530.3 852.5
neqo neqo reno 1504 580.5 ± 103.7 521.7 800.9
neqo neqo cubic on 1504 550.9 ± 41.9 516.0 646.4
neqo neqo cubic 1504 549.3 ± 33.9 504.3 632.1

:arrow_down: Download logs

github-actions[bot] avatar Jul 02 '24 13:07 github-actions[bot]

Pulled in from main w/#1929. Let's see.

All green. Good news 😮‍💨

mxinden avatar Jul 02 '24 14:07 mxinden

1-conn/1-100mb-resp (aka. Download)/client: 💔 Performance has regressed.

   time:   [1.7535 s 1.8771 s 1.9494 s]
   thrpt:  [51.297 MiB/s 53.273 MiB/s 57.028 MiB/s]

change: time: [+52.443% +65.455% +73.749%] (p = 0.00 < 0.05) thrpt: [-42.446% -39.561% -34.402%]

Let's hold off merging for now.

mxinden avatar Jul 02 '24 15:07 mxinden

OK, making this a draft PR for now.

larseggert avatar Jul 03 '24 05:07 larseggert

Is this OBE?

larseggert avatar Nov 08 '24 17:11 larseggert

Closing here. Benchmarks still show a significant performance impact when not interleaving process_multiple_input with process_output. While likely not relevant for non-localhost traffic, I don't think the small clean-up here is worth it.

mxinden avatar Nov 17 '24 13:11 mxinden

It might be that the RX processing takes so long that the ACK clock stalls, because we don't schedule an ACK to TX.

We might want to schedule a TX when the next ACK is due.

larseggert avatar Nov 17 '24 18:11 larseggert