fix(client): remove needless process_output in process_input
There is no need to call process_output within process_multiple_input after each GRO datagram batch. Instead, process all available incoming datagrams in process_multiple_input, then return to the top of the loop, calling handler.handle and then process_output as usual.
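A minimal sketch of the restructured loop (the struct and method bodies are illustrative stand-ins, not neqo's actual API): all queued input batches are drained first, and a single output pass runs once per loop iteration instead of after every batch.

```rust
use std::collections::VecDeque;

struct Client {
    rx: VecDeque<Vec<u8>>, // queued GRO datagram batches
    outputs: usize,        // counts process_output calls
}

impl Client {
    fn process_multiple_input(&mut self) {
        // Before this change: an output pass ran after every batch.
        // After: consume every available batch with no output in between.
        while self.rx.pop_front().is_some() {}
    }

    fn process_output(&mut self) {
        self.outputs += 1;
    }

    fn tick(&mut self) {
        self.process_multiple_input();
        // handler.handle(...) would run here.
        self.process_output(); // single output pass per loop iteration
    }
}

fn main() {
    let mut c = Client {
        rx: VecDeque::from(vec![vec![0u8], vec![1], vec![2]]),
        outputs: 0,
    };
    c.tick();
    // Three input batches are drained, but process_output runs only once.
    println!("{} {}", c.rx.len(), c.outputs);
}
```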
Failed Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: Z
- neqo-latest vs. haproxy: L1
- neqo-latest vs. kwik: Z
- neqo-latest vs. msquic: A
- neqo-latest vs. mvfst: A L1 C1
- neqo-latest vs. xquic: A
neqo-latest as server
- msquic vs. neqo-latest: Z U
- mvfst vs. neqo-latest: Z A L1 C1
- ngtcp2 vs. neqo-latest: C20
- quiche vs. neqo-latest: L1 C1
- quinn vs. neqo-latest: V2
- xquic vs. neqo-latest: M
All results
Succeeded Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
- neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
- neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L2 C1 C2 6 V2
- neqo-latest vs. kwik: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
- neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. msquic: H DC LR C20 M S R Z B U L1 L2 C1 C2 6 V2
- neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6
- neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. picoquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6
- neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6
- neqo-latest vs. xquic: H DC LR C20 M R Z 3 B U L1 L2 C1 C2 6
neqo-latest as server
- aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C1 C2 6 V2
- chrome vs. neqo-latest: 3
- go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6
- kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
- lsquic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 V2
- msquic vs. neqo-latest: H DC LR C20 M S R B A L1 L2 C1 C2 6 V2
- mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6
- neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- ngtcp2 vs. neqo-latest: H DC LR M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
- quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
- quiche vs. neqo-latest: H DC LR M S R Z 3 B A L2 C2 6
- quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6
- s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6
- xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6
Unsupported Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
- neqo-latest vs. aioquic: E
- neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2
- neqo-latest vs. haproxy: E
- neqo-latest vs. kwik: E
- neqo-latest vs. msquic: 3 E
- neqo-latest vs. mvfst: C20 S E V2
- neqo-latest vs. nginx: E V2
- neqo-latest vs. quic-go: E V2
- neqo-latest vs. quiche: E V2
- neqo-latest vs. quinn: V2
- neqo-latest vs. s2n-quic: Z V2
- neqo-latest vs. xquic: S E V2
neqo-latest as server
- aioquic vs. neqo-latest: U E
- chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2
- go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
- kwik vs. neqo-latest: E
- lsquic vs. neqo-latest: C20 Z U
- msquic vs. neqo-latest: 3 E
- mvfst vs. neqo-latest: C20 S R U E V2
- quic-go vs. neqo-latest: E V2
- quiche vs. neqo-latest: C20 U E V2
- s2n-quic vs. neqo-latest: C20 Z U V2
- xquic vs. neqo-latest: E V2
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.34%. Comparing base (f5125c6) to head (a6803e5).
Additional details and impacted files
@@ Coverage Diff @@
## main #1922 +/- ##
==========================================
- Coverage 95.37% 95.34% -0.04%
==========================================
Files 112 112
Lines 36570 36570
==========================================
- Hits 34880 34868 -12
- Misses 1690 1702 +12
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Firefox builds for this PR
The following builds are available for testing. Crossed-out builds did not succeed.
- Linux: Debug Release
- macOS: Debug Release
- Windows: Debug Release
Let's wait for a benchmark run before merging here. Just to make sure nothing depends on the additional calls to process_output.
@mxinden the RPS bench seems to just sit there idle.
@mxinden anything I can do to help land this?
I suggest we land https://github.com/mozilla/neqo/pull/1929 first. Might resolve the RPS benchmark stall.
Pulled in from main w/#1929. Let's see.
Benchmark results
Performance differences relative to f5125c6d522322a0730b69a7e860e1339746b161.
coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
time: [90.366 ns 90.660 ns 90.967 ns]
change: [+0.2168% +0.5769% +0.9580%] (p = 0.00 < 0.05)
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
coalesce_acked_from_zero 3+1 entries: :green_heart: Performance has improved.
time: [100.81 ns 101.09 ns 101.40 ns]
change: [-2.0460% -1.5509% -1.0930%] (p = 0.00 < 0.05)
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
2 (2.00%) high mild
12 (12.00%) high severe
coalesce_acked_from_zero 10+1 entries: :green_heart: Performance has improved.
time: [100.56 ns 101.08 ns 101.67 ns]
change: [-2.3644% -1.7060% -1.0685%] (p = 0.00 < 0.05)
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low mild
1 (1.00%) high mild
9 (9.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
time: [79.168 ns 79.289 ns 79.433 ns]
change: [-1.9308% -0.5169% +0.9605%] (p = 0.51 > 0.05)
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
RxStreamOrderer::inbound_frame(): Change within noise threshold.
time: [111.82 ms 111.87 ms 111.92 ms]
change: [-0.4649% -0.2528% -0.1101%] (p = 0.00 < 0.05)
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low mild
1 (1.00%) high mild
transfer/pacing-false/varying-seeds: No change in performance detected.
time: [25.919 ms 26.940 ms 27.954 ms]
change: [-5.5319% -0.1177% +5.0978%] (p = 0.96 > 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
transfer/pacing-true/varying-seeds: No change in performance detected.
time: [33.663 ms 35.313 ms 36.966 ms]
change: [-7.8338% -1.7466% +5.3688%] (p = 0.59 > 0.05)
transfer/pacing-false/same-seed: No change in performance detected.
time: [25.733 ms 26.417 ms 27.096 ms]
change: [-3.2789% +0.6936% +4.8374%] (p = 0.75 > 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/pacing-true/same-seed: No change in performance detected.
time: [40.229 ms 42.197 ms 44.216 ms]
change: [-10.502% -4.1274% +3.0575%] (p = 0.24 > 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: :broken_heart: Performance has regressed.
time: [1.4041 s 1.4208 s 1.4345 s]
thrpt: [69.712 MiB/s 70.382 MiB/s 71.221 MiB/s]
change:
time: [+53.429% +55.980% +58.222%] (p = 0.00 < 0.05)
thrpt: [-35.889% -34.823%]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low severe
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: :broken_heart: Performance has regressed.
time: [339.28 ms 342.87 ms 346.50 ms]
thrpt: [28.860 Kelem/s 29.165 Kelem/s 29.475 Kelem/s]
change:
time: [+6.1452% +7.7041% +9.2113%] (p = 0.00 < 0.05)
thrpt: [-7.1530% -5.7895%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
time: [33.728 ms 33.902 ms 34.093 ms]
thrpt: [29.332 elem/s 29.497 elem/s 29.649 elem/s]
change:
time: [-0.7151% +0.1566% +1.0154%] (p = 0.73 > 0.05)
thrpt: [-1.0052% -0.1563% +0.7203%]
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low mild
5 (5.00%) high severe
1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.
time: [1.6483 s 1.6692 s 1.6908 s]
thrpt: [59.145 MiB/s 59.910 MiB/s 60.668 MiB/s]
change:
time: [-1.1292% +0.6464% +2.4030%] (p = 0.47 > 0.05)
thrpt: [-2.3466% -0.6422% +1.1421%]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
Client/server transfer results
Transfer of 33554432 bytes over loopback.
| Client | Server | CC | Pacing | MTU | Mean [ms] | Min [ms] | Max [ms] |
|---|---|---|---|---|---|---|---|
| gquiche | gquiche | | | 1504 | 552.3 ± 57.9 | 503.7 | 679.2 |
| neqo | gquiche | reno | on | 1504 | 547.3 ± 71.2 | 488.6 | 689.1 |
| neqo | gquiche | reno | | 1504 | 513.4 ± 38.8 | 487.5 | 619.8 |
| neqo | gquiche | cubic | on | 1504 | 541.3 ± 51.3 | 504.8 | 640.3 |
| neqo | gquiche | cubic | | 1504 | 531.1 ± 90.2 | 489.1 | 786.7 |
| msquic | msquic | | | 1504 | 124.4 ± 52.8 | 92.0 | 307.2 |
| neqo | msquic | reno | on | 1504 | 330.9 ± 73.8 | 254.8 | 492.9 |
| neqo | msquic | reno | | 1504 | 382.3 ± 137.6 | 276.7 | 624.9 |
| neqo | msquic | cubic | on | 1504 | 286.0 ± 15.0 | 253.5 | 305.6 |
| neqo | msquic | cubic | | 1504 | 300.7 ± 69.3 | 247.6 | 470.9 |
| gquiche | neqo | reno | on | 1504 | 784.3 ± 124.2 | 605.3 | 977.6 |
| gquiche | neqo | reno | | 1504 | 748.5 ± 105.5 | 608.7 | 894.1 |
| gquiche | neqo | cubic | on | 1504 | 769.2 ± 99.9 | 579.8 | 955.7 |
| gquiche | neqo | cubic | | 1504 | 790.7 ± 137.2 | 615.6 | 1020.1 |
| msquic | neqo | reno | on | 1504 | 738.5 ± 46.1 | 701.1 | 857.9 |
| msquic | neqo | reno | | 1504 | 739.8 ± 49.6 | 702.4 | 850.6 |
| msquic | neqo | cubic | on | 1504 | 730.1 ± 19.1 | 703.2 | 775.4 |
| msquic | neqo | cubic | | 1504 | 714.9 ± 17.0 | 693.1 | 755.3 |
| neqo | neqo | reno | on | 1504 | 609.3 ± 110.2 | 530.3 | 852.5 |
| neqo | neqo | reno | | 1504 | 580.5 ± 103.7 | 521.7 | 800.9 |
| neqo | neqo | cubic | on | 1504 | 550.9 ± 41.9 | 516.0 | 646.4 |
| neqo | neqo | cubic | | 1504 | 549.3 ± 33.9 | 504.3 | 632.1 |
All green. Good news 😮💨
1-conn/1-100mb-resp (aka. Download)/client: 💔 Performance has regressed.
time: [1.7535 s 1.8771 s 1.9494 s]
thrpt: [51.297 MiB/s 53.273 MiB/s 57.028 MiB/s]
change:
time: [+52.443% +65.455% +73.749%] (p = 0.00 < 0.05)
thrpt: [-42.446% -39.561% -34.402%]
Let's hold off merging for now.
OK, making this a draft PR for now.
Is this OBE?
Closing here. Benchmarks still show a significant performance impact when not interleaving process_multiple_input with process_output. While likely not relevant for non-localhost traffic, I don't think the small clean-up here is worth it.
It might be that RX processing takes so long that the ACK clock stalls, because we never schedule an ACK for TX.
We might want to schedule a TX when the next ACK is due.
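A rough sketch of that idea (all names and the delay value are hypothetical, not neqo's API): while draining a long run of RX batches, check whether the delayed-ACK deadline has passed and break out so the caller can transmit, keeping the peer's ACK clock ticking.

```rust
use std::time::{Duration, Instant};

// Illustrative value; QUIC's default max_ack_delay is 25 ms.
const MAX_ACK_DELAY: Duration = Duration::from_millis(25);

/// Drain queued RX batches, but stop early once an ACK is due
/// so the event loop can do a TX pass. Returns batches processed.
fn drain_rx(batches: &mut Vec<Vec<u8>>, now: impl Fn() -> Instant) -> usize {
    let ack_due = now() + MAX_ACK_DELAY;
    let mut processed = 0;
    while let Some(_batch) = batches.pop() {
        processed += 1;
        if now() >= ack_due {
            // An ACK is due: yield back to the caller to transmit.
            break;
        }
    }
    processed
}

fn main() {
    let mut batches = vec![vec![0u8]; 5];
    // With a real clock, 5 tiny batches finish well before the deadline.
    let n = drain_rx(&mut batches, Instant::now);
    println!("processed {n}");
}
```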