neqo fix(client): remove needless process_output in process

There is no need to call process_output within process_multiple_input after each GRO datagram batch. Instead, process all available incoming datagrams in process_multiple_input and then move on to the top of the loop, calling handler.handle and then process_output` as usual.

Jun 13 '24 07:06 mxinden

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L2 C1 C2 6 V2
neqo-latest vs. kwik: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. msquic: H DC LR C20 M S R Z B U L1 L2 C1 C2 6 V2
neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6
neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. picoquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6
neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6
neqo-latest vs. xquic: H DC LR C20 M R Z 3 B U L1 L2 C1 C2 6

neqo-latest as server

aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C1 C2 6 V2
chrome vs. neqo-latest: 3
go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6
kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
lsquic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 V2
msquic vs. neqo-latest: H DC LR C20 M S R B A L1 L2 C1 C2 6 V2
mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6
neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
ngtcp2 vs. neqo-latest: H DC LR M S R Z 3 B U E A L1 L2 C1 C2 6 V2
picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
quiche vs. neqo-latest: H DC LR M S R Z 3 B A L2 C2 6
quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6
s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6
xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: E
neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2
neqo-latest vs. haproxy: E
neqo-latest vs. kwik: E
neqo-latest vs. msquic: 3 E
neqo-latest vs. mvfst: C20 S E V2
neqo-latest vs. nginx: E V2
neqo-latest vs. quic-go: E V2
neqo-latest vs. quiche: E V2
neqo-latest vs. quinn: V2
neqo-latest vs. s2n-quic: Z V2
neqo-latest vs. xquic: S E V2

neqo-latest as server

aioquic vs. neqo-latest: U E
chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2
go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
kwik vs. neqo-latest: E
lsquic vs. neqo-latest: C20 Z U
msquic vs. neqo-latest: 3 E
mvfst vs. neqo-latest: C20 S R U E V2
quic-go vs. neqo-latest: E V2
quiche vs. neqo-latest: C20 U E V2
s2n-quic vs. neqo-latest: C20 Z U V2
xquic vs. neqo-latest: E V2

Jun 13 '24 07:06 github-actions[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.34%. Comparing base (f5125c6) to head (a6803e5).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1922      +/-   ##
==========================================
- Coverage   95.37%   95.34%   -0.04%     
==========================================
  Files         112      112              
  Lines       36570    36570              
==========================================
- Hits        34880    34868      -12     
- Misses       1690     1702      +12

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jun 13 '24 07:06 codecov[bot]

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

Linux: Debug Release
macOS: Debug Release
Windows: Debug Release

Jun 13 '24 09:06 github-actions[bot]

Let's wait for a benchmark run before merging here. Just to make sure nothing depends on the additional calls to process_output.

Jun 18 '24 09:06 mxinden

@mxinden the RPS bench seems to just sit there idle.

Jun 18 '24 10:06 larseggert

@mxinden anything I can do to help land this?

Jul 01 '24 08:07 larseggert

I suggest we land https://github.com/mozilla/neqo/pull/1929 first. Might resolve the RPS benchmark stall.

Jul 01 '24 10:07 mxinden

Pulled in from main w/#1929. Let's see.

Jul 02 '24 12:07 larseggert

Benchmark results

Performance differences relative to f5125c6d522322a0730b69a7e860e1339746b161.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.

       time:   [90.366 ns 90.660 ns 90.967 ns]
       change: [+0.2168% +0.5769% +0.9580%] (p = 0.00 Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 3+1 entries: :green_heart: Performance has improved.

       time:   [100.81 ns 101.09 ns 101.40 ns]
       change: [-2.0460% -1.5509% -1.0930%] (p = 0.00 Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
2 (2.00%) high mild
12 (12.00%) high severe

coalesce_acked_from_zero 10+1 entries: :green_heart: Performance has improved.

       time:   [100.56 ns 101.08 ns 101.67 ns]
       change: [-2.3644% -1.7060% -1.0685%] (p = 0.00 Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low mild
1 (1.00%) high mild
9 (9.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.

       time:   [79.168 ns 79.289 ns 79.433 ns]
       change: [-1.9308% -0.5169% +0.9605%] (p = 0.51 > 0.05)
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.

       time:   [111.82 ms 111.87 ms 111.92 ms]
       change: [-0.4649% -0.2528% -0.1101%] (p = 0.00 Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low mild
1 (1.00%) high mild

transfer/pacing-false/varying-seeds: No change in performance detected.

       time:   [25.919 ms 26.940 ms 27.954 ms]
       change: [-5.5319% -0.1177% +5.0978%] (p = 0.96 > 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/varying-seeds: No change in performance detected.

       time:   [33.663 ms 35.313 ms 36.966 ms]
       change: [-7.8338% -1.7466% +5.3688%] (p = 0.59 > 0.05)

transfer/pacing-false/same-seed: No change in performance detected.

       time:   [25.733 ms 26.417 ms 27.096 ms]
       change: [-3.2789% +0.6936% +4.8374%] (p = 0.75 > 0.05)
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-true/same-seed: No change in performance detected.

       time:   [40.229 ms 42.197 ms 44.216 ms]
       change: [-10.502% -4.1274% +3.0575%] (p = 0.24 > 0.05)

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: :broken_heart: Performance has regressed.

       time:   [1.4041 s 1.4208 s 1.4345 s]
       thrpt:  [69.712 MiB/s 70.382 MiB/s 71.221 MiB/s]
change:
       time:   [+53.429% +55.980% +58.222%] (p = 0.00 -35.889% -34.823%]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low severe

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: :broken_heart: Performance has regressed.

       time:   [339.28 ms 342.87 ms 346.50 ms]
       thrpt:  [28.860 Kelem/s 29.165 Kelem/s 29.475 Kelem/s]
change:
       time:   [+6.1452% +7.7041% +9.2113%] (p = 0.00 -7.1530% -5.7895%]

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.

       time:   [33.728 ms 33.902 ms 34.093 ms]
       thrpt:  [29.332  elem/s 29.497  elem/s 29.649  elem/s]
change:
       time:   [-0.7151% +0.1566% +1.0154%] (p = 0.73 > 0.05)
       thrpt:  [-1.0052% -0.1563% +0.7203%]
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low mild
5 (5.00%) high severe

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.

       time:   [1.6483 s 1.6692 s 1.6908 s]
       thrpt:  [59.145 MiB/s 59.910 MiB/s 60.668 MiB/s]
change:
       time:   [-1.1292% +0.6464% +2.4030%] (p = 0.47 > 0.05)
       thrpt:  [-2.3466% -0.6422% +1.1421%]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client	Server	CC	Pacing	MTU	Mean [ms]	Min [ms]	Max [ms]
gquiche	gquiche			1504	552.3 ± 57.9	503.7	679.2
neqo	gquiche	reno	on	1504	547.3 ± 71.2	488.6	689.1
neqo	gquiche	reno		1504	513.4 ± 38.8	487.5	619.8
neqo	gquiche	cubic	on	1504	541.3 ± 51.3	504.8	640.3
neqo	gquiche	cubic		1504	531.1 ± 90.2	489.1	786.7
msquic	msquic			1504	124.4 ± 52.8	92.0	307.2
neqo	msquic	reno	on	1504	330.9 ± 73.8	254.8	492.9
neqo	msquic	reno		1504	382.3 ± 137.6	276.7	624.9
neqo	msquic	cubic	on	1504	286.0 ± 15.0	253.5	305.6
neqo	msquic	cubic		1504	300.7 ± 69.3	247.6	470.9
gquiche	neqo	reno	on	1504	784.3 ± 124.2	605.3	977.6
gquiche	neqo	reno		1504	748.5 ± 105.5	608.7	894.1
gquiche	neqo	cubic	on	1504	769.2 ± 99.9	579.8	955.7
gquiche	neqo	cubic		1504	790.7 ± 137.2	615.6	1020.1
msquic	neqo	reno	on	1504	738.5 ± 46.1	701.1	857.9
msquic	neqo	reno		1504	739.8 ± 49.6	702.4	850.6
msquic	neqo	cubic	on	1504	730.1 ± 19.1	703.2	775.4
msquic	neqo	cubic		1504	714.9 ± 17.0	693.1	755.3
neqo	neqo	reno	on	1504	609.3 ± 110.2	530.3	852.5
neqo	neqo	reno		1504	580.5 ± 103.7	521.7	800.9
neqo	neqo	cubic	on	1504	550.9 ± 41.9	516.0	646.4
neqo	neqo	cubic		1504	549.3 ± 33.9	504.3	632.1

:arrow_down: Download logs

Jul 02 '24 13:07 github-actions[bot]

Pulled in from main w/#1929. Let's see.

All green. Good news 😮‍💨

Jul 02 '24 14:07 mxinden

1-conn/1-100mb-resp (aka. Download)/client: 💔 Performance has regressed.
   time:   [1.7535 s 1.8771 s 1.9494 s]
   thrpt:  [51.297 MiB/s 53.273 MiB/s 57.028 MiB/s]
change: time: [+52.443% +65.455% +73.749%] (p = 0.00 < 0.05) thrpt: [-42.446% -39.561% -34.402%]

Let's hold off merging for now.

Jul 02 '24 15:07 mxinden

OK, making this a draft PR for now.

Jul 03 '24 05:07 larseggert

Is this OBE?

Nov 08 '24 17:11 larseggert

Closing here. Benchmarks still show a significant performance impact when not interleaving process_multiple_input with process_output. While likely not relevant for non-localhost traffic, I don't think the small clean-up here is worth it.

Nov 17 '24 13:11 mxinden

It might be that the RX processing takes so long that the ACK clock stalls, because we don't schedule an ACK to TX.

We might want to schedule a TX when the next ACK is due.

Nov 17 '24 18:11 larseggert

fix(client): remove needless process_output in process_input

Failed Interop Tests

neqo-latest as client

neqo-latest as server

Succeeded Interop Tests

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

neqo-latest as client

neqo-latest as server

Codecov Report

Firefox builds for this PR

Benchmark results

Client/server transfer results