
feat: Groundwork for DPLPMTUD

Open · larseggert opened this pull request 1 year ago · 4 comments

This lays (some of) the groundwork for implementing DPLPMTUD by replacing a bunch of constants with function calls into an otherwise pretty empty `pmtud.rs` module. Putting this PR up for early feedback, since it is pretty intrusive.

Fixes #243 (when complete)

larseggert · May 14 '24 15:05
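For a sense of the shape of the refactor, here is a minimal sketch of the constants-to-function-calls approach; the names and values are illustrative, not neqo's actual API.

```rust
// Hypothetical sketch of the refactor described above: call sites that
// previously read a hard-coded MTU constant ask a `Pmtud` instance instead,
// so a later DPLPMTUD implementation can raise the value at runtime
// without touching every caller.

/// Largest UDP payload assumed safe before any probing: the IPv6 minimum
/// MTU of 1280 bytes minus 40 bytes of IPv6 header and 8 bytes of UDP header.
const DEFAULT_PLPMTU: usize = 1280 - 40 - 8;

#[derive(Debug)]
pub struct Pmtud {
    plpmtu: usize,
}

impl Pmtud {
    pub const fn new() -> Self {
        Self { plpmtu: DEFAULT_PLPMTU }
    }

    /// Current packetization-layer PMTU. A constant for now, but exposed as
    /// a function call so DPLPMTUD can later vary it per path.
    pub const fn plpmtu(&self) -> usize {
        self.plpmtu
    }
}
```

Once every caller goes through `plpmtu()`, a future DPLPMTUD state machine only has to update one place when a probe succeeds.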

Failed Interop Tests: QUIC Interop Runner, client vs. server (all results)

Succeeded Interop Tests: QUIC Interop Runner, client vs. server

Unsupported Interop Tests: QUIC Interop Runner, client vs. server

github-actions[bot] · May 14 '24 15:05

> (There are also a bunch of warnings about unused code that is actually used. I don't understand why that is, since those functions mirror existing ones such as `cwnd_avail`.)

As far as I can tell, the trait function `CongestionControl::cwnd_min` and its implementation `<ClassicCongestionControl<T> as CongestionControl>::cwnd_min` are only called from `PacketSender::cwnd_min`, which is itself only called from test code. Thus, cargo complains about all three being unused.

Does that make sense @larseggert?

mxinden · May 14 '24 16:05
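To illustrate mxinden's point, here is a boiled-down, hypothetical version of the situation: in a non-test build the `#[cfg(test)]` module is compiled out, so `cargo build` sees no callers and flags the whole crate-private call chain as dead code.

```rust
// Hypothetical reduction of the warning described above. In a normal build
// the `tests` module below does not exist, so rustc's `dead_code` lint
// flags the `cwnd_min` chain as never used; under `cargo test` it is used
// and the warnings disappear.

trait CongestionControl {
    fn cwnd_min(&self) -> usize;
}

struct ClassicCongestionControl;

impl CongestionControl for ClassicCongestionControl {
    fn cwnd_min(&self) -> usize {
        2 * 1232 // illustrative value
    }
}

struct PacketSender {
    cc: ClassicCongestionControl,
}

impl PacketSender {
    fn cwnd_min(&self) -> usize {
        // Only reachable from the test below, hence flagged in non-test
        // builds, along with everything it transitively calls.
        self.cc.cwnd_min()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn min_cwnd() {
        let sender = PacketSender { cc: ClassicCongestionControl };
        assert_eq!(sender.cwnd_min(), 2 * 1232);
    }
}
```

The usual remedies are to gate the chain with `#[cfg(test)]` or to annotate it with `#[allow(dead_code)]`.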

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

  • Linux: Debug Release
  • macOS: Debug Release
  • Windows: ~~Debug~~ ~~Release~~

github-actions[bot] · May 14 '24 16:05

Benchmark results

Performance differences relative to 7d610edec8317067258911b83b535e7064caea67.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [193.61 ns 194.06 ns 194.53 ns]
       change: [+0.0719% +0.4048% +0.7829%] (p = 0.02 < 0.05)
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [234.67 ns 235.73 ns 237.08 ns]
       change: [+0.6006% +2.3382% +6.2565%] (p = 0.06 > 0.05)
Found 17 outliers among 100 measurements (17.00%)
  11 (11.00%) high mild
  6 (6.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [233.27 ns 234.03 ns 234.96 ns]
       change: [-0.1847% +0.3385% +0.8310%] (p = 0.21 > 0.05)
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [215.28 ns 217.58 ns 222.98 ns]
       change: [-0.5398% +6.4762% +20.010%] (p = 0.44 > 0.05)
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

RxStreamOrderer::inbound_frame(): :broken_heart: Performance has regressed.
       time:   [120.23 ms 120.29 ms 120.37 ms]
       change: [+1.1136% +1.1966% +1.2758%] (p = 0.00 < 0.05)
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

transfer/Run multiple transfers with varying seeds: :green_heart: Performance has improved.
       time:   [55.737 ms 59.125 ms 62.506 ms]
       thrpt:  [63.994 MiB/s 67.653 MiB/s 71.766 MiB/s]
change:
       time:   [-53.858% -51.136% -48.358%] (p = 0.00 < 0.05)
       thrpt:  [… +104.65% +116.72%]

transfer/Run multiple transfers with the same seed: :green_heart: Performance has improved.
       time:   [65.695 ms 70.817 ms 75.893 ms]
       thrpt:  [52.706 MiB/s 56.484 MiB/s 60.887 MiB/s]
change:
       time:   [-45.571% -41.490% -37.459%] (p = 0.00 < 0.05)
       thrpt:  [… +70.912% +83.726%]

1-conn/1-100mb-resp (aka. Download)/client: :green_heart: Performance has improved.
       time:   [154.81 ms 160.68 ms 166.89 ms]
       thrpt:  [599.19 MiB/s 622.35 MiB/s 645.97 MiB/s]
change:
       time:   [-86.471% -85.914% -85.283%] (p = 0.00 < 0.05)
       thrpt:  [… +609.92% +639.15%]

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: Change within noise threshold.
       time:   [430.16 ms 433.60 ms 437.05 ms]
       thrpt:  [22.880 Kelem/s 23.063 Kelem/s 23.247 Kelem/s]
change:
       time:   [-2.5302% -1.4809% -0.4847%] (p = 0.00 < 0.05)
       thrpt:  [… +1.5031% +2.5959%]

1-conn/1-1b-resp (aka. HPS)/client: :green_heart: Performance has improved.
       time:   [43.569 ms 44.108 ms 44.652 ms]
       thrpt:  [22.395 elem/s 22.671 elem/s 22.952 elem/s]
change:
       time:   [-3.7644% -2.5274% -1.2515%] (p = 0.00 < 0.05)
       thrpt:  [… +2.5930% +3.9116%]

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|---:|---:|---:|---:|
| msquic | msquic | | | 111.9 ± 13.4 | 88.9 | 147.8 | 1.00 |
| neqo | msquic | reno | on | 267.4 ± 7.3 | 250.7 | 279.4 | 1.00 |
| neqo | msquic | reno | | 271.8 ± 6.0 | 264.6 | 283.5 | 1.00 |
| neqo | msquic | cubic | on | 275.0 ± 15.4 | 245.0 | 299.2 | 1.00 |
| neqo | msquic | cubic | | 267.1 ± 7.8 | 256.6 | 281.1 | 1.00 |
| msquic | neqo | reno | on | 235.3 ± 166.2 | 89.0 | 648.3 | 1.00 |
| msquic | neqo | reno | | 140.0 ± 20.3 | 111.6 | 172.6 | 1.00 |
| msquic | neqo | cubic | on | 164.5 ± 59.6 | 113.1 | 331.7 | 1.00 |
| msquic | neqo | cubic | | 206.5 ± 70.2 | 126.4 | 361.7 | 1.00 |
| neqo | neqo | reno | on | 192.5 ± 24.7 | 151.4 | 237.1 | 1.00 |
| neqo | neqo | reno | | 173.1 ± 19.6 | 144.6 | 215.4 | 1.00 |
| neqo | neqo | cubic | on | 186.7 ± 47.4 | 146.1 | 358.3 | 1.00 |
| neqo | neqo | cubic | | 171.5 ± 9.3 | 156.5 | 187.0 | 1.00 |

:arrow_down: Download logs

github-actions[bot] · May 14 '24 17:05

> I'm not seeing PMTUD tests, which would be necessary for this.

Yeah, tests are definitely still needed.

> There is probably a case for sending probes when you have spare sending capacity and nothing better to send. Indeed, successfully probing will let us push congestion windows up more and could even improve performance.

Yep. It's on my to-do list to move the probes to the end of a flight of data, if there is cwnd left over. The current approach leads to a more predictable send pattern, which helped while I was developing this.

> Right now, you don't send real data in probes. You are effectively betting on the probes being lost. But you could send data, which would reduce the harm in case 3. It might even make the code slightly simpler.

https://datatracker.ietf.org/doc/html/rfc9000#name-sending-quic-pmtu-probes talks specifically about probing with PING and PADDING. I was actually wondering why that was the case, but decided to simply do that initially.

larseggert · May 22 '24 05:05
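For context on what such a probe looks like, here is a rough sketch of an RFC 9000 Section 14.4-style PING-plus-PADDING probe payload. The frame type values are from RFC 9000; the function and its parameters are hypothetical and ignore header encoding and packet protection entirely.

```rust
// Sketch of a DPLPMTUD probe payload: one PING frame plus PADDING frames,
// sized so the whole datagram hits exactly the PMTU candidate being probed.

const FRAME_TYPE_PADDING: u8 = 0x00; // RFC 9000, Section 19.1
const FRAME_TYPE_PING: u8 = 0x01; // RFC 9000, Section 19.2

/// Build the payload of a probe datagram of `probe_size` bytes total,
/// where `overhead` covers the packet header and AEAD expansion.
fn build_probe_payload(probe_size: usize, overhead: usize) -> Vec<u8> {
    assert!(probe_size > overhead);
    let payload_len = probe_size - overhead;
    let mut payload = Vec::with_capacity(payload_len);
    // PING is ack-eliciting: an ACK for the probe confirms the path
    // supports `probe_size`; loss suggests it does not.
    payload.push(FRAME_TYPE_PING);
    // PADDING fills the rest, so the datagram reaches the probed size.
    payload.resize(payload_len, FRAME_TYPE_PADDING);
    payload
}
```

One appeal of probing with PING and PADDING rather than application data is that a lost probe carries nothing that must be retransmitted, which is precisely the trade-off discussed in the exchange above.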

On a positive note, look at the benchmark improvements! Looks like our crappy performance against msquic was mostly because we didn't send 64KB packets over loopback :-)

larseggert · May 22 '24 06:05

Codecov Report

Attention: Patch coverage is 99.47849% with 4 lines in your changes missing coverage. Please review.

Project coverage is 94.97%. Comparing base (7d610ed) to head (5a1f659).

| Files | Patch % | Lines |
|:---|---:|---:|
| neqo-transport/src/pmtud.rs | 99.31% | 3 Missing :warning: |
| neqo-transport/src/sender.rs | 97.05% | 1 Missing :warning: |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1903      +/-   ##
==========================================
+ Coverage   94.84%   94.97%   +0.12%     
==========================================
  Files         111      112       +1     
  Lines       35880    36509     +629     
==========================================
+ Hits        34029    34673     +644     
+ Misses       1851     1836      -15     

:umbrella: View full report in Codecov by Sentry.

codecov[bot] · May 22 '24 12:05

Making this a draft until the test failure is fixed.

larseggert · May 31 '24 12:05

Working on increasing test coverage.

larseggert · Jul 02 '24 10:07

This can wait until after.

larseggert · Jul 04 '24 06:07