neqo
Track sent items using SmallVec
DO NOT MERGE THIS - IT MAKES THINGS SLOWER
I thought that this was worth sharing though.
I was looking at the profiles that @KershawChang shared and noticed that write_stream_frame was eating a decent amount of CPU because there was an allocation in there. It took me a while to track it down, but it turns out that this is the first place that we add a RecoveryToken to the Vec of those that we hold in each SentPacket.
The obvious thing to test in this case is SmallVec. SmallVec is designed for collections that usually hold only a few items. It stores values in a small inline array; only when the number of items exceeds the array size does it spill them to the heap, like a normal Vec. If you stay within the array, you avoid a heap allocation, which saves a fair amount of work. The cost is a little extra complexity, a larger stack footprint, and the need to move everything if you ever overflow the backing array. So if your prediction is wrong, it is more expensive.
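The inline-or-heap behavior can be sketched with a hand-rolled, std-only stand-in (the real smallvec crate is more sophisticated; `SmallBuf` and its methods here are hypothetical, purely for illustration):

```rust
// Minimal sketch of the SmallVec idea: values live in an inline array
// until capacity N is exceeded, then everything spills to a heap Vec.
enum SmallBuf<T, const N: usize> {
    Inline { buf: [Option<T>; N], len: usize },
    Heap(Vec<T>),
}

impl<T, const N: usize> SmallBuf<T, N> {
    fn new() -> Self {
        Self::Inline { buf: std::array::from_fn(|_| None), len: 0 }
    }

    fn push(&mut self, v: T) {
        match self {
            // Fast path: room left in the inline array, no allocation.
            Self::Inline { buf, len } if *len < N => {
                buf[*len] = Some(v);
                *len += 1;
            }
            // Overflow: move every element to the heap (the extra cost
            // you pay when the "small" prediction turns out wrong).
            Self::Inline { buf, len } => {
                let mut heap: Vec<T> =
                    buf.iter_mut().take(*len).map(|s| s.take().unwrap()).collect();
                heap.push(v);
                *self = Self::Heap(heap);
            }
            Self::Heap(vec) => vec.push(v),
        }
    }

    // Mirrors SmallVec's spilled(): true once we fell back to the heap.
    fn spilled(&self) -> bool {
        matches!(self, Self::Heap(_))
    }
}

fn main() {
    let mut b: SmallBuf<u32, 4> = SmallBuf::new();
    for i in 0..4 {
        b.push(i);
    }
    assert!(!b.spilled()); // still inline: no heap allocation yet
    b.push(4);
    assert!(b.spilled()); // fifth element forced the spill
}
```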
I tried this out and it makes performance worse. With a backing array of size 1, the change isn't observable, but larger values progressively degrade performance; increasing the array size to 8 makes it worse than the value of 4 I started with. My theory is that the increased size of the base object is the problem: this collection is typically kept in a SentPacket, and our acknowledgement code needs extra time to move the larger struct around.
The size of the performance regression suggests that there are some ways we could further improve how SentPacket is handled.
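One way to see why a larger inline array hurts: every extra inline slot grows the struct that embeds it, so code that moves those structs around copies more bytes. A rough illustration with hypothetical stand-in structs (these are not neqo's actual RecoveryToken or SentPacket definitions):

```rust
use std::mem::size_of;

// Stand-in for a SmallVec-style field: an inline array plus a length.
// Each slot here is 8 bytes, standing in for a token-sized element.
struct InlineBuf<const N: usize> {
    buf: [u64; N],
    len: usize,
}

// A packet-like struct embedding the inline buffer: its size grows
// linearly with the inline capacity N.
struct PacketWithInline<const N: usize> {
    pn: u64,
    tokens: InlineBuf<N>,
}

// The Vec-backed equivalent stays at a fixed three-word handle
// (pointer, capacity, length) regardless of how many tokens it holds.
struct PacketWithVec {
    pn: u64,
    tokens: Vec<u64>,
}

fn main() {
    println!("Vec-backed packet:  {} bytes", size_of::<PacketWithVec>());
    println!("inline capacity 1:  {} bytes", size_of::<PacketWithInline<1>>());
    println!("inline capacity 8:  {} bytes", size_of::<PacketWithInline<8>>());
    // The capacity-8 variant is larger than the Vec-backed one, so every
    // move of the enclosing struct (e.g. in an acknowledgement path)
    // copies more memory.
    assert!(size_of::<PacketWithInline<8>>() > size_of::<PacketWithVec>());
}
```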
Benchmark results
Run multiple transfers with varying seeds
time: [42.395 ms 42.547 ms 42.712 ms]
change: [+3.3754% +4.3870% +5.2776%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Run multiple transfers with the same seed
time: [42.513 ms 42.584 ms 42.657 ms]
change: [+3.4443% +3.7777% +4.0976%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
Codecov Report
Attention: Patch coverage is 99.09910% with 1 line in your changes missing coverage. Please review.
Project coverage is 92.93%. Comparing base (76630a5) to head (809926f).
:exclamation: Current head 809926f differs from pull request most recent head 167816c. Consider uploading reports for the commit 167816c to get more accurate results
| Files | Patch % | Lines |
|---|---|---|
| neqo-transport/src/send_stream.rs | 92.30% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #1657 +/- ##
==========================================
- Coverage 93.08% 92.93% -0.15%
==========================================
Files 117 119 +2
Lines 36422 37263 +841
==========================================
+ Hits 33903 34632 +729
- Misses 2519 2631 +112
:umbrella: View full report in Codecov by Sentry.
@larseggert, I'm looking at the profiles that were generated here and it looks like these are off the main branch. The flamegraph for transfer shows calls into BTreeMap, but I don't see any way that could happen.
I can't get to it this week. Wonder if the checkout action needs another parameter?
Let's see what the new flamegraphs show (also because LTO is now enabled).
Benchmark results
Performance differences relative to 76630a5ebb6c6b94de6a40cf3f439b9a846f6ab7.
- drain a timer quickly time: [395.29 ns 402.55 ns 409.33 ns] change: [-0.6787% +0.9859% +2.7112%] (p = 0.26 > 0.05) No change in performance detected.
- coalesce_acked_from_zero 1+1 entries time: [193.97 ns 194.36 ns 194.79 ns] change: [-0.5237% -0.1538% +0.2215%] (p = 0.41 > 0.05) No change in performance detected.
- coalesce_acked_from_zero 3+1 entries time: [235.93 ns 236.55 ns 237.18 ns] change: [-0.8063% -0.4103% -0.0496%] (p = 0.04 < 0.05) Change within noise threshold.
- coalesce_acked_from_zero 10+1 entries time: [235.78 ns 236.59 ns 237.55 ns] change: [-0.4030% +3.6386% +13.911%] (p = 0.42 > 0.05) No change in performance detected.
- coalesce_acked_from_zero 1000+1 entries time: [217.27 ns 222.01 ns 232.88 ns] change: [+1.4742% +5.6444% +13.185%] (p = 0.06 > 0.05) No change in performance detected.
- RxStreamOrderer::inbound_frame() time: [117.57 ms 117.65 ms 117.73 ms] change: [+0.0927% +0.1982% +0.2949%] (p = 0.00 < 0.05) Change within noise threshold.
- transfer/Run multiple transfers with varying seeds time: [124.35 ms 124.61 ms 124.88 ms] thrpt: [32.031 MiB/s 32.099 MiB/s 32.169 MiB/s] change: time: [+5.4576% +5.7824% +6.1037%] (p = 0.00 < 0.05) thrpt: [-5.7526% -5.4664% -5.1751%] :broken_heart: Performance has regressed.
- transfer/Run multiple transfers with the same seed time: [124.82 ms 125.00 ms 125.18 ms] thrpt: [31.953 MiB/s 32.000 MiB/s 32.047 MiB/s] change: time: [+5.2953% +5.5022% +5.6981%] (p = 0.00 < 0.05) thrpt: [-5.3909% -5.2152% -5.0290%] :broken_heart: Performance has regressed.
Client/server transfer results
Transfer of 134217728 bytes over loopback.
| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|---|---|---|
| msquic | msquic | | | 428.3 ± 58.8 | 359.7 | 538.1 | 1.00 |
| neqo | msquic | reno | on | 1921.6 ± 35.0 | 1850.3 | 1960.9 | 1.00 |
| neqo | msquic | reno | | 1917.7 ± 30.3 | 1857.9 | 1959.7 | 1.00 |
| neqo | msquic | cubic | on | 1898.8 ± 56.6 | 1775.9 | 1971.2 | 1.00 |
| neqo | msquic | cubic | | 1901.9 ± 46.7 | 1811.5 | 1960.3 | 1.00 |
| msquic | neqo | reno | on | 4418.4 ± 170.0 | 4222.4 | 4692.8 | 1.00 |
| msquic | neqo | reno | | 4510.8 ± 232.7 | 4225.0 | 4969.3 | 1.00 |
| msquic | neqo | cubic | on | 4475.1 ± 154.6 | 4259.7 | 4672.4 | 1.00 |
| msquic | neqo | cubic | | 4434.9 ± 168.7 | 4204.7 | 4705.9 | 1.00 |
| neqo | neqo | reno | on | 3664.9 ± 172.2 | 3387.1 | 3957.8 | 1.00 |
| neqo | neqo | reno | | 3805.0 ± 275.7 | 3529.5 | 4374.1 | 1.00 |
| neqo | neqo | cubic | on | 4329.7 ± 141.6 | 4114.5 | 4523.6 | 1.00 |
| neqo | neqo | cubic | | 4372.4 ± 141.9 | 4184.3 | 4553.5 | 1.00 |