etcd
etcd copied to clipboard
Experiments with replacing index implementation of `btree with key -> revisions` with `revision list of immutable trees`
Recently talked with Marcel from Cillium about his recent issue with etcd being overloaded with LIST with limit. Even with low limit, the list will range all the keys to count the total number of keys (not a great feature).
My idea was to improve performance of getting key count within range by using interval trees. For now I didn't managed to implement it as I have already started seeing issues finding a good implementation of immutable tree. For now I tried with AVL trees.
The results:
name old time/op new time/op delta
IndexPutBase-12 81.7ns ± 2% 674.5ns ± 6% +725.35% (p=0.000 n=10+10)
IndexPutLongKey-12 82.4ns ± 1% 708.7ns ± 3% +760.02% (p=0.000 n=10+10)
IndexPutLargeKeySpace-12 309ns ± 1% 1931ns ± 1% +525.73% (p=0.000 n=10+8)
IndexGetBase-12 96.6ns ± 2% 57.7ns ± 3% -40.24% (p=0.000 n=10+10)
IndexGetRepeatedKeys-12 700ns ±10% 383ns ± 3% -45.24% (p=0.000 n=10+10)
IndexGetAccurate-12 98.2ns ± 5% 57.2ns ± 3% -41.74% (p=0.000 n=10+10)
IndexGetMisses-12 94.3ns ± 3% 60.0ns ± 3% -36.39% (p=0.000 n=10+10)
IndexGetLongKey-12 107ns ± 1% 67ns ± 2% -37.77% (p=0.000 n=8+9)
IndexGetLargeKeySpace-12 347ns ± 1% 1135ns ± 2% +226.70% (p=0.000 n=10+10)
IndexRangeBase-12 405ns ± 4% 512ns ± 2% +26.67% (p=0.000 n=10+8)
IndexRangeHighSelectivity-12 354ns ± 1% 452ns ± 2% +27.61% (p=0.000 n=10+10)
IndexRangeLongKey-12 655ns ± 4% 616ns ± 5% -6.00% (p=0.001 n=10+10)
IndexRangeRepeatedKeys-12 3.94µs ± 7% 0.80µs ± 6% -79.56% (p=0.000 n=9+10)
IndexRangeLargeKeySpace-12 872ns ± 1% 2072ns ± 1% +137.70% (p=0.000 n=10+10)
IndexRevisionsBase-12 296ns ± 3% 391ns ± 2% +31.90% (p=0.000 n=10+10)
IndexRevisionsHighSelectivity-12 241ns ± 1% 327ns ± 1% +35.55% (p=0.000 n=10+10)
IndexRevisionsLongKey-12 496ns ± 7% 453ns ± 5% -8.64% (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12 35.4µs ±18% 1.3µs ± 5% -96.38% (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12 702ns ± 1% 1901ns ± 1% +170.71% (p=0.000 n=10+10)
IndexRevisionsLimit-12 244µs ± 3% 253µs ± 2% +3.75% (p=0.000 n=10+10)
IndexCountRevisionsBase-12 219ns ± 3% 323ns ± 4% +47.27% (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12 168ns ± 5% 252ns ± 5% +49.64% (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12 379ns ± 1% 358ns ± 6% -5.32% (p=0.002 n=8+10)
IndexCountRevisionsRepeatedKeys-12 3.97µs ±14% 0.46µs ± 7% -88.41% (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12 601ns ± 1% 1804ns ± 3% +199.98% (p=0.000 n=10+10)
Pros:
- Writers don't block readers thanks to CopyOnWrite data structure
- Much better performance when there are small number of keys reused as we don't lookup in list of revisions.
- Trivial compaction thank to revision list
Cons:
- Additional allocations for Puts make them really slow (should improve with btree)
- Lower performance with large number of keys (should improve with btree)
- No ready implementation of btree and interval tree
- Not sure if tombstones can be properly implemented.
The main blocker for this approach is reducing the cost of Puts and improving the
Even if the idea doesn't stick I expect we can keep the benchmarks.
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all
Tried the btree with copy:
name old time/op new time/op delta
IndexPutBase-12 81.7ns ± 2% 1044.5ns ±22% +1178.19% (p=0.000 n=10+10)
IndexPutLongKey-12 82.4ns ± 1% 1077.8ns ±20% +1207.90% (p=0.000 n=10+10)
IndexPutLargeKeySpace-12 309ns ± 1% 3056ns ± 6% +890.38% (p=0.000 n=10+10)
IndexGetBase-12 96.6ns ± 2% 139.7ns ± 1% +44.65% (p=0.000 n=10+10)
IndexGetRepeatedKeys-12 700ns ±10% 822ns ± 8% +17.47% (p=0.000 n=10+10)
IndexGetAccurate-12 98.2ns ± 5% 139.8ns ± 2% +42.33% (p=0.000 n=10+10)
IndexGetMisses-12 94.3ns ± 3% 141.9ns ± 2% +50.50% (p=0.000 n=10+10)
IndexGetLongKey-12 107ns ± 1% 157ns ± 3% +46.44% (p=0.000 n=8+10)
IndexGetLargeKeySpace-12 347ns ± 1% 2053ns ± 2% +491.18% (p=0.000 n=10+10)
IndexRangeBase-12 405ns ± 4% 443ns ± 3% +9.58% (p=0.000 n=10+10)
IndexRangeHighSelectivity-12 354ns ± 1% 340ns ± 2% -4.02% (p=0.000 n=10+10)
IndexRangeLongKey-12 655ns ± 4% 725ns ± 7% +10.62% (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12 3.94µs ± 7% 1.05µs ± 6% -73.42% (p=0.000 n=9+10)
IndexRangeLargeKeySpace-12 872ns ± 1% 2657ns ± 2% +204.91% (p=0.000 n=10+10)
IndexRevisionsBase-12 296ns ± 3% 331ns ± 1% +11.48% (p=0.000 n=10+9)
IndexRevisionsHighSelectivity-12 241ns ± 1% 221ns ± 1% -8.57% (p=0.000 n=10+9)
IndexRevisionsLongKey-12 496ns ± 7% 571ns ± 4% +15.02% (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12 35.4µs ±18% 1.2µs ± 3% -96.63% (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12 702ns ± 1% 2457ns ± 2% +249.89% (p=0.000 n=10+10)
IndexRevisionsLimit-12 244µs ± 3% 240µs ± 3% ~ (p=0.123 n=10+10)
IndexCountRevisionsBase-12 219ns ± 3% 235ns ± 2% +7.01% (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12 168ns ± 5% 126ns ± 3% -25.26% (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12 379ns ± 1% 442ns ± 2% +16.75% (p=0.000 n=8+10)
IndexCountRevisionsRepeatedKeys-12 3.97µs ±14% 0.70µs ± 6% -82.47% (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12 601ns ± 1% 2290ns ± 3% +280.76% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
IndexPutBase-12 150B ± 0% 2148B ±18% +1327.24% (p=0.000 n=8+10)
IndexPutLongKey-12 151B ± 1% 2076B ±16% +1277.64% (p=0.000 n=10+10)
IndexPutLargeKeySpace-12 121B ± 0% 5125B ± 1% +4135.79% (p=0.000 n=7+10)
IndexGetBase-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexGetRepeatedKeys-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexGetAccurate-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexGetMisses-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexGetLongKey-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexGetLargeKeySpace-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexRangeBase-12 558B ± 6% 493B ± 6% -11.63% (p=0.000 n=10+10)
IndexRangeHighSelectivity-12 623B ± 0% 560B ± 0% -10.23% (p=0.000 n=10+10)
IndexRangeLongKey-12 558B ± 4% 480B ± 4% -13.88% (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12 552B ± 6% 468B ± 7% -15.21% (p=0.000 n=10+10)
IndexRangeLargeKeySpace-12 556B ± 0% 494B ± 2% -11.11% (p=0.000 n=10+10)
IndexRevisionsBase-12 265B ± 1% 196B ± 2% -26.15% (p=0.000 n=9+8)
IndexRevisionsHighSelectivity-12 288B ± 0% 224B ± 0% -22.11% (p=0.000 n=10+8)
IndexRevisionsLongKey-12 263B ± 4% 198B ± 7% -24.41% (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12 252B ± 5% 191B ±11% -24.37% (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12 262B ± 1% 198B ± 1% -24.57% (p=0.000 n=10+10)
IndexRevisionsLimit-12 303B ± 0% 239B ± 0% -21.12% (p=0.002 n=8+10)
IndexCountRevisionsBase-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexCountRevisionsRepeatedKeys-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12 64.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
IndexPutBase-12 1.00 ± 0% 7.00 ± 0% +600.00% (p=0.000 n=10+10)
IndexPutLongKey-12 1.00 ± 0% 7.00 ± 0% +600.00% (p=0.000 n=10+10)
IndexPutLargeKeySpace-12 1.00 ± 0% 16.00 ± 0% +1500.00% (p=0.000 n=10+10)
IndexGetBase-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexGetRepeatedKeys-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexGetAccurate-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexGetMisses-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexGetLongKey-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexGetLargeKeySpace-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexRangeBase-12 7.00 ± 0% 5.70 ±12% -18.57% (p=0.000 n=9+10)
IndexRangeHighSelectivity-12 7.40 ± 8% 6.50 ± 8% -12.16% (p=0.005 n=10+10)
IndexRangeLongKey-12 7.00 ± 0% 5.70 ±12% -18.57% (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12 7.40 ± 8% 6.30 ±11% -14.86% (p=0.001 n=10+10)
IndexRangeLargeKeySpace-12 5.00 ± 0% 4.00 ± 0% -20.00% (p=0.000 n=10+10)
IndexRevisionsBase-12 4.00 ± 0% 3.00 ± 0% -25.00% (p=0.000 n=9+10)
IndexRevisionsHighSelectivity-12 4.00 ± 0% 3.00 ± 0% -25.00% (p=0.000 n=10+10)
IndexRevisionsLongKey-12 4.00 ± 0% 3.00 ± 0% -25.00% (p=0.000 n=10+9)
IndexRevisionsRepeatedKeys-12 4.00 ± 0% 3.00 ± 0% -25.00% (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10)
IndexRevisionsLimit-12 4.00 ± 0% 3.00 ± 0% -25.00% (p=0.002 n=8+10)
IndexCountRevisionsBase-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexCountRevisionsRepeatedKeys-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
Looking at btree code, looks like adding fast counting should not be hard as it already includes intervals.
Codecov Report
:x: Patch coverage is 69.13580% with 25 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 57.03%. Comparing base (58086da) to head (8674a56).
:warning: Report is 1235 commits behind head on main.
:warning: Current head 8674a56 differs from pull request most recent head 4d92c28
Please upload reports for the commit 4d92c28 to get more accurate results.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/storage/mvcc/index.go | 69.13% | 21 Missing and 4 partials :warning: |
Additional details and impacted files
| Files with missing lines | Coverage Δ | |
|---|---|---|
| server/storage/mvcc/kvstore.go | 57.09% <ø> (-32.78%) |
:arrow_down: |
| server/storage/mvcc/index.go | 64.10% <69.13%> (-27.57%) |
:arrow_down: |
... and 192 files with indirect coverage changes
@@ Coverage Diff @@
## main #18184 +/- ##
===========================================
- Coverage 68.84% 57.03% -11.82%
===========================================
Files 421 420 -1
Lines 35857 35604 -253
===========================================
- Hits 24686 20305 -4381
- Misses 9751 13865 +4114
- Partials 1420 1434 +14
Continue to review full report in Codecov by Sentry.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 58086da...4d92c28. Read the comment docs.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: serathius
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [serathius]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@serathius: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-etcd-unit-test-arm64 | 68385008138aea2e107420c919d39dc5ee5859fd | link | true | /test pull-etcd-unit-test-arm64 |
| ci-etcd-robustness-release36-amd64 | 4d92c28ea27deb79b99f589884af8fb14a5ffe20 | link | true | /test ci-etcd-robustness-release36-amd64 |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.