etcd icon indicating copy to clipboard operation
etcd copied to clipboard

Experiments with replacing index implementation of `btree with key -> revisions` with `revision list of immutable trees`

Open serathius opened this issue 1 year ago • 6 comments

Recently talked with Marcel from Cillium about his recent issue with etcd being overloaded with LIST with limit. Even with low limit, the list will range all the keys to count the total number of keys (not a great feature).

My idea was to improve performance of getting key count within range by using interval trees. For now I didn't managed to implement it as I have already started seeing issues finding a good implementation of immutable tree. For now I tried with AVL trees.

The results:

name                                   old time/op    new time/op    delta
IndexPutBase-12                          81.7ns ± 2%   674.5ns ± 6%   +725.35%  (p=0.000 n=10+10)
IndexPutLongKey-12                       82.4ns ± 1%   708.7ns ± 3%   +760.02%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                  309ns ± 1%    1931ns ± 1%   +525.73%  (p=0.000 n=10+8)
IndexGetBase-12                          96.6ns ± 2%    57.7ns ± 3%    -40.24%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   700ns ±10%     383ns ± 3%    -45.24%  (p=0.000 n=10+10)
IndexGetAccurate-12                      98.2ns ± 5%    57.2ns ± 3%    -41.74%  (p=0.000 n=10+10)
IndexGetMisses-12                        94.3ns ± 3%    60.0ns ± 3%    -36.39%  (p=0.000 n=10+10)
IndexGetLongKey-12                        107ns ± 1%      67ns ± 2%    -37.77%  (p=0.000 n=8+9)
IndexGetLargeKeySpace-12                  347ns ± 1%    1135ns ± 2%   +226.70%  (p=0.000 n=10+10)
IndexRangeBase-12                         405ns ± 4%     512ns ± 2%    +26.67%  (p=0.000 n=10+8)
IndexRangeHighSelectivity-12              354ns ± 1%     452ns ± 2%    +27.61%  (p=0.000 n=10+10)
IndexRangeLongKey-12                      655ns ± 4%     616ns ± 5%     -6.00%  (p=0.001 n=10+10)
IndexRangeRepeatedKeys-12                3.94µs ± 7%    0.80µs ± 6%    -79.56%  (p=0.000 n=9+10)
IndexRangeLargeKeySpace-12                872ns ± 1%    2072ns ± 1%   +137.70%  (p=0.000 n=10+10)
IndexRevisionsBase-12                     296ns ± 3%     391ns ± 2%    +31.90%  (p=0.000 n=10+10)
IndexRevisionsHighSelectivity-12          241ns ± 1%     327ns ± 1%    +35.55%  (p=0.000 n=10+10)
IndexRevisionsLongKey-12                  496ns ± 7%     453ns ± 5%     -8.64%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12            35.4µs ±18%     1.3µs ± 5%    -96.38%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12            702ns ± 1%    1901ns ± 1%   +170.71%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                    244µs ± 3%     253µs ± 2%     +3.75%  (p=0.000 n=10+10)
IndexCountRevisionsBase-12                219ns ± 3%     323ns ± 4%    +47.27%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     168ns ± 5%     252ns ± 5%    +49.64%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             379ns ± 1%     358ns ± 6%     -5.32%  (p=0.002 n=8+10)
IndexCountRevisionsRepeatedKeys-12       3.97µs ±14%    0.46µs ± 7%    -88.41%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       601ns ± 1%    1804ns ± 3%   +199.98%  (p=0.000 n=10+10)

Pros:

  • Writers don't block readers thanks to CopyOnWrite data structure
  • Much better performance when there are small number of keys reused as we don't lookup in list of revisions.
  • Trivial compaction thank to revision list

Cons:

  • Additional allocations for Puts make them really slow (should improve with btree)
  • Lower performance with large number of keys (should improve with btree)
  • No ready implementation of btree and interval tree
  • Not sure if tombstones can be properly implemented.

The main blocker for this approach is reducing the cost of Puts and improving the

Even if the idea doesn't stick I expect we can keep the benchmarks.

serathius avatar Jun 16 '24 17:06 serathius

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot avatar Jun 16 '24 17:06 k8s-ci-robot

Tried the btree with copy:

name                                   old time/op    new time/op    delta
IndexPutBase-12                          81.7ns ± 2%  1044.5ns ±22%  +1178.19%  (p=0.000 n=10+10)
IndexPutLongKey-12                       82.4ns ± 1%  1077.8ns ±20%  +1207.90%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                  309ns ± 1%    3056ns ± 6%   +890.38%  (p=0.000 n=10+10)
IndexGetBase-12                          96.6ns ± 2%   139.7ns ± 1%    +44.65%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   700ns ±10%     822ns ± 8%    +17.47%  (p=0.000 n=10+10)
IndexGetAccurate-12                      98.2ns ± 5%   139.8ns ± 2%    +42.33%  (p=0.000 n=10+10)
IndexGetMisses-12                        94.3ns ± 3%   141.9ns ± 2%    +50.50%  (p=0.000 n=10+10)
IndexGetLongKey-12                        107ns ± 1%     157ns ± 3%    +46.44%  (p=0.000 n=8+10)
IndexGetLargeKeySpace-12                  347ns ± 1%    2053ns ± 2%   +491.18%  (p=0.000 n=10+10)
IndexRangeBase-12                         405ns ± 4%     443ns ± 3%     +9.58%  (p=0.000 n=10+10)
IndexRangeHighSelectivity-12              354ns ± 1%     340ns ± 2%     -4.02%  (p=0.000 n=10+10)
IndexRangeLongKey-12                      655ns ± 4%     725ns ± 7%    +10.62%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                3.94µs ± 7%    1.05µs ± 6%    -73.42%  (p=0.000 n=9+10)
IndexRangeLargeKeySpace-12                872ns ± 1%    2657ns ± 2%   +204.91%  (p=0.000 n=10+10)
IndexRevisionsBase-12                     296ns ± 3%     331ns ± 1%    +11.48%  (p=0.000 n=10+9)
IndexRevisionsHighSelectivity-12          241ns ± 1%     221ns ± 1%     -8.57%  (p=0.000 n=10+9)
IndexRevisionsLongKey-12                  496ns ± 7%     571ns ± 4%    +15.02%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12            35.4µs ±18%     1.2µs ± 3%    -96.63%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12            702ns ± 1%    2457ns ± 2%   +249.89%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                    244µs ± 3%     240µs ± 3%       ~     (p=0.123 n=10+10)
IndexCountRevisionsBase-12                219ns ± 3%     235ns ± 2%     +7.01%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     168ns ± 5%     126ns ± 3%    -25.26%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             379ns ± 1%     442ns ± 2%    +16.75%  (p=0.000 n=8+10)
IndexCountRevisionsRepeatedKeys-12       3.97µs ±14%    0.70µs ± 6%    -82.47%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       601ns ± 1%    2290ns ± 3%   +280.76%  (p=0.000 n=10+10)

name                                   old alloc/op   new alloc/op   delta
IndexPutBase-12                            150B ± 0%     2148B ±18%  +1327.24%  (p=0.000 n=8+10)
IndexPutLongKey-12                         151B ± 1%     2076B ±16%  +1277.64%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                   121B ± 0%     5125B ± 1%  +4135.79%  (p=0.000 n=7+10)
IndexGetBase-12                           64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetAccurate-12                       64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetMisses-12                         64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetLongKey-12                        64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetLargeKeySpace-12                  64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexRangeBase-12                          558B ± 6%      493B ± 6%    -11.63%  (p=0.000 n=10+10)
IndexRangeHighSelectivity-12               623B ± 0%      560B ± 0%    -10.23%  (p=0.000 n=10+10)
IndexRangeLongKey-12                       558B ± 4%      480B ± 4%    -13.88%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                  552B ± 6%      468B ± 7%    -15.21%  (p=0.000 n=10+10)
IndexRangeLargeKeySpace-12                 556B ± 0%      494B ± 2%    -11.11%  (p=0.000 n=10+10)
IndexRevisionsBase-12                      265B ± 1%      196B ± 2%    -26.15%  (p=0.000 n=9+8)
IndexRevisionsHighSelectivity-12           288B ± 0%      224B ± 0%    -22.11%  (p=0.000 n=10+8)
IndexRevisionsLongKey-12                   263B ± 4%      198B ± 7%    -24.41%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12              252B ± 5%      191B ±11%    -24.37%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12             262B ± 1%      198B ± 1%    -24.57%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                     303B ± 0%      239B ± 0%    -21.12%  (p=0.002 n=8+10)
IndexCountRevisionsBase-12                64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsRepeatedKeys-12        64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)

name                                   old allocs/op  new allocs/op  delta
IndexPutBase-12                            1.00 ± 0%      7.00 ± 0%   +600.00%  (p=0.000 n=10+10)
IndexPutLongKey-12                         1.00 ± 0%      7.00 ± 0%   +600.00%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                   1.00 ± 0%     16.00 ± 0%  +1500.00%  (p=0.000 n=10+10)
IndexGetBase-12                            1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                    1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetAccurate-12                        1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetMisses-12                          1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetLongKey-12                         1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetLargeKeySpace-12                   1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexRangeBase-12                          7.00 ± 0%      5.70 ±12%    -18.57%  (p=0.000 n=9+10)
IndexRangeHighSelectivity-12               7.40 ± 8%      6.50 ± 8%    -12.16%  (p=0.005 n=10+10)
IndexRangeLongKey-12                       7.00 ± 0%      5.70 ±12%    -18.57%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                  7.40 ± 8%      6.30 ±11%    -14.86%  (p=0.001 n=10+10)
IndexRangeLargeKeySpace-12                 5.00 ± 0%      4.00 ± 0%    -20.00%  (p=0.000 n=10+10)
IndexRevisionsBase-12                      4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=9+10)
IndexRevisionsHighSelectivity-12           4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+10)
IndexRevisionsLongKey-12                   4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+9)
IndexRevisionsRepeatedKeys-12              4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12             3.00 ± 0%      2.00 ± 0%    -33.33%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                     4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.002 n=8+10)
IndexCountRevisionsBase-12                 1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12      1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12              1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsRepeatedKeys-12         1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12        1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)

serathius avatar Jun 16 '24 18:06 serathius

Looking at btree code, looks like adding fast counting should not be hard as it already includes intervals.

serathius avatar Jun 16 '24 18:06 serathius

Codecov Report

:x: Patch coverage is 69.13580% with 25 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 57.03%. Comparing base (58086da) to head (8674a56). :warning: Report is 1235 commits behind head on main.

:warning: Current head 8674a56 differs from pull request most recent head 4d92c28

Please upload reports for the commit 4d92c28 to get more accurate results.

Files with missing lines Patch % Lines
server/storage/mvcc/index.go 69.13% 21 Missing and 4 partials :warning:
Additional details and impacted files
Files with missing lines Coverage Δ
server/storage/mvcc/kvstore.go 57.09% <ø> (-32.78%) :arrow_down:
server/storage/mvcc/index.go 64.10% <69.13%> (-27.57%) :arrow_down:

... and 192 files with indirect coverage changes

@@             Coverage Diff             @@
##             main   #18184       +/-   ##
===========================================
- Coverage   68.84%   57.03%   -11.82%     
===========================================
  Files         421      420        -1     
  Lines       35857    35604      -253     
===========================================
- Hits        24686    20305     -4381     
- Misses       9751    13865     +4114     
- Partials     1420     1434       +14     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 58086da...4d92c28. Read the comment docs.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Jun 16 '24 20:06 codecov-commenter

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Dec 06 '24 20:12 k8s-ci-robot

@serathius: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-unit-test-arm64 68385008138aea2e107420c919d39dc5ee5859fd link true /test pull-etcd-unit-test-arm64
ci-etcd-robustness-release36-amd64 4d92c28ea27deb79b99f589884af8fb14a5ffe20 link true /test ci-etcd-robustness-release36-amd64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

k8s-ci-robot avatar May 24 '25 18:05 k8s-ci-robot

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jul 24 '25 00:07 github-actions[bot]

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 25 '25 00:09 github-actions[bot]

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Nov 26 '25 00:11 github-actions[bot]