kubernetes
kubernetes copied to clipboard
Reimplement watch cache storage with btree
Continuing work of https://github.com/kubernetes/kubernetes/pull/108392
Started from taking the exact implementation by @MadhavJivrajani, however updated the btree factor to 32.
Using btree without any changes already is faster than existing ThreadedSafeStore. There are still many improvements that could be implemented.
Btree is awesome for watch cache due to:
- Support key ranges
- Returns ordered results
Also etcd uses btree to serve ranges too.
Performance improvement:
- 25% improvement in operations per second
- 15% less bytes allocated
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/storage/cacher
cpu: AMD Ryzen 5 7600X 6-Core Processor
│ base.txt │ btree.txt │
│ sec/op │ sec/op vs base │
StoreListCreate/RV=NotOlderThan-12 698.8µ ± 4% 227.9µ ± 2% -67.39% (p=0.000 n=10)
StoreListCreate/RV=ExactMatch-12 356.4µ ± 2% 358.7µ ± 2% ~ (p=0.631 n=10)
StoreList/RV=Empty-12 982.5µ ± 6% 722.9µ ± 4% -26.43% (p=0.000 n=10)
StoreList/RV=NotOlderThan-12 972.3µ ± 2% 700.7µ ± 2% -27.93% (p=0.000 n=10)
StoreList/RV=MatchExact-12 16.55m ± 3% 16.38m ± 2% ~ (p=0.218 n=10)
StoreList/Paginate/RV=Empty-12 18.55m ± 1% 18.49m ± 1% ~ (p=0.353 n=10)
StoreList/Paginate/RV=NotOlderThan-12 18.39m ± 3% 18.45m ± 1% ~ (p=0.971 n=10)
StoreList/Paginate/RV=MatchExact-12 18.28m ± 1% 18.47m ± 1% +1.03% (p=0.003 n=10)
StoreList/NodeIndexed/RV=Empty-12 1.791m ± 3% 1.796m ± 2% ~ (p=0.796 n=10)
StoreList/NodeIndexed/RV=NotOlderThan-12 1.234m ± 4% 1.250m ± 3% ~ (p=0.796 n=10)
StoreList/NodeIndexed/RV=MatchExact-12 7.460 ± 1% 7.455 ± 1% ~ (p=0.796 n=10)
StoreList/NodeIndexed/Paginate/RV=Empty-12 70.41m ± 1% 64.66m ± 4% -8.17% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=NotOlderThan-12 69.42m ± 1% 63.59m ± 1% -8.39% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=MatchExact-12 8.077 ± 1% 8.088 ± 1% ~ (p=0.165 n=10)
StoreList/Namespace/RV=Empty-12 6.421m ± 1% 1.675m ± 1% -73.91% (p=0.000 n=10)
StoreList/Namespace/RV=NotOlderThan-12 5.670m ± 2% 1.069m ± 1% -81.14% (p=0.000 n=10)
StoreList/Namespace/RV=MatchExact-12 15.84m ± 2% 16.33m ± 3% +3.09% (p=0.000 n=10)
geomean 13.05m 9.789m -24.98%
│ base.txt │ btree.txt │
│ B/op │ B/op vs base │
StoreListCreate/RV=NotOlderThan-12 293.7Ki ± 4% 233.1Ki ± 4% -20.65% (p=0.000 n=10)
StoreListCreate/RV=ExactMatch-12 115.0Ki ± 0% 114.9Ki ± 0% -0.09% (p=0.000 n=10)
StoreList/RV=Empty-12 6.179Mi ± 0% 6.023Mi ± 0% -2.53% (p=0.000 n=10)
StoreList/RV=NotOlderThan-12 6.163Mi ± 0% 6.007Mi ± 0% -2.53% (p=0.000 n=10)
StoreList/RV=MatchExact-12 60.32Mi ± 0% 60.29Mi ± 0% ~ (p=0.529 n=10)
StoreList/Paginate/RV=Empty-12 64.29Mi ± 0% 64.05Mi ± 0% -0.37% (p=0.000 n=10)
StoreList/Paginate/RV=NotOlderThan-12 64.27Mi ± 0% 64.04Mi ± 0% -0.35% (p=0.000 n=10)
StoreList/Paginate/RV=MatchExact-12 63.64Mi ± 0% 63.56Mi ± 0% -0.12% (p=0.000 n=10)
StoreList/NodeIndexed/RV=Empty-12 7.928Mi ± 1% 7.929Mi ± 1% +0.01% (p=0.035 n=10)
StoreList/NodeIndexed/RV=NotOlderThan-12 6.328Mi ± 0% 6.328Mi ± 0% ~ (p=0.739 n=10)
StoreList/NodeIndexed/RV=MatchExact-12 5.785Gi ± 0% 5.782Gi ± 0% ~ (p=0.315 n=10)
StoreList/NodeIndexed/Paginate/RV=Empty-12 284.0Mi ± 0% 257.0Mi ± 0% -9.51% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=NotOlderThan-12 282.2Mi ± 0% 255.3Mi ± 0% -9.55% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=MatchExact-12 6.073Gi ± 0% 6.048Gi ± 0% -0.42% (p=0.000 n=10)
StoreList/Namespace/RV=Empty-12 23.745Mi ± 0% 8.123Mi ± 0% -65.79% (p=0.000 n=10)
StoreList/Namespace/RV=NotOlderThan-12 22.143Mi ± 0% 6.521Mi ± 0% -70.55% (p=0.000 n=10)
StoreList/Namespace/RV=MatchExact-12 63.39Mi ± 0% 63.32Mi ± 0% -0.11% (p=0.000 n=10)
geomean 33.76Mi 28.64Mi -15.15%
│ base.txt │ btree.txt │
│ allocs/op │ allocs/op vs base │
StoreListCreate/RV=NotOlderThan-12 561.0 ± 0% 559.0 ± 0% -0.36% (p=0.000 n=10)
StoreListCreate/RV=ExactMatch-12 874.5 ± 0% 875.0 ± 0% ~ (p=0.407 n=10)
StoreList/RV=Empty-12 346.5 ± 1% 345.0 ± 2% ~ (p=0.236 n=10)
StoreList/RV=NotOlderThan-12 88.00 ± 1% 87.00 ± 1% -1.14% (p=0.002 n=10)
StoreList/RV=MatchExact-12 460.6k ± 0% 456.8k ± 0% -0.83% (p=0.000 n=10)
StoreList/Paginate/RV=Empty-12 490.6k ± 0% 486.8k ± 0% -0.77% (p=0.000 n=10)
StoreList/Paginate/RV=NotOlderThan-12 490.4k ± 0% 486.6k ± 0% -0.78% (p=0.000 n=10)
StoreList/Paginate/RV=MatchExact-12 492.9k ± 0% 489.1k ± 0% -0.77% (p=0.000 n=10)
StoreList/NodeIndexed/RV=Empty-12 32.08k ± 0% 32.10k ± 0% ~ (p=0.061 n=10)
StoreList/NodeIndexed/RV=NotOlderThan-12 6.614k ± 1% 6.614k ± 1% ~ (p=0.642 n=10)
StoreList/NodeIndexed/RV=MatchExact-12 46.96M ± 0% 46.58M ± 0% -0.81% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=Empty-12 2.147M ± 0% 1.948M ± 0% -9.28% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=NotOlderThan-12 2.119M ± 0% 1.920M ± 0% -9.38% (p=0.000 n=10)
StoreList/NodeIndexed/Paginate/RV=MatchExact-12 47.93M ± 0% 47.46M ± 0% -0.97% (p=0.000 n=10)
StoreList/Namespace/RV=Empty-12 31.84k ± 0% 31.72k ± 0% -0.39% (p=0.000 n=10)
StoreList/Namespace/RV=NotOlderThan-12 6.314k ± 0% 6.218k ± 0% -1.53% (p=0.000 n=10)
StoreList/Namespace/RV=MatchExact-12 490.1k ± 0% 486.4k ± 0% -0.76% (p=0.000 n=10)
geomean 78.82k 77.48k -1.69%
/kind feature
Use btree data structure for watch cache
👍 really great work!
there is another https://github.com/kubernetes/kubernetes/pull/126753 seems duplicated? :)
/retest
Oh, found that original PoC had very inefficient implementation of Get (https://github.com/kubernetes/kubernetes/pull/108392/files#diff-7dbc4a86989f847d364c09b2370266542ccbb72edc66d0f346559771b1445e56R200-R211) that scanned whole keyspace. Fixed
Tests fails, in last change I must have introduced a bug.
/triage accepted
ready to review, ping @deads2k @jpbetz
ping @deads2k
I'd really like to see if @wojtek-t can review changes to the watch cache.
It's on my list and I definitely would like to review this PR - just didn't yet get to it.
Just storage implementation using btree and tests. Removed integration to watch cache to avoid too many approvals on feature flags.
Got to agreement about the indexer code. PTAL @jpbetz @deads2k @wojtek-t
/lgtm /approve
LGTM label has been added.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: serathius, wojtek-t
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~staging/src/k8s.io/apiserver/OWNERS~~ [wojtek-t]
- ~~staging/src/k8s.io/cloud-provider/OWNERS~~ [wojtek-t]
- ~~staging/src/k8s.io/kube-aggregator/OWNERS~~ [wojtek-t]
- ~~staging/src/k8s.io/kube-controller-manager/OWNERS~~ [wojtek-t]
- ~~staging/src/k8s.io/pod-security-admission/OWNERS~~ [wojtek-t]
- ~~staging/src/k8s.io/sample-apiserver/OWNERS~~ [wojtek-t]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Fixed typo caught by verify.
/lgtm
LGTM label has been added.
@serathius can we reflect what you mentioned in an earlier comment in the release notes? It currently says Use btree data structure for watch cache but should it be New implementation of watch cache using btree data structure. Implementation is not enabled yet. (or something equivalent?).
PS: probably need to do the same for the PR title too.
@dims Done
thanks @serathius ! will avoid unnecessary questions :)
Changelog suggestion
-New implementation of watch cache using btree data structure. Implementation is not enabled yet.
If it's not enabled, I wouldn't changelog it.