karmada
karmada copied to clipboard
[umbrella] Karmada proxy performance improve
We have benchmark test for proxy feature of karmada-search. This issue reports the test result, and tracks the improvement works.
Test Enviroment
- 100 clusters
- 2 millions pods
Test Records
- Cache building duration: 1m18s
- CPU cost: 6-12C

- Mem cost: 140G

-
request duration:
Name | N | Min | Median | Mean | StdDev | Max ========================================================================================= List (10clusters, 500k pods) | 3224 | 601.9ms | 787.4ms | 928ms | 344ms | 2.597s Get pod [duration] | 301855 | 1.7ms | 10ms | 9.9ms | 3.6ms | 218.5ms Update pod [duration] | 31372 | 8.5ms | 19.2ms | 47.9ms | 92.8ms | 2.0501ms -
Heap frame graph:
Raw files
Good job! Looking forward to improvements~~
Why list spend much time?
In the listing test, ResourceVersion is set to be 0, objects will be returned from local cache in Cacher.
_, err := kubeClient.CoreV1().Pods(testNamespace).List(context.TODO(), metav1.ListOptions{ResourceVersion: "0", Limit: 500})
But in cache limit doesn't work (see as below), and returns all the objects, almost 10 thousands pods in one member cluster cache, rather than 500. So it spends too much time on converting and transferring large data.
https://github.com/karmada-io/karmada/blob/614e28508336d6c03a938ce1bf0678dafef034f0/vendor/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L664-L683
shall we cut the returned objects before response?
Now the behavior is stay the same with single cluster, so i don't think we shall change it.
Test again
I change the test code to below, removing the ResourceVersion option, and will get pod from member cluster apiServer.
_, err := kubeClient.CoreV1().Pods(testNamespace).List(context.TODO(), metav1.ListOptions{Limit: 500})
Test Enviroment
- 10 clusters
- 500 thousands pods
List from member cluster directly (baseline)
List pods
Name | N | Min | Median | Mean | StdDev | Max
=============================================================================
List pods [duration] | 6496 | 220.4ms | 380.3ms | 461.6ms | 270.8ms | 1.6186s
List from search
List pods
Name | N | Min | Median | Mean | StdDev | Max
=========================================================================
List pods [duration] | 3224 | 601.9ms | 787.4ms | 928ms | 344ms | 2.597s
It roughly spends twice time than from member cluster directly, as requests are sent to proxy, and then proxied to member cluster apiserver, it has longer chain.
Good job!
In the listing test,
Where is the test code?
Good job!
In the listing test,
Where is the test code?
It's still in my fork repo: https://github.com/ikaven1024/karmada/blob/benchmark/test/benchmark/search_test.go#L90
Do we need to pull the test code as well?
Do we need to pull the test code as well?
pull it in: https://github.com/karmada-io/karmada/pull/2557
It roughly spends twice time than from member cluster directly, as requests are sent to proxy, and then proxied to member cluster apiserver, it has longer chain.
Just a question: Does the list need to be obtained from the member cluster by proxy? Not cache in the control plane?
Just a question: Does the list need to be obtained from the member cluster by proxy? Not cache in the control plane?
I think we should keep the semantics of resourceVersion. When list with resourceVersion unset, query from backend, just as single cluster do.
| resourceVersion unset | resourceVersion="0" | resourceVersion="{value other than 0}" |
|---|---|---|
| Most Recent | Any | Not older than |
I think we should keep the semantics of resourceVersion. When list with resourceVersion unset, query from backend, just as single cluster do.
This is the logic of the current search code, right?
This is the logic of the current search code, right?
This logic is actually implemented in Cacher, we cannot change it.
https://github.com/karmada-io/karmada/blob/614e28508336d6c03a938ce1bf0678dafef034f0/vendor/k8s.io/apiserver/pkg/storage/cacher/cacher.go#L626-L628
Ok. I got it, thanks~
Before that, I've always had a misconception that Cacher refers to caching in the search component.