metrics-server
Automate testing scalability of Metrics Server.
We should mirror the work done in kube-state-metrics and introduce automated scalability tests: https://github.com/kubernetes/kube-state-metrics/issues/1341
Steps:
- Integrate with scalability tests (example PR https://github.com/kubernetes/perf-tests/pull/1761/files)
- Measure resource usage and request latency (example PR https://github.com/kubernetes/perf-tests/pull/1684#issuecomment-772355405)
- Deploy some dummy HPAs to put load on MS (should be discussed with scalability team)
- Document how to access and use scalability test results
/kind feature
/help
@serathius: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed by commenting with the `/remove-help` command.
In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I will have a look. Anyone else who is interested, we can discuss together.
/assign
I think the current job is:
1. Integrate with scalability tests: add manifest files under /kubernetes/perf-tests/clusterloader2/pkg/prometheus/manifests/exporters/kube-metrics-server/ and modify /kubernetes/perf-tests/clusterloader2/pkg/prometheus/prometheus.go
2. Measure resource usage and request latency: add /kubernetes/perf-tests/clusterloader2/pkg/measurement/common/kube_metrics_server_measurement.go

Am I right? /cc @serathius @wojtek-t
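For the first step, a manifest under that exporters directory might look roughly like the ServiceMonitor sketched below. This is only an illustration: the resource name, namespace, labels, and port name are assumptions and would need to be checked against the existing clusterloader2 exporter conventions and the actual metrics-server Service labels.

```yaml
# Hypothetical ServiceMonitor asking the test cluster's Prometheus to scrape
# metrics-server's own /metrics endpoint. Names, labels, and the port name
# are illustrative, not the actual clusterloader2 conventions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics-server
  namespace: monitoring
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      kubernetes.io/name: "Metrics-server"   # assumed Service label
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: https          # assumed port name on the Service
      scheme: https
      tlsConfig:
        insecureSkipVerify: true   # metrics-server serves a self-signed cert
```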
We can do it together @sanwishe @lunhuijie.
metrics server is already deployed in our scalability tests - there is no need to change anything there.
What is missing is to ensure that we measure metrics reflecting its performance (i.e. add a metrics-server measurement). And it's mostly about latency and things like that; for resource usage we should already have these metrics (or it's super simple to add them to the existing resource-usage measurements).
cc @mborsz @jkaniuk @marseel
Here is example output of resource usage on our 5k scalability test:
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1385277590951956480/artifacts/ResourceUsageSummary_load_2021-04-22T21:01:07Z.json
```json
{ "Name": "metrics-server-v0.3.6-58bc6d979c-xjnq5/metrics-server", "CPU": 2.45379102, "Mem": 3847835648 },
```
Looks like the scalability tests deploy manifests from https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server, which run an older version of Metrics Server with autoscaling enabled. Would it be possible to test the latest version of Metrics Server instead of the latest release?
Let's start with having tests, and then we can discuss what version. I wouldn't exclude that, but I'm also not sure if that's actually the most important thing.
@yangjunmyfm192085 Can you look into implementing the third point about measuring request latency like in kubernetes/perf-tests#1684
OK, let me have a look.
@serathius We are working on this issue, and an error occurs when I try to access it this way:

```shell
curl --cacert /etc/kubernetes/certs/ca.crt --cert /etc/kubernetes/certs/kubecfg.crt --key /etc/kubernetes/certs/kubecfg.key https://IP:PORT/api/v1/namespaces/kube-system/services/metrics-server:443/proxy/metrics
```

which returns:

```
Client sent an HTTP request to an HTTPS server.
```
Any suggested way to get the latency of metrics-server itself?
I don't know how to use the `service/proxy` endpoint you are using to connect to an HTTPS server. Alternatives I know:
- Connect to MS directly. Requires the connection to be made from the cluster network, so it might not work here.
- Use `pods/portforward` instead. It can be done by using kubectl or by writing some code.
  - First set up the proxy: `kubectl port-forward -n kube-system metrics-server-pod 4443:4443 &`
  - Then curl the local port: `curl localhost:4443/metrics`
- Use this code: https://github.com/kubernetes-sigs/metrics-server/blob/master/test/e2e_test.go#L255
Thanks, this helps a lot.
Hi @serathius @wojtek-t, we have submitted PR https://github.com/kubernetes/perf-tests/pull/1797; could you please review whether it works?
Done
Hi @wojtek-t, @marseel, @mborsz, @mm4tt, the PR https://github.com/kubernetes-sigs/metrics-server/issues/710 has been merged. Do we need to discuss "Deploy some dummy HPAs to put load on MS (should be discussed with scalability team)"?
ping @wojtek-t @mborsz
Deploying dummy HPAs is easy in the sense of deploying them. The things we would like to figure out are:
- how to avoid additional significant churn in the cluster (i.e. I would like to avoid a non-negligible amount of scale-up/downs triggered by HPA)
- at the same time, how to ensure that this is actually useful and we check something (although from the metrics-server perspective only, maybe that's not critical)
- and how to avoid changing any images/work characteristics that our pods (mostly dns-related or pause pods) are doing
Also adding @jkaniuk as he was thinking about that in a different context. @tosi3k @jprzychodzen - FYI
> how to avoid additional significant churn in the cluster (i.e. I would like to avoid a non-negligible amount of scale-up/downs triggered by HPA)

This can be done by setting replicaCount = maxReplicaCount = minReplicaCount; this way the HPA just measures utilization but doesn't take any action.
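A minimal sketch of such a "measure-only" HPA follows. The target Deployment name, namespace, and utilization target are made up for illustration; the point is only that `minReplicas` and `maxReplicas` match the Deployment's fixed replica count, so the HPA queries metrics-server on every sync but never scales.

```yaml
# Hypothetical HPA whose min and max replica counts equal the target
# Deployment's fixed replica count: it computes CPU utilization via
# metrics-server but can never trigger a scale-up or scale-down.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dummy-hpa
  namespace: test-load          # illustrative namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer     # assumed Deployment with replicas: 5
  minReplicas: 5
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```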
> at the same time, how to ensure that this is actually useful and we check something (although from the metrics-server perspective only, maybe that's not critical)

We can check whether the HPA utilization is calculated; if MS doesn't work, the values will not be set.
> and avoid changing any images/work characteristics that our pods (mostly dns-related or pause pods) are doing.

We can use the resource-consumer image maintained as part of test-infra. I propose this image because it will use a non-zero amount of CPU, so we can use the check from the second point. If not, pause pods should also be OK.
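A Deployment along these lines could serve as the HPA target. The image path and tag, namespace, and resource requests are assumptions for illustration and should be checked against the current test-infra registry before use.

```yaml
# Hypothetical Deployment running the test-infra resource-consumer image,
# which burns a small configurable amount of CPU so that HPA utilization
# comes out non-zero. Registry path and tag are assumed, not verified.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resource-consumer
  namespace: test-load          # illustrative namespace
spec:
  replicas: 5
  selector:
    matchLabels:
      app: resource-consumer
  template:
    metadata:
      labels:
        app: resource-consumer
    spec:
      containers:
        - name: resource-consumer
          image: registry.k8s.io/e2e-test-images/resource-consumer:1.13  # assumed tag
          resources:
            requests:
              cpu: 100m          # request needed for HPA utilization math
              memory: 64Mi
```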
/cc @jkaniuk @tosi3k @jprzychodzen
@yangjunmyfm192085 are there still things to be done as part of this issue?
I think this issue is finished.
/close
@yangjunmyfm192085: Closing this issue.
In response to this:
/close
/cc @serathius
I don't agree with the statement that this issue should be closed. In the original scope I proposed that we should document how to run and use the results from scalability tests. Without this step the work would be useless.
We need a way to include scalability tests in our release process; without it we just burn CPU for nothing.
OK, I think I missed this step; I will continue to look into it.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale