descheduler uses a lot of memory when cluster is large
What version of descheduler are you using?
descheduler version: k8s.gcr.io/descheduler/descheduler:v0.23.0
Does this issue reproduce with the latest release? yes
Which descheduler CLI options are you using? https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml
Please provide a copy of your descheduler policy config file: see above (the policy is embedded in the linked values file)
What k8s version are you using (kubectl version)?
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"b631974d68ac5045e076c86a5c66fba6f128dc72", GitTreeState:"clean", BuildDate:"2022-01-19T17:51:12Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
What did you do?
Installed descheduler chart v0.23.1 with values https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml
What did you expect to see?
The descheduler not to use this much memory, i.e. to stay near the chart's default value
What did you see instead?
The descheduler uses 753Mi of memory with 261 nodes and 5,700 running pods (out of 15,000 pods total). The chart's default memory value is 256Mi.
$ kubectl top pod -n descheduler-1
NAME CPU(cores) MEMORY(bytes)
descheduler-1-6fdcdf644f-q2wht 91m 753Mi
When left running, it eventually dropped to 434Mi, which is still well above the chart's default value.
Hi @matti, since you are using the Helm chart to deploy: by default the chart deploys a CronJob, and each CronJob run has to relist everything from the API server with no in-memory cache. For a large cluster we suggest switching to a Deployment, which does keep a cache. You can also check the test results in https://github.com/kubernetes-sigs/descheduler/pull/673#issuecomment-993115438
You can just helm uninstall and then helm install again with the chart's kind value set to Deployment (e.g. --set kind=Deployment). I will be waiting for your results :)
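For reference, a sketch of the values change this suggests, assuming the chart exposes kind, deschedulingInterval, and resources as in the values file linked above (verify the names against the chart's values.yaml):

# values.yml sketch: run the descheduler as a long-lived Deployment instead of a CronJob
kind: Deployment             # the chart's CronJob mode relists the cluster on every run
deschedulingInterval: 5m     # how often the long-running pod re-evaluates the cluster
resources:
  limits:
    memory: 1Gi              # temporarily raised from the 256Mi default while profiling

This is applied with an ordinary helm upgrade --install ... -f values.yml (or --set kind=Deployment); Helm itself has no --kind flag.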
I believe that the Job will still consume a lot of memory, but I'll test
well, it has other problems: https://github.com/kubernetes-sigs/descheduler/issues/775
@matti just to clarify, @JaneLiuL is saying that you should run it as a Deployment (not a Job or CronJob). The Deployment has a descheduling interval flag that keeps a single pod running rather than creating a new one each time.
I do agree that this will likely still face similar memory issues which we need to profile and debug. But it will be good to have the comparison just as a starting point
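For reference, the Deployment mode boils down to a single long-running pod that re-runs the configured strategies every --descheduling-interval. A rough sketch of the relevant fragment of its pod spec (container name, mount path, and exact argument values here are illustrative, not copied from the chart's rendered output):

# Fragment of the Deployment's pod spec (illustrative)
containers:
  - name: descheduler
    image: k8s.gcr.io/descheduler/descheduler:v0.23.0
    command:
      - /bin/descheduler
    args:
      - --policy-config-file=/policy-dir/policy.yaml
      - --descheduling-interval=5m   # one pod, re-evaluating the cluster on this interval
      - --v=3

Because the pod survives between cycles, whatever it caches about pods and nodes persists as well, which is why the long-running memory profile is the interesting comparison against the CronJob mode.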
@damemi Okay, but I have already set it to 10s: https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml#L38
Should it be something different?
If you have it running as a deployment, that should give us an idea of the long-running usage, yeah. 10s is a pretty short cycle length, especially for a large cluster. But I think the point we need to focus on is figuring out what is using that much memory. If it's just the pod/node cache, I don't know if there's much we can do about that since that's a client we just import.
But it's possible (likely) that some of our strategy implementations are doing big in-memory representations of the cluster state. If you're just using the policy you linked above (https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml#L46-L82) then we should start there.
If you have time, maybe you could give it a shot running a policy with only one of these strategies enabled at a time (a minimal example is sketched below)? Tracking the memory usage per strategy might help us narrow down the worst offenders. I suspect LowNodeUtilization might be a big one, since it tracks both pods and nodes.
Fyi I opened https://github.com/kubernetes-sigs/descheduler/issues/782 to track an effort to add performance tests so we can work on things like this.
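For the one-strategy-at-a-time experiment, a minimal policy might look roughly like the sketch below. The threshold numbers are placeholders rather than recommendations; the schema follows the v1alpha1 DeschedulerPolicy format used by v0.23:

# policy.yaml sketch: only LowNodeUtilization enabled, all other strategies omitted
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:          # nodes below all of these are treated as underutilized
          "cpu": 20
          "memory": 20
          "pods": 20
        targetThresholds:    # nodes above any of these are candidates to evict from
          "cpu": 50
          "memory": 50
          "pods": 50

Swapping in the other strategies from the linked policy one at a time, and watching kubectl top pod between cycles, should make the per-strategy memory footprint visible.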
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.