
descheduler uses a lot of memory when the cluster is large

Open matti opened this issue 3 years ago • 9 comments

What version of descheduler are you using?

descheduler version: k8s.gcr.io/descheduler/descheduler:v0.23.0

Does this issue reproduce with the latest release? yes

Which descheduler CLI options are you using?

https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml

Please provide a copy of your descheduler policy config file

See above.

What k8s version are you using (kubectl version)?

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"b631974d68ac5045e076c86a5c66fba6f128dc72", GitTreeState:"clean", BuildDate:"2022-01-19T17:51:12Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

Installed descheduler chart v0.23.1 with values https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml

What did you expect to see?

The descheduler not using an excessive amount of memory

What did you see instead?

The descheduler uses 753Mi of memory with 261 nodes and 5700 running pods (out of 15000 pods total). The chart's default memory value is 256Mi.

$ kubectl top pod -n descheduler-1
NAME                             CPU(cores)   MEMORY(bytes)   
descheduler-1-6fdcdf644f-q2wht   91m          753Mi 

When it was left running, memory usage eventually dropped to 434Mi, which is still larger than the chart's default value.

matti avatar Mar 20 '22 14:03 matti

Hi @matti, since you are using the Helm chart to deploy: by default the chart deploys a CronJob, and each CronJob run has to re-list everything with no in-memory cache. For a large cluster we suggest switching to a Deployment, which does cache. You can also check the test results in https://github.com/kubernetes-sigs/descheduler/pull/673#issuecomment-993115438

You can just helm uninstall, then helm install with kind set to Deployment. I will be waiting for your results :)
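
A rough sketch of that switch (assuming the chart's `kind` value, and the release name/namespace from the report above; adjust the chart reference, release name, and values file for your setup):

$ helm uninstall descheduler-1 --namespace descheduler-1
$ helm install descheduler-1 descheduler/descheduler \
    --namespace descheduler-1 \
    --set kind=Deployment \
    -f values.yml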

JaneLiuL avatar Mar 21 '22 12:03 JaneLiuL

I believe that the Job will still consume a lot of memory, but I'll test

matti avatar Mar 21 '22 12:03 matti

well, it has other problems: https://github.com/kubernetes-sigs/descheduler/issues/775

matti avatar Mar 21 '22 13:03 matti

@matti just to clarify, @JaneLiuL is saying that you should run it as a Deployment (not a Job or CronJob). The Deployment has a descheduling interval flag that keeps a single pod running rather than creating a new one each time.

I do agree that this will likely still face similar memory issues, which we need to profile and debug. But it will be good to have the comparison as a starting point.
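
For example (a sketch only; `deschedulingInterval` is the chart value name I'm assuming here, which maps to the binary's --descheduling-interval flag — the exact key may differ between chart versions, and 5m is an arbitrary example):

$ helm upgrade descheduler-1 descheduler/descheduler \
    --namespace descheduler-1 \
    --set kind=Deployment \
    --set deschedulingInterval=5m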

damemi avatar Mar 21 '22 13:03 damemi

@damemi okay, but I have already set it to 10s: https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml#L38

should it be something different?

matti avatar Mar 21 '22 14:03 matti

If you have it running as a deployment, that should give us an idea of the long-running usage, yeah. 10s is a pretty short cycle length, especially for a large cluster. But I think the point we need to focus on is figuring out what is using that much memory. If it's just the pod/node cache, I don't know if there's much we can do about that since that's a client we just import.

But it's possible (likely) that some of our strategy implementations are doing big in-memory representations of the cluster state. If you're just using the policy you linked above (https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml#L46-L82) then we should start there.

If you have time, maybe you could give it a shot running a policy with only one of these strategies enabled at a time? Tracking the memory usage per strategy might help us narrow down the worst offenders. I suspect LowNodeUtilization might be a big one, since it tracks pods and nodes.
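
As a sketch of what that isolated test could look like (the deschedulerPolicy/strategies keys are assumed from the chart's values layout and may differ by chart version; the thresholds are placeholders, not recommendations):

$ cat > /tmp/one-strategy-values.yml <<'EOF'
kind: Deployment
deschedulingInterval: 5m
deschedulerPolicy:
  strategies:
    # Helm deep-merges values, so strategies enabled in your base values or the chart
    # defaults may need an explicit enabled: false here (RemoveDuplicates shown as an example).
    RemoveDuplicates:
      enabled: false
    LowNodeUtilization:
      enabled: true
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 50
            memory: 50
            pods: 50
EOF
$ helm upgrade descheduler-1 descheduler/descheduler \
    --namespace descheduler-1 \
    -f /tmp/one-strategy-values.yml
$ kubectl top pod -n descheduler-1

Then repeat with each strategy enabled in turn and compare the kubectl top numbers across a few descheduling cycles.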

damemi avatar Mar 24 '22 19:03 damemi

FYI, I opened https://github.com/kubernetes-sigs/descheduler/issues/782 to track an effort to add performance tests so we can work on things like this.

damemi avatar Apr 13 '22 15:04 damemi

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 12 '22 16:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 11 '22 17:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Sep 10 '22 17:09 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 10 '22 17:09 k8s-ci-robot