
descheduler uses a lot of memory when the cluster is large

Open matti opened this issue 3 years ago • 9 comments

What version of descheduler are you using?

descheduler version: k8s.gcr.io/descheduler/descheduler:v0.23.0

Does this issue reproduce with the latest release? yes

Which descheduler CLI options are you using?

https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml

Please provide a copy of your descheduler policy config file

See above.

What k8s version are you using (kubectl version)?

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"b631974d68ac5045e076c86a5c66fba6f128dc72", GitTreeState:"clean", BuildDate:"2022-01-19T17:51:12Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

Installed descheduler chart v0.23.1 with values https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml

What did you expect to see?

The descheduler not using an excessive amount of memory

What did you see instead?

The descheduler uses 753Mi of memory with 261 nodes and 5700 running pods (out of 15000 pods total). The chart's default memory value is 256Mi.

$ kubectl top pod -n descheduler-1
NAME                             CPU(cores)   MEMORY(bytes)   
descheduler-1-6fdcdf644f-q2wht   91m          753Mi 

When it was left running, memory usage eventually dropped to 434Mi, which is still larger than the chart's default value.

matti avatar Mar 20 '22 14:03 matti

Hi @matti, since you are using the Helm chart to deploy: by default the chart deploys a CronJob, and each CronJob run has to re-list everything with no in-memory cache. For a large cluster we suggest switching to a Deployment, which does cache. You can also check the test results in https://github.com/kubernetes-sigs/descheduler/pull/673#issuecomment-993115438

You can just helm uninstall, then helm install with kind set to Deployment. I will be waiting for your results :)
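
A rough sketch of that switch (assuming the chart's `kind` value, and the release name/namespace from the report above; adjust the chart reference, release name, and values file for your setup):

$ helm uninstall descheduler-1 --namespace descheduler-1
$ helm install descheduler-1 descheduler/descheduler \
    --namespace descheduler-1 \
    --set kind=Deployment \
    -f values.yml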

JaneLiuL avatar Mar 21 '22 12:03 JaneLiuL

I believe that the Job will still consume a lot of memory, but I'll test

matti avatar Mar 21 '22 12:03 matti

well, it has other problems: https://github.com/kubernetes-sigs/descheduler/issues/775

matti avatar Mar 21 '22 13:03 matti

@matti just to clarify, @JaneLiuL is saying that you should run it as a Deployment (not a Job or CronJob). The Deployment has a descheduling interval flag that keeps a single pod running rather than creating a new one each time.

I do agree that this will likely still face similar memory issues, which we need to profile and debug. But it will be good to have the comparison as a starting point.
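
For example (a sketch only; `deschedulingInterval` is the chart value name I'm assuming here, which maps to the binary's --descheduling-interval flag — the exact key may differ between chart versions, and 5m is an arbitrary example):

$ helm upgrade descheduler-1 descheduler/descheduler \
    --namespace descheduler-1 \
    --set kind=Deployment \
    --set deschedulingInterval=5m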

damemi avatar Mar 21 '22 13:03 damemi

@damemi okay, but I have already set it to 10s: https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml#L38

should it be something different?

matti avatar Mar 21 '22 14:03 matti

If you have it running as a deployment, that should give us an idea of the long-running usage, yeah. 10s is a pretty short cycle length, especially for a large cluster. But I think the point we need to focus on is figuring out what is using that much memory. If it's just the pod/node cache, I don't know if there's much we can do about that since that's a client we just import.

But it's possible (likely) that some of our strategy implementations are doing big in-memory representations of the cluster state. If you're just using the policy you linked above (https://github.com/matti/eksler/blob/496530189c5ad82f9a7d62d4e192f83bdf7ae277/helm/charts/descheduler-1/values.yml#L46-L82) then we should start there.

If you have time, maybe you could give it a shot running a policy with only one of these strategies enabled at a time? Tracking the memory usage per strategy might help us narrow down the worst offenders. I suspect LowNodeUtilization might be a big one, since it tracks pods and nodes.
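
As a sketch of what that isolated test could look like (the deschedulerPolicy/strategies keys are assumed from the chart's values layout and may differ by chart version; the thresholds are placeholders, not recommendations):

$ cat > /tmp/one-strategy-values.yml <<'EOF'
kind: Deployment
deschedulingInterval: 5m
deschedulerPolicy:
  strategies:
    # Helm deep-merges values, so strategies enabled in your base values or the chart
    # defaults may need an explicit enabled: false here (RemoveDuplicates shown as an example).
    RemoveDuplicates:
      enabled: false
    LowNodeUtilization:
      enabled: true
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 50
            memory: 50
            pods: 50
EOF
$ helm upgrade descheduler-1 descheduler/descheduler \
    --namespace descheduler-1 \
    -f /tmp/one-strategy-values.yml
$ kubectl top pod -n descheduler-1

Then repeat with each strategy enabled in turn and compare the kubectl top numbers across a few descheduling cycles.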

damemi avatar Mar 24 '22 19:03 damemi

FYI, I opened https://github.com/kubernetes-sigs/descheduler/issues/782 to track an effort to add performance tests so we can work on things like this.

damemi avatar Apr 13 '22 15:04 damemi

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 12 '22 16:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 11 '22 17:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Sep 10 '22 17:09 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 10 '22 17:09 k8s-ci-robot