Goldilocks latest stable version: controller and dashboard crash loop with OOMKilled on EKS 1.21

Open aviam opened this issue 2 years ago • 2 comments

What happened?

With the latest stable version of Goldilocks, the controller and dashboard pods crash loop with OOMKilled on EKS 1.21.

What did you expect to happen?

It should work: the pods should run without crash looping. I experimented with the resource limits, but the pods still crash.

How can we reproduce this?

Immediately after helm install, the pods start crash looping with the status OOMKilled.

Version

v4.4.0

Search

  • [X] I did search for other open and closed issues before opening this.

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

Additional context

No response

aviam, Aug 29 '22 15:08

How did you try to remove the memory limit? Can you try just increasing it to something large?

In larger clusters, goldilocks can consume a decent amount of memory, but this is highly dependent on the number of workloads.

sudermanjr, Sep 01 '22 20:09

I had an OOMKilled crash loop too, on a GKE 1.23 cluster. After I increased the memory limit, it worked.

Maybe the default requests should be raised? There's a huge spike in memory when goldilocks runs for the first time, and then usage stays low.

I'd also suggest using Burstable QoS instead of Guaranteed for this case in the Helm values defaults.

tsutsarin, Sep 26 '22 11:09
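
For readers unsure what the Burstable vs. Guaranteed suggestion above means in practice, here is a minimal sketch of how Kubernetes assigns a pod's QoS class from its containers' CPU and memory requests and limits. The function name `qos_class` and the resource values are hypothetical illustrations, not the chart's actual defaults, and quantities are compared as plain strings, which is a simplification (Kubernetes itself treats "1" and "1000m" as equal).

```python
def qos_class(containers):
    """Approximate the Kubernetes QoS class for a pod.

    `containers` is a list of dicts with optional "requests" and "limits"
    mappings, mirroring the `resources` section of a container spec.
    Simplification: quantities are compared as strings, not parsed.
    """
    any_set = False      # True once any container sets a request or limit
    guaranteed = True    # stays True only if every container qualifies
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Hypothetical values: requests equal to limits give Guaranteed QoS.
tight = [{"requests": {"cpu": "25m", "memory": "32Mi"},
          "limits": {"cpu": "25m", "memory": "32Mi"}}]

# Hypothetical values: a memory limit above the request gives Burstable QoS,
# leaving headroom for a spike such as the first-run one described above.
roomy = [{"requests": {"cpu": "25m", "memory": "32Mi"},
          "limits": {"cpu": "25m", "memory": "256Mi"}}]

print(qos_class(tight))
print(qos_class(roomy))
```

With a Burstable configuration, the scheduler reserves only the requested memory, while the container is OOMKilled only if it exceeds the higher limit.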