Agones Controller was OOMKilled in version 1.35.0

Open alvin-7 opened this issue 1 year ago • 2 comments

What happened: Agones controller memory usage leaked up to 480M, exceeding the Kubernetes memory limit, so the controller was OOMKilled.

What you expected to happen: Agones controller memory usage stays at a normal level and does not spike unreasonably.

How to reproduce it (as minimally and precisely as possible): Unable to provide specific steps to reproduce currently. This spike happened during normal usage.

Anything else we need to know?: The Agones controller was OOMKilled.

Environment:

Agones version: 1.35.0
Kubernetes version (use kubectl version): 1.22.5
Cloud provider or hardware configuration: Tencent Cloud
Install method (yaml/helm): helm

After upgrading Agones to version 1.35.0, I still encounter the memory leak. In my scenario, fleets are constantly being deleted and added; the total number stays the same, but the fleet names change.

alvin-7 avatar Nov 02 '23 07:11 alvin-7

The previous issue is here: https://github.com/googleforgames/agones/issues/3380

alvin-7 avatar Nov 02 '23 07:11 alvin-7

If I look at the controller that I've installed via Helm (this is the dev version), we don't set any CPU or memory limits by default.

  agones-controller:
    Container ID:   containerd://d9507676a30507a6bbfb5f0e1ddbe24e089b59874e0b78193eb3b98dff2e9426
    Image:          us-docker.pkg.dev/agones-mark-dev/images/agones-controller:1.36.0-dev-b148d3b
    Image ID:       us-docker.pkg.dev/agones-mark-dev/images/agones-controller@sha256:16182889ecf89777799f2801ea73b183915044281a6765aeadc36482d8a8bc2b
    Ports:          8081/TCP, 8080/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 19 Oct 2023 13:19:23 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      ephemeral-storage:  10100Mi
    Requests:
      ephemeral-storage:  10100Mi

It looks like you've set the memory limit to 500Mi? That may simply be too low a memory limit for the Agones controller - it can be quite memory hungry at times.

What happens if you disable the memory limit?

markmandel avatar Nov 02 '23 16:11 markmandel

This PR has already fixed the issue: https://github.com/googleforgames/agones/pull/3692

The cause is that when a deadlock occurs, Kubernetes events are blocked and cannot be processed, so the cache memory keeps growing until the controller runs out of memory and is OOMKilled.

alvin-7 avatar Apr 02 '24 09:04 alvin-7

Closing!

markmandel avatar Apr 02 '24 14:04 markmandel