Agones controller is OOMKilled in version 1.35.0
What happened: Agones controller memory usage grew to about 480M, exceeded the Kubernetes memory limit, and the controller was OOMKilled.
What you expected to happen: Agones controller memory usage stays at a normal level and does not spike unreasonably.
How to reproduce it (as minimally and precisely as possible): Unable to provide specific steps to reproduce currently. This spike happened during normal usage.
Anything else we need to know?: The Agones controller was OOMKilled.
Environment:
Agones version: 1.35.0
Kubernetes version (use kubectl version): 1.22.5
Cloud provider or hardware configuration: Tencent Cloud
Install method (yaml/helm): helm
After upgrading Agones to version 1.35.0, I still encounter memory leaks. In my scenario, fleets are constantly being deleted and added: the total number of fleets stays the same, but the fleet names change.
The previous issue is here: https://github.com/googleforgames/agones/issues/3380
If I look at the controller that I've installed via Helm (this is the dev version), we don't set any CPU or memory limits by default.
  agones-controller:
    Container ID:   containerd://d9507676a30507a6bbfb5f0e1ddbe24e089b59874e0b78193eb3b98dff2e9426
    Image:          us-docker.pkg.dev/agones-mark-dev/images/agones-controller:1.36.0-dev-b148d3b
    Image ID:       us-docker.pkg.dev/agones-mark-dev/images/agones-controller@sha256:16182889ecf89777799f2801ea73b183915044281a6765aeadc36482d8a8bc2b
    Ports:          8081/TCP, 8080/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 19 Oct 2023 13:19:23 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      ephemeral-storage:  10100Mi
    Requests:
      ephemeral-storage:  10100Mi
It looks like you've set the memory limit to 500Mi? That may simply be too low a memory limit for the Agones controller, which can be quite memory hungry at times.
What happens if you disable the memory limit?
This issue has already been fixed by https://github.com/googleforgames/agones/pull/3692.
The cause was a deadlock: once it occurs, Kubernetes events are blocked and cannot be processed, so the cache memory keeps growing, which ultimately leads to an OOM (out of memory) kill.
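For anyone reading this later, here is a minimal Go sketch of the failure mode described above: a consumer that stops draining an unbounded in-memory queue. It is not the Agones code and not the change made in the PR. The `event` and `backlog` types and the `simulate` function are hypothetical names made up for illustration, and the deadlock is modelled with a plain boolean flag rather than a real lock.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for a Kubernetes watch event.
type event struct {
	name string
}

// backlog is a hypothetical unbounded queue, standing in for the
// in-memory cache that buffers events until a worker handles them.
type backlog struct {
	mu    sync.Mutex
	items []event
}

// add appends an event; nothing bounds how large the queue can get.
func (b *backlog) add(e event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.items = append(b.items, e)
}

// pop removes the oldest event, reporting false if the queue is empty.
func (b *backlog) pop() (event, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.items) == 0 {
		return event{}, false
	}
	e := b.items[0]
	b.items = b.items[1:]
	return e, true
}

// size returns the number of events still waiting to be processed.
func (b *backlog) size() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.items)
}

// simulate produces n fleet events and returns how many remain queued.
// workerBlocked models the deadlocked worker: when true, nothing is
// ever popped, so every produced event stays in memory.
func simulate(n int, workerBlocked bool) int {
	b := &backlog{}
	for i := 0; i < n; i++ {
		b.add(event{name: fmt.Sprintf("fleet-%d", i)})
		if !workerBlocked {
			b.pop()
		}
	}
	return b.size()
}

func main() {
	fmt.Println("healthy worker, events left:", simulate(1000, false))
	fmt.Println("blocked worker, events left:", simulate(1000, true))
}
```

With a healthy worker the backlog stays empty, while with a blocked worker it grows by one entry per event. That would be consistent with steady fleet churn pushing memory up until the limit is hit.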
Closing!