kustomize-controller gets OOMKilled every hour
Background:
The kustomize-controller pod is getting OOMKilled every hour or so. It reaches around ~7.65Gi of memory and gets OOM killed, as the memory limit is 8Gi.
- Image: ghcr.artifactory.gcp.anz/fluxcd/kustomize-controller:v1.2.2
- There are 184 kustomizations in total (see the count command below)
- Concurrency is set to 20
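For reference, one way to reproduce the 184 count (a sketch, assuming kubectl access to the cluster and querying the Flux Kustomization CRD by its full resource name):

~ $ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A --no-headers | wc -l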
These are the flags enabled:
containers:
- args:
- --events-addr=http://notification-controller.flux-system.svc.cluster.local./
- --watch-all-namespaces=true
- --log-level=info
- --log-encoding=json
- --enable-leader-election
- --concurrent=20
- --kube-api-qps=500
- --kube-api-burst=1000
- --requeue-dependency=15s
- --no-remote-bases=true
- --feature-gates=DisableStatusPollerCache=true
Requests & Limits:
resources:
limits:
memory: 8Gi
requests:
cpu: "1"
memory: 8Gi
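Note that GOMEMLIMIT is not set, so the Go runtime is not aware of the container's 8Gi cgroup limit. A sketch of what adding a soft limit would look like (the 7GiB value is an illustrative assumption, not something currently configured):

containers:
- env:
  - name: GOMEMLIMIT    # soft limit for the Go heap; the GC works harder as it is approached
    value: "7GiB"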
What's been tried so far:
- Added the flag --feature-gates=DisableStatusPollerCache=true to the kustomize-controller deployment, as mentioned in this issue, but it didn't make a difference; the pod still gets OOM killed within an hour.
- Reduced the concurrency to 5. At that point the pod seems stable and memory consumption sits around ~2.5Gi.
- Did a heap dump; the inuse_space is around ~22.64MB, which is surprisingly low. Couldn't find anything useful there, but here's the link to the flamegraph. Also, here's the heap dump: heap.out.zip (see the pprof commands after this list).
- Checked whether we have a large repository that's loading unnecessary files, as mentioned in this issue.
This is from the source-controller:
~ $ du -sh /data/*
6.1M    /data/gitrepository
824.0K  /data/helmchart
5.8M    /data/helmrepository
16.0K   /data/lost+found
48.0K   /data/ocirepository
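On the heap dump: an inuse_space profile only accounts for live Go heap objects at the moment of capture, so it can sit far below the container's RSS if memory has already been freed but not yet returned to the OS, or if the spike happened between captures. A few views worth checking locally, assuming the Go toolchain is installed and heap.out is the extracted profile:

~ $ go tool pprof -sample_index=inuse_space -top heap.out   # live heap at capture time (the ~22.64MB figure)
~ $ go tool pprof -sample_index=alloc_space -top heap.out   # cumulative allocations since process start
~ $ go tool pprof -http=:8081 heap.out                      # interactive flamegraph in the browser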
I want to understand what is causing the memory spike and the repeated OOM kills.