kustomize-controller gets OOMKilled every hour

Open bharathvrajan opened this issue 3 months ago • 3 comments

Background:

The kustomize-controller pod is getting OOMKilled every hour or so. It reaches ~7.65G and gets OOMKilled, as the memory limit is 8G.

  • Image - ghcr.artifactory.gcp.anz/fluxcd/kustomize-controller:v1.2.2
  • There are 184 kustomizations in total
  • Concurrency is set to 20.

These are the flags enabled:

      containers:
      - args:
        - --events-addr=http://notification-controller.flux-system.svc.cluster.local./
        - --watch-all-namespaces=true
        - --log-level=info
        - --log-encoding=json
        - --enable-leader-election
        - --concurrent=20
        - --kube-api-qps=500
        - --kube-api-burst=1000
        - --requeue-dependency=15s
        - --no-remote-bases=true
        - --feature-gates=DisableStatusPollerCache=true

Requests & Limits:

        resources:
          limits:
            memory: 8Gi
          requests:
            cpu: "1"
            memory: 8Gi

What's been tried so far:

  1. Added the flag --feature-gates=DisableStatusPollerCache=true to the kustomize-controller deployment, as mentioned in this issue. This made no difference; the pod still gets OOMKilled within an hour.

  2. Reduced the concurrency to 5. At this point, the pod seems stable and memory consumption is ~2.5G.

  3. Took a heap dump; the inuse_space is only ~22.64MB, far below the ~7.65G the pod reaches. Couldn't find anything useful there, but here's the link to the flamegraph. Also, here's the heap dump - heap.out.zip

  4. Checked whether we have a large repository that's loading unnecessary files, as mentioned in this issue.

    This is from the source-controller:

    ~ $ du -sh /data/*
    6.1M    /data/gitrepository
    824.0K  /data/helmchart
    5.8M    /data/helmrepository
    16.0K   /data/lost+found
    48.0K   /data/ocirepository
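
For reference, the concurrency reduction in step 2 can be expressed as a Kustomize patch. This is a minimal sketch, not the exact change applied here: it assumes a standard Flux bootstrap layout with a flux-system/kustomization.yaml that includes gotk-components.yaml and gotk-sync.yaml, and it assumes the Deployment's args do not already contain a --concurrent flag (if they do, that argument should be replaced instead of appended).

```yaml
# flux-system/kustomization.yaml (assumed bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Append --concurrent=5 to the kustomize-controller container args.
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=5
    target:
      kind: Deployment
      name: kustomize-controller
```

With this setting, memory stayed at ~2.5G as noted in step 2, compared with ~7.65G before the OOM kill at --concurrent=20.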
    

I'd like to understand what is causing the memory spike and the OOM kills.

bharathvrajan • Mar 15 '24 01:03