kustomize-controller

kustomize-controller: high CPU usage after some time

Open • siwyroot opened this issue 1 year ago • 3 comments

Describe the bug

Hello, we have a few OpenShift clusters whose configuration and apps are managed by Flux. We've observed strange behaviour: after some time, kustomize-controller starts using more and more CPU and almost stops processing Kustomizations. We tried suspending, adjusting parameters such as the CPU and RAM limits, and increasing concurrency, but nothing helped. I could not find anything useful in the logs. Could someone help me troubleshoot this issue? I'm out of ideas.

Steps to reproduce

  1. Deploy flux
  2. Create a larger config
  3. Wait

Expected behavior

Normal CPU usage and operation

Screenshots and recordings

[image not preserved]

OS / Distro

Fedora 38

Flux version

0.41.2

Flux check

► checking prerequisites
✔ Kubernetes 1.21.14+a17bdb3 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.31.2
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.31.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.26.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.35.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.33.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.36.1
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta2
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1beta2
✔ all checks passed

Git provider

No response

Container Registry provider

No response

Additional context

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

siwyroot • Jun 14 '23 15:06

Another graph, from a different cluster where the problem fluctuates a bit: [image not preserved]

siwyroot • Jun 16 '23 06:06

To find the root cause, you'll need to capture a CPU profile; instructions are here: https://fluxcd.io/flux/gitops-toolkit/debugging/
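The linked guide is the authoritative reference. As a rough sketch only, assuming the controller serves the standard Go pprof handlers under /debug/pprof/ on its metrics port (8080 here), and that the port has been forwarded locally (for example with kubectl -n flux-system port-forward deploy/kustomize-controller 8080), a CPU profile can be captured with a plain HTTP GET. The port, the output filename cpu.pprof, and the helper names profile_url and fetch_cpu_profile are hypothetical; check the debugging page for the exact endpoint your Flux version exposes.

```python
import urllib.request
from urllib.parse import urlencode


def profile_url(base_url: str, seconds: int = 30) -> str:
    """Build the URL of the standard Go pprof CPU profile endpoint.

    Assumption: the controller exposes net/http/pprof handlers under
    /debug/pprof/ on the forwarded metrics port.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    query = urlencode({"seconds": seconds})
    return f"{base_url.rstrip('/')}/debug/pprof/profile?{query}"


def fetch_cpu_profile(base_url: str, out_path: str, seconds: int = 30) -> int:
    """Download a CPU profile sampled over `seconds` and save it to `out_path`.

    The request blocks for roughly `seconds` while the controller samples
    its own CPU usage, so the timeout is padded accordingly.
    Returns the number of bytes written.
    """
    url = profile_url(base_url, seconds)
    with urllib.request.urlopen(url, timeout=seconds + 30) as resp:
        data = resp.read()
    # pprof profiles are binary (gzipped protobuf), so write in binary mode.
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

With the port-forward running, fetch_cpu_profile("http://localhost:8080", "cpu.pprof") saves the profile, and go tool pprof -top cpu.pprof lists the functions consuming the most CPU. Capturing while the CPU usage is actually elevated matters; a profile taken during a quiet period will not show the problem.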

stefanprodan • Jun 29 '23 08:06

I'm also experiencing this issue with Flux v2.0.1 and 81 Kustomizations on k3s v1.24.10+k3s1.

@stefanprodan Does this profile contain any personal or identifying information from the cluster (e.g. Kustomization names)? I will capture one the next time I see it happening and can share it with you.

ntx-ben • Oct 12 '23 15:10

This should have been fixed in Flux v2.2.
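Whether a given installation predates v2.2 comes down to a version comparison. The sketch below is a hypothetical helper (parse_version and needs_upgrade are illustrative names, not part of Flux). It assumes plain MAJOR.MINOR.PATCH versions with an optional leading "v" and drops pre-release and build suffixes, which simplifies full semver ordering. The 2.2.0 threshold comes from the maintainer's comment that the fix should be in v2.2; it is not a confirmed fix for every setup.

```python
def parse_version(v: str) -> tuple:
    """Parse 'v2.0.1' or '0.41.2' into (major, minor, patch).

    Simplification: pre-release ('-rc.1') and build ('+meta') suffixes are
    dropped, and missing components default to 0 ('v2.2' -> (2, 2, 0)).
    """
    core = v.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def needs_upgrade(flux_version: str, fixed_in: str = "2.2.0") -> bool:
    """True if `flux_version` is older than the release said to contain the fix."""
    # Tuples compare element by element, which matches numeric version order.
    return parse_version(flux_version) < parse_version(fixed_in)
```

For the versions reported in this thread, both Flux 0.41.2 and v2.0.1 compare as older than 2.2.0, so needs_upgrade returns True for each.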

stefanprodan • Apr 07 '24 13:04