tofu-controller icon indicating copy to clipboard operation
tofu-controller copied to clipboard

Terraform CRs stuck in Initializing status

Open pbikki opened this issue 1 year ago • 1 comments

Running v0.015.1 version.

Problem

  • We see that couple CRs are stuck in Initializing state. The only way so far we've found so far to reconcile these stuck CRs is by restarting tf-controller pods. (We see this even when CPU and memory utilization on nodes is not too high)
  • When it is stuck in Initializing state, we see no logs from tf-controller on these CRs which indicates they are not being reconciled
  • tf- controller concurrency is at 192 and qps and burst are below
 kubeAPIBurst: 1000
 kubeAPIQPS: 500
  • Are there any metrics we should be looking to understand the load on the controller why this might be happening and what we can do to alleviate the problem ?

pbikki avatar Jan 12 '24 18:01 pbikki

We have the formula to approximate the capacity here which would help:

https://github.com/flux-iac/tofu-controller/discussions/281

chanwit avatar Mar 14 '24 06:03 chanwit