tofu-controller
tofu-controller copied to clipboard
Terraform CRs stuck in Initializing status
Running v0.015.1 version.
Problem
- We see that couple CRs are stuck in
Initializingstate. The only way so far we've found so far to reconcile these stuck CRs is by restarting tf-controller pods. (We see this even when CPU and memory utilization on nodes is not too high) - When it is stuck in
Initializingstate, we see no logs from tf-controller on these CRs which indicates they are not being reconciled - tf- controller concurrency is at 192 and qps and burst are below
kubeAPIBurst: 1000
kubeAPIQPS: 500
- Are there any metrics we should be looking to understand the load on the controller why this might be happening and what we can do to alleviate the problem ?
We have the formula to approximate the capacity here which would help:
https://github.com/flux-iac/tofu-controller/discussions/281