The controller manager restarts frequently
I started the controller manager by applying dlrover/go/operator/config/manifests/bases/deployment.yaml, but found that it restarts frequently.
[root@vadmin14 ~]# kubectl -n dlrover get po
NAME READY STATUS RESTARTS AGE
dlrover-brain-5b866c8c44-n9cjp 1/1 Running 0 27h
dlrover-controller-manager-5884d84c4d-lz8th 2/2 Running 30 (2m50s ago) 27h
dlrover-kube-monitor-67c4ccf78d-lwmfv 1/1 Running 0 27h
mysql-6877845b96-j8sbg 1/1 Running 0 27h
View the logs:
[root@vadmin14 ~]# kubectl -n dlrover logs dlrover-controller-manager-5884d84c4d-lz8th -f
... ...
E1025 06:38:52.463386 1 leaderelection.go:330] error retrieving resource lock dlrover/9b6611a4.iml.github.io: Get "https://10.66.0.1:443/apis/coordination.k8s.io/v1/namespaces/dlrover/leases/9b6611a4.iml.github.io": context deadline exceeded
I1025 06:38:52.463492 1 leaderelection.go:283] failed to renew lease dlrover/9b6611a4.iml.github.io: timed out waiting for the condition
1.729838332463565e+09 ERROR setup problem running manager {"error": "leader election lost"}
main.main
/workspace/main.go:119
runtime.main
/usr/local/go/src/runtime/proc.go:250
1.7298383324636865e+09 INFO Stopping and waiting for non leader election runnables
After I set --leader-elect to false, the controller manager stopped restarting.
[root@vadmin14 ~]# kubectl -n dlrover edit deployments.apps dlrover-controller-manager
... ...
spec:
replicas: 1
... ...
- args:
- --health-probe-bind-address=:8081
- --metrics-bind-address=127.0.0.1:8080
- --leader-elect=false
... ...
So why is leader-elect set to true when the controller manager runs only one replica?
Maybe we need to increase the LeaseDuration or RenewDeadline, as suggested in this issue:
https://github.com/operator-framework/operator-sdk/issues/1813#issuecomment-523713555
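Besides disabling leader election, the linked issue's approach is to loosen the timings. controller-runtime's manager Options exposes LeaseDuration, RenewDeadline, and RetryPeriod as *time.Duration (client-go defaults: 15s / 10s / 2s). A runnable sketch with a locally defined mirror of those fields, since the real Options needs a cluster; the 60s/45s/10s values are an illustrative choice, not dlrover's configuration:

```go
package main

import (
	"fmt"
	"time"
)

// options mirrors the leader-election fields of controller-runtime's
// manager Options; defined locally so this sketch runs without a cluster.
type options struct {
	LeaderElection bool
	LeaseDuration  *time.Duration
	RenewDeadline  *time.Duration
	RetryPeriod    *time.Duration
}

// tuned returns looser leader-election timings for a slow or briefly
// unreachable API server. RenewDeadline must stay shorter than
// LeaseDuration so a renewal attempt can finish before the lease expires.
func tuned() options {
	lease := 60 * time.Second // default 15s
	renew := 45 * time.Second // default 10s
	retry := 10 * time.Second // default 2s
	return options{
		LeaderElection: true,
		LeaseDuration:  &lease,
		RenewDeadline:  &renew,
		RetryPeriod:    &retry,
	}
}

func main() {
	o := tuned()
	fmt.Printf("lease=%s renew=%s retry=%s\n",
		*o.LeaseDuration, *o.RenewDeadline, *o.RetryPeriod)
}
```

With only one replica there is no peer to fail over to, so the main thing leader election buys here is a guarantee against two managers briefly running during a rolling update; whether that is worth the restart loop on a slow API server is exactly the trade-off the question raises.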