etcd-operator
The operator gets stuck in terminating state
I have an issue with the operator that I cannot reproduce consistently, but it keeps happening every now and again. I have a 3-node cluster set up on DigitalOcean's hosted Kubernetes:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Here is my operator definition:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-operator
  namespace: etcd
spec:
  replicas: 1
  selector:
    matchLabels:
      name: etcd-operator
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator:v0.9.4
        command:
        - etcd-operator
        # Uncomment to act for resources in all namespaces. More information in doc/user/clusterwide.md
        #- -cluster-wide
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        resources:
          limits:
            cpu: 300m
            memory: 200Mi
          requests:
            cpu: 50m
            memory: 50Mi
At some point a second operator pod appears, and the first one loses leader election and gets stuck in the Terminating state with a final log message like this:
level=fatal msg="leader election lost"
What's really strange to me is that my deployment has 2 out of 1 replicas. Any ideas why this might be happening?