etcd-operator
etcd-operator should update the status of etcdclusters/<clustername> when quorum is lost
Right now etcd-operator does not update the status of etcdclusters/ when quorum is lost.
Steps to reproduce:
- create the etcd-operator deployment:

kubectl apply -f etcd-operator.deployment.yaml

# etcd-operator.deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: etcd-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator:v0.9.4
        command:
        - etcd-operator
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
- create an EtcdCluster resource with 3 nodes:

kubectl apply -f etcd-cluster.crd.yaml

# etcd-cluster.crd.yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "etcd"
spec:
  size: 3
  version: "3.2.13"
- wait until the cluster is set up:
kubectl get etcdclusters/etcd -o yaml
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"etcd.database.coreos.com/v1beta2","kind":"EtcdCluster","metadata":{"annotations":{},"labels":{"etcd-operator-managed":"true"},"name":"etcd","namespace":"default"},"spec":{"size":3,"version":"3.2.13"}}
  creationTimestamp: "2019-03-13T23:37:49Z"
  generation: 1
  labels:
    etcd-operator-managed: "true"
  name: etcd
  namespace: default
  resourceVersion: "2182831"
  selfLink: /apis/etcd.database.coreos.com/v1beta2/namespaces/default/etcdclusters/etcd
  uid: 037a6ceb-45e9-11e9-8b71-42010a8a000a
spec:
  repository: quay.io/coreos/etcd
  size: 3
  version: 3.2.13
status:
  clientPort: 2379
  conditions:
  - lastTransitionTime: "2019-03-13T23:38:30Z"
    lastUpdateTime: "2019-03-13T23:38:30Z"
    reason: Cluster available
    status: "True"
    type: Available
  currentVersion: 3.2.13
  members:
    ready:
    - etcd-6r6rpjsmtk
    - etcd-r5fdrln4sh
    - etcd-xkdcxc95vg
  phase: Running
  serviceName: etcd-client
  size: 3
  targetVersion: ""
kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
etcd-6r6rpjsmtk                  1/1     Running   0          43s
etcd-operator-5c6bddb7f6-lxwqb   1/1     Running   0          93s
etcd-r5fdrln4sh                  1/1     Running   0          27s
etcd-xkdcxc95vg                  1/1     Running   0          51s
- kill 2 of the 3 pods:
kubectl delete pod/etcd-6r6rpjsmtk pod/etcd-r5fdrln4sh
pod "etcd-6r6rpjsmtk" deleted
pod "etcd-r5fdrln4sh" deleted
- see in the etcd-operator log that it reports lost quorum:
stern etcd-operator
...
etcd-operator-5c6bddb7f6-lxwqb etcd-operator time="2019-03-13T23:41:58Z" level=info msg="cluster membership: etcd-6r6rpjsmtk,etcd-r5fdrln4sh,etcd-xkdcxc95vg" cluster-name=etcd cluster-namespace=default pkg=cluster
etcd-operator-5c6bddb7f6-lxwqb etcd-operator time="2019-03-13T23:41:58Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=default pkg=cluster
etcd-operator-5c6bddb7f6-lxwqb etcd-operator time="2019-03-13T23:41:58Z" level=error msg="failed to reconcile: lost quorum" cluster-name=etcd cluster-namespace=default pkg=cluster
- check etcdclusters/etcd again:
kubectl get etcdclusters/etcd -o yaml
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"etcd.database.coreos.com/v1beta2","kind":"EtcdCluster","metadata":{"annotations":{},"labels":{"etcd-operator-managed":"true"},"name":"etcd","namespace":"default"},"spec":{"size":3,"version":"3.2.13"}}
  creationTimestamp: "2019-03-13T23:37:49Z"
  generation: 1
  labels:
    etcd-operator-managed: "true"
  name: etcd
  namespace: default
  resourceVersion: "2182831"
  selfLink: /apis/etcd.database.coreos.com/v1beta2/namespaces/default/etcdclusters/etcd
  uid: 037a6ceb-45e9-11e9-8b71-42010a8a000a
spec:
  repository: quay.io/coreos/etcd
  size: 3
  version: 3.2.13
status:
  clientPort: 2379
  conditions:
  - lastTransitionTime: "2019-03-13T23:38:30Z"
    lastUpdateTime: "2019-03-13T23:38:30Z"
    reason: Cluster available
    status: "True"
    type: Available
  currentVersion: 3.2.13
  members:
    ready:
    - etcd-6r6rpjsmtk
    - etcd-r5fdrln4sh
    - etcd-xkdcxc95vg
  phase: Running
  serviceName: etcd-client
  size: 3
  targetVersion: ""
Inspect the status section: it is identical to before the quorum loss. I believe the status should indicate that the cluster is in a bad state.
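For concreteness, the status one would hope to see after quorum loss might look something like this (a sketch only, not output the operator actually produces; the reason wording and Failed phase are assumptions mirroring the log message):

status:
  clientPort: 2379
  conditions:
  - lastTransitionTime: "2019-03-13T23:42:00Z"
    lastUpdateTime: "2019-03-13T23:42:00Z"
    reason: Lost quorum
    status: "False"
    type: Available
  currentVersion: 3.2.13
  phase: Failed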
This is a duplicate of #1973, but with a much better description ;)