
After the cluster is restarted, the etcd pods go into Error state

Open · ksandha opened this issue on Nov 27, 2018 · 1 comment

A cold reset of the cluster leads to the etcd pods going into Error state.

  1. Create a GCS cluster
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-dhzbh   2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-l54d4   2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-ww55c   2/2     Running   0          20h
csi-provisioner-glusterfsplugin-0      3/3     Running   0          20h
etcd-64t8sjpxvw                        1/1     Running   0          20h
etcd-bg9zcfvbl2                        1/1     Running   0          20h
etcd-operator-7cb5bd459b-rlwqx         1/1     Running   0          20h
etcd-q9skwdlmmb                        1/1     Running   0          20h
gluster-kube1-0                        1/1     Running   1          20h
gluster-kube2-0                        1/1     Running   1          20h
gluster-kube3-0                        1/1     Running   1          20h
[vagrant@kube1 ~]$ 
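Rather than eyeballing the table, readiness of the freshly created cluster can be sanity-checked in one command; a minimal sketch (the `gcs` namespace name is taken from the output above):

```shell
# Block until every pod in the gcs namespace reports Ready, or fail after 5m
kubectl wait --for=condition=Ready pods --all -n gcs --timeout=300s
```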
  2. Cold reset the cluster
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-dhzbh   2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-l54d4   2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-ww55c   2/2     Running   2          20h
csi-provisioner-glusterfsplugin-0      3/3     Running   4          20h
etcd-64t8sjpxvw                        0/1     Error     0          20h
etcd-bg9zcfvbl2                        0/1     Error     0          20h
etcd-operator-7cb5bd459b-rlwqx         1/1     Running   1          20h
etcd-q9skwdlmmb                        0/1     Error     0          20h
gluster-kube1-0                        1/1     Running   2          20h
gluster-kube2-0                        1/1     Running   2          20h
gluster-kube3-0                        1/1     Running   2          20h
[vagrant@kube1 ~]$ 
  3. The other pods in the gcs namespace come back into Running state, but the etcd pods go into Error state and are unable to recover.
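To see why the members fail after the reset, the usual first step is to pull the failed container's logs and the pod events; a sketch, using one etcd pod name from the listing above:

```shell
# Logs from the errored (previous) container instance of one etcd member
kubectl logs etcd-64t8sjpxvw -n gcs --previous

# Pod events and state transitions, which show the termination reason
kubectl describe pod etcd-64t8sjpxvw -n gcs
```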

ksandha avatar Nov 27 '18 07:11 ksandha

I believe this is due to the lack of persistence support in the etcd operator: https://github.com/coreos/etcd-operator/issues/1323
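One way to confirm that the linked issue applies here is to inspect what backs the etcd data directory; a hedged sketch, assuming the default etcd-operator pod spec (pod name taken from the listing above):

```shell
# List the volumes of one etcd member pod. Without persistence support, the
# data directory is backed by a non-persistent volume (e.g. an emptyDir), so
# member state does not survive a cold reset of the node.
kubectl get pod etcd-64t8sjpxvw -n gcs -o jsonpath='{.spec.volumes}'
```

If no PersistentVolumeClaim appears in the output, the members have no durable state to recover from after the restart, matching the behavior reported above.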

JohnStrunk avatar Nov 27 '18 09:11 JohnStrunk