After the cluster is restarted, the etcd pods go into the ERROR state
A cold reset of the cluster leaves the etcd pods stuck in the ERROR state.
- Create a GCS cluster
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-dhzbh   2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-l54d4   2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-ww55c   2/2     Running   0          20h
csi-provisioner-glusterfsplugin-0      3/3     Running   0          20h
etcd-64t8sjpxvw                        1/1     Running   0          20h
etcd-bg9zcfvbl2                        1/1     Running   0          20h
etcd-operator-7cb5bd459b-rlwqx         1/1     Running   0          20h
etcd-q9skwdlmmb                        1/1     Running   0          20h
gluster-kube1-0                        1/1     Running   1          20h
gluster-kube2-0                        1/1     Running   1          20h
gluster-kube3-0                        1/1     Running   1          20h
[vagrant@kube1 ~]$
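For reference, these etcd pods are managed by etcd-operator through an EtcdCluster custom resource. A quick sanity check on that object looks like the following sketch (the CR name "etcd" is an assumption based on the pod name prefix):

[vagrant@kube1 ~]$ kubectl -n gcs get etcdclusters
[vagrant@kube1 ~]$ kubectl -n gcs describe etcdcluster etcd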
- Cold reset the cluster
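The nodes here are Vagrant VMs, so a cold reset amounts to powering every VM off and bringing them all back up from the host; a rough sketch (exact machine names depend on the Vagrantfile):

host$ vagrant halt   # power off kube1, kube2 and kube3
host$ vagrant up     # boot the whole cluster again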
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-dhzbh   2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-l54d4   2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-ww55c   2/2     Running   2          20h
csi-provisioner-glusterfsplugin-0      3/3     Running   4          20h
etcd-64t8sjpxvw                        0/1     Error     0          20h
etcd-bg9zcfvbl2                        0/1     Error     0          20h
etcd-operator-7cb5bd459b-rlwqx         1/1     Running   1          20h
etcd-q9skwdlmmb                        0/1     Error     0          20h
gluster-kube1-0                        1/1     Running   2          20h
gluster-kube2-0                        1/1     Running   2          20h
gluster-kube3-0                        1/1     Running   2          20h
[vagrant@kube1 ~]$
- The other pods in the GCS namespace come back to the Running state, but the etcd pods stay in the ERROR state and are unable to recover.
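The pod status alone does not show why the members fail, so the next step is to pull logs and events from one of the failed etcd pods and from the operator (pod names taken from the listing above):

[vagrant@kube1 ~]$ kubectl -n gcs logs etcd-64t8sjpxvw
[vagrant@kube1 ~]$ kubectl -n gcs describe pod etcd-64t8sjpxvw
[vagrant@kube1 ~]$ kubectl -n gcs logs etcd-operator-7cb5bd459b-rlwqx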
I believe this is due to the lack of persistence support in etcd-operator: https://github.com/coreos/etcd-operator/issues/1323
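One possible, lossy recovery path is to delete the broken EtcdCluster object and re-apply its manifest so the operator seeds a fresh, empty etcd cluster. A sketch, assuming the CR is named "etcd" and its original manifest is still at hand (both are assumptions; the file name below is hypothetical):

# Remove the unrecoverable cluster object, then recreate it from the original manifest
[vagrant@kube1 ~]$ kubectl -n gcs delete etcdcluster etcd
[vagrant@kube1 ~]$ kubectl -n gcs apply -f etcd-cluster.yaml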