
GD2 pod automatically reboots.

ksandha opened this issue on Nov 02 '18 · 3 comments

Steps performed:

  1. Created the GCS cluster:
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running   0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running   0          2d21h
etcd-4m7wv5fqk2                        1/1     Running   0          2d21h
etcd-6mf2nsl2p4                        1/1     Running   0          2d21h
etcd-lbmh9xjxm8                        1/1     Running   0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running   0          2d21h
gluster-kube1-0                        1/1     Running   1          2d21h
gluster-kube2-0                        1/1     Running   0          2d21h
gluster-kube3-0                        1/1     Running   0          2d21h
  2. Deleted the gd2 pod; as shown below, it is automatically recreated (see the sketch after this step's output):
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl delete pods -n gcs gluster-kube1-0 
pod "gluster-kube1-0" deleted

[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS              RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running             0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running             0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running             0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running             0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running             0          2d21h
etcd-4m7wv5fqk2                        1/1     Running             0          2d21h
etcd-6mf2nsl2p4                        1/1     Running             0          2d21h
etcd-lbmh9xjxm8                        1/1     Running             0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running             0          2d21h
gluster-kube1-0                        0/1     ContainerCreating   0          5s
gluster-kube2-0                        1/1     Running             0          2d21h
gluster-kube3-0                        1/1     Running             0          2d21h
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl get pods -n gcs 
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running   0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running   0          2d21h
etcd-4m7wv5fqk2                        1/1     Running   0          2d21h
etcd-6mf2nsl2p4                        1/1     Running   0          2d21h
etcd-lbmh9xjxm8                        1/1     Running   0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running   0          2d21h
gluster-kube1-0                        1/1     Running   0          43s
gluster-kube2-0                        1/1     Running   0          2d21h
gluster-kube3-0                        1/1     Running   0          2d21h
[vagrant@kube1 ~]$ 
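
The deleted pod comes back on its own because it is managed by a StatefulSet (visible further down in the describe output as Controlled By: StatefulSet/gluster-kube1). A minimal way to confirm the owning controller, sketched here rather than taken from the report:

# Confirm the StatefulSet that recreates the pod after deletion
kubectl -n gcs get statefulset gluster-kube1
# Or read the owner reference directly from the pod
kubectl -n gcs get pod gluster-kube1-0 -o jsonpath='{.metadata.ownerReferences[0].kind}'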
  3. Executed commands on the gd2 pod by logging in and using the same kube1 endpoint:
command terminated with exit code 1
[vagrant@kube1 ~]$ kubectl get pods -n gcs -w
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running   0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running   0          2d21h
etcd-4m7wv5fqk2                        1/1     Running   0          2d21h
etcd-6mf2nsl2p4                        1/1     Running   0          2d21h
etcd-lbmh9xjxm8                        1/1     Running   0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running   0          2d21h
gluster-kube1-0                        1/1     Running   0          2m52s
gluster-kube2-0                        1/1     Running   0          2d21h
gluster-kube3-0                        1/1     Running   0          2d21h
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$  kubectl -n gcs -it exec gluster-kube1-0 -- /bin/bash
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli peer list --endpoints="http://gluster-kube1-0.glusterd2.gcs:24007"
Failed to get Peers list

Failed to connect to glusterd. Please check if
- Glusterd is running(http://gluster-kube1-0.glusterd2.gcs:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube1-0.glusterd2.gcs:24007"
Error getting volumes list

Failed to connect to glusterd. Please check if
- Glusterd is running(http://gluster-kube1-0.glusterd2.gcs:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
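
Both failures above mean the glusterd2 REST endpoint on kube1 was not reachable at that moment. A minimal check from inside the pod, assuming curl is available in the glusterd2-nightly image (a sketch, not part of the original report), would hit the same URL the liveness probe uses:

# HTTP status of the glusterd2 ping endpoint (the liveness probe target)
kubectl -n gcs exec gluster-kube1-0 -- curl -sS -o /dev/null -w '%{http_code}\n' \
    http://gluster-kube1-0.glusterd2.gcs:24007/ping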

  4. Executed the same commands using kube2 and kube3 as endpoints, logged in from the same kube1 pod:
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube2-0.glusterd2.gcs:24007"
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
|                  ID                  |         NAME         |   TYPE    |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| 6cd58524-5172-4d9e-89ae-414bc338eba6 | pvc-f603ac47dcdc11e8 | Replicate | Started | tcp       | 3      |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube3-0.glusterd2.gcs:24007"
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
|                  ID                  |         NAME         |   TYPE    |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| 6cd58524-5172-4d9e-89ae-414bc338eba6 | pvc-f603ac47dcdc11e8 | Replicate | Started | tcp       | 3      |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
[root@gluster-kube1-0 /]# 

  5. While executing more commands, the pod automatically restarted:
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli volume list -command terminated with exit code 137usterd2.gcs:24007"
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl get pods -n gcs -w
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running   0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running   0          2d21h
etcd-4m7wv5fqk2                        1/1     Running   0          2d21h
etcd-6mf2nsl2p4                        1/1     Running   0          2d21h
etcd-lbmh9xjxm8                        1/1     Running   0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running   0          2d21h
gluster-kube1-0                        1/1     Running   1          4m
gluster-kube2-0                        1/1     Running   0          2d21h
gluster-kube3-0                        1/1     Running   0          2d21h
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl describe pods -n gcs gluster-kube1-0
Name:               gluster-kube1-0
Namespace:          gcs
Priority:           0
PriorityClassName:  <none>
Node:               kube1/192.168.121.7
Start Time:         Fri, 02 Nov 2018 05:39:18 +0000
Labels:             app.kubernetes.io/component=glusterfs
                    app.kubernetes.io/name=glusterd2
                    app.kubernetes.io/part-of=gcs
                    controller-revision-hash=gluster-kube1-55bc79f94
                    statefulset.kubernetes.io/pod-name=gluster-kube1-0
Annotations:        <none>
Status:             Running
IP:                 10.233.64.7
Controlled By:      StatefulSet/gluster-kube1
Containers:
  glusterd2:
    Container ID:   docker://a261c3bcb84f993948b0691e199396109985d1bd9d547250476168cfd01a9520
    Image:          docker.io/gluster/glusterd2-nightly
    Image ID:       docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:06e42f3354bff80a724007dbc5442349c3a53d31eceb935fd6b3776d6cdcb0fa
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 02 Nov 2018 05:43:08 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 02 Nov 2018 05:39:48 +0000
      Finished:     Fri, 02 Nov 2018 05:43:04 +0000
    Ready:          True
    Restart Count:  1
    Liveness:       http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
    Environment:
      GD2_ETCDENDPOINTS:  http://etcd-client.gcs:2379
      GD2_CLUSTER_ID:     27056e19-500a-4e7a-b5a9-71f461679196
      GD2_CLIENTADDRESS:  gluster-kube1-0.glusterd2.gcs:24007
      GD2_ENDPOINTS:      http://gluster-kube1-0.glusterd2.gcs:24007
      GD2_PEERADDRESS:    gluster-kube1-0.glusterd2.gcs:24008
      GD2_RESTAUTH:       false
    Mounts:
      /dev from gluster-dev (rw)
      /run/lvm from gluster-lvm (rw)
      /sys/fs/cgroup from gluster-cgroup (ro)
      /usr/lib/modules from gluster-kmods (ro)
      /var/lib/glusterd2 from glusterd2-statedir (rw)
      /var/log/glusterd2 from glusterd2-logdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8s2lg (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  gluster-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  gluster-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:  
  gluster-lvm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/lvm
    HostPathType:  
  gluster-kmods:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib/modules
    HostPathType:  
  glusterd2-statedir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/glusterd2
    HostPathType:  DirectoryOrCreate
  glusterd2-logdir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/glusterd2
    HostPathType:  DirectoryOrCreate
  default-token-8s2lg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8s2lg
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  30m                default-scheduler  Successfully assigned gcs/gluster-kube1-0 to kube1
  Warning  Unhealthy  27m (x3 over 29m)  kubelet, kube1     Liveness probe failed: Get http://10.233.64.7:24007/ping: dial tcp 10.233.64.7:24007: connect: connection refused
  Normal   Pulling    26m (x2 over 30m)  kubelet, kube1     pulling image "docker.io/gluster/glusterd2-nightly"
  Normal   Killing    26m                kubelet, kube1     Killing container with id docker://glusterd2:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     26m (x2 over 30m)  kubelet, kube1     Successfully pulled image "docker.io/gluster/glusterd2-nightly"
  Normal   Created    26m (x2 over 30m)  kubelet, kube1     Created container
  Normal   Started    26m (x2 over 30m)  kubelet, kube1     Started container
[vagrant@kube1 ~]$ 
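
Two details in the describe output explain the restart: the liveness probe is an HTTP GET on :24007/ping, and the previous container instance exited with code 137, i.e. the kubelet killed it after three failed probes. If needed, the probe definition can be read back from the StatefulSet (a sketch; the object name gluster-kube1 comes from the Controlled By field above):

# Show the liveness probe configured for the glusterd2 container
kubectl -n gcs get statefulset gluster-kube1 \
    -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'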

ksandha · Nov 02 '18 06:11

After this restart, were you able to use gd2 on gluster-kube1-0? The failed health check caused the restart of the pod, and that was for the same reason you couldn't use glustercli. We're still left with the question of why...

Do you have any logs from the gd2 container before it was killed for being unhealthy?

JohnStrunk · Nov 05 '18 16:11

@JohnStrunk, I tried to capture the logs, but the container was killed before I could. I can retry the scenario to see if I hit the same thing again and try to capture the logs in a different terminal.
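
For reference, one way to capture the logs from a second terminal while reproducing the issue (a sketch, not commands taken from the thread):

# Stream the glusterd2 container logs live while reproducing the problem
kubectl -n gcs logs -f gluster-kube1-0 -c glusterd2
# After the kubelet restarts the container, logs of the killed instance
# are still available via --previous (as long as the pod itself is not recreated)
kubectl -n gcs logs gluster-kube1-0 -c glusterd2 --previous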

ksandha · Nov 05 '18 16:11

@ksandha, can you provide the info below? A way to gather it is sketched at the end of this comment.

  • Logs from the restarted container (/var/log/glusterd2/glusterd2)
  • Output of kubectl get events

Normal Killing 26m kubelet, kube1 Killing container with id docker://glusterd2:Container failed liveness probe.. Container will be killed and recreated.

My suspicion is that this is due to issue #68.
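
A sketch of commands that could collect the requested information; the exact log file name under /var/log/glusterd2/ is an assumption, so the directory is listed first. Because the log directory is a hostPath mount (DirectoryOrCreate), its contents survive the container restart:

# List the glusterd2 log files inside the recreated pod
kubectl -n gcs exec gluster-kube1-0 -- ls -l /var/log/glusterd2/
# Events in the gcs namespace, sorted by time
kubectl -n gcs get events --sort-by=.lastTimestamp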

Madhu-1 · Dec 05 '18 06:12