
kubectl crashes while creating 1000 pvc

Open · ksandha opened this issue on Jan 03, 2019 · 6 comments

  1. Create a GCS cluster.
  2. Start parallel PVC creation and wait for all the PVCs to be created (one way to script this is sketched after these steps). Watching the PVs showed:
pvc-2d9eaa04-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Pending   default/gcs-pvc711   glusterfs-csi         0s
pvc-2d9eaa04-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc711   glusterfs-csi         0s
pvc-2e1cdbef-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Pending   default/gcs-pvc713   glusterfs-csi         0s
pvc-2e1cdbef-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc713   glusterfs-csi         0s
pvc-2e1cdbef-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc713   glusterfs-csi         0s
pvc-2e6c2866-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Pending   default/gcs-pvc714   glusterfs-csi         0s
pvc-2e6c2866-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc714   glusterfs-csi         1s
pvc-2e6c2866-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc714   glusterfs-csi         1s
pvc-2eaf6432-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Pending   default/gcs-pvc715   glusterfs-csi         0s
pvc-2eaf6432-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc715   glusterfs-csi         0s
pvc-2eaf6432-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc715   glusterfs-csi         0s
pvc-2ef34e4a-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Pending   default/gcs-pvc716   glusterfs-csi         0s
pvc-2ef34e4a-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Pending   default/gcs-pvc716   glusterfs-csi         0s
pvc-2ef34e4a-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc716   glusterfs-csi         1s
pvc-2de0ae97-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Pending   default/gcs-pvc712   glusterfs-csi         0s
pvc-2de0ae97-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc712   glusterfs-csi         0s
pvc-2de0ae97-0f4c-11e9-8d2d-525400e329db   500Mi   RWX   Delete   Bound   default/gcs-pvc712   glusterfs-csi  
  3. After a while, kubectl commands start failing; the connection to the API server is refused:
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl get pv 
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[vagrant@kube1 ~]$ kubectl get pvc
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl get pods -n gcs
The connection to the server localhost:8080 was refused - did you specify the right host or port?
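
For reference, a minimal sketch of one way the parallel PVC creation could have been scripted. The name pattern (gcs-pvcN), size (500Mi), access mode (RWX) and storage class (glusterfs-csi) are taken from the output above; the actual script used for this run is not attached, so treat the rest as assumptions.

# create 1001 PVCs in parallel; each kubectl runs in the background
for i in $(seq 1 1001); do
  cat <<EOF | kubectl create -f - &
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-pvc$i
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 500Mi
  storageClassName: glusterfs-csi
EOF
done
wait                 # wait for every background kubectl to return
kubectl get pv -w    # watch the PVs bind (produces a listing like the one above)

Note that launching 1001 kubectl processes at once is itself a noticeable load on the client VM; batching them (for example with xargs -P) would be gentler.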

ksandha, Jan 03 '19 14:01

[root@gluster-kube3-0 /]# systemctl status gluster-exporter  
● gluster-exporter.service - Gluster Prometheus Exporter
   Loaded: loaded (/usr/lib/systemd/system/gluster-exporter.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/gluster-exporter.service.d
           └─override.conf
   Active: active (running) since Thu 2019-01-03 10:34:24 UTC; 40min ago
 Main PID: 22 (gluster-exporte)
   CGroup: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod8fe4cc3a_0f42_11e9_be36_525400d45ea4.slice/docker-71abc8de24c30e52ac2edb2d3e145d607499b042f17c90704dc1436a793aed0d.scope/system.slice/gluster-exporter.service
           └─22 /usr/sbin/gluster-exporter --config=/etc/gluster-exporter/gluster-exporter....

Jan 03 10:34:24 gluster-kube3-0 systemd[1]: Started Gluster Prometheus Exporter.
[root@gluster-kube3-0 /]# systemctl stop gluster-exporter
[root@gluster-kube3-0 /]# 
[root@gluster-kube3-0 /]# 
[root@gluster-kube3-0 /]# systemctl status gluster-exporter
● gluster-exporter.service - Gluster Prometheus Exporter
   Loaded: loaded (/usr/lib/systemd/system/gluster-exporter.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/gluster-exporter.service.d
           └─override.conf
   Active: inactive (dead) since Thu 2019-01-03 11:15:24 UTC; 1s ago
  Process: 22 ExecStart=/usr/sbin/gluster-exporter --config=/etc/gluster-exporter/gluster-exporter.toml (code=killed, signal=TERM)
 Main PID: 22 (code=killed, signal=TERM)

Jan 03 10:34:24 gluster-kube3-0 systemd[1]: Started Gluster Prometheus Exporter.
Jan 03 11:15:24 gluster-kube3-0 systemd[1]: Stopping Gluster Prometheus Exporter...
Jan 03 11:15:24 gluster-kube3-0 systemd[1]: Stopped Gluster Prometheus Exporter.
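
A side note, not something done in the issue: a plain systemctl stop will not survive a restart of the gluster pod, so if the intent is to keep gluster-exporter down for the rest of the test, disabling (or masking) it inside the pod should make the stop stick:

# inside the gluster pod, e.g. gluster-kube3-0
systemctl disable --now gluster-exporter   # stop it now and remove the enablement symlink
systemctl mask gluster-exporter            # optional: block anything from starting it again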

ksandha, Jan 03 '19 15:01

[vagrant@kube2 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS      RESTARTS   AGE
anthill-7f9dbb66c8-dx72h               1/1     Running     0          5h6m
csi-attacher-glusterfsplugin-0         2/2     Running     0          5h
csi-nodeplugin-glusterfsplugin-6msqc   2/2     NodeLost    0          5h
csi-nodeplugin-glusterfsplugin-7xbp5   2/2     Running     0          5h
csi-nodeplugin-glusterfsplugin-hz22z   2/2     Running     0          5h
csi-provisioner-glusterfsplugin-0      3/3     Unknown     0          5h
etcd-gjzts8xdvx                        0/1     Running     0          5h6m
etcd-hv429kv7tp                        0/1     Unknown     0          5h5m
etcd-operator-7cb5bd459b-d2c79         1/1     Running     5          5h7m
etcd-sslrc7n9bh                        0/1     Completed   0          5h5m
gluster-kube1-0                        1/1     Unknown     1          5h4m
gluster-kube2-0                        1/1     Running     1          5h4m
gluster-kube3-0                        1/1     Running     1          5h4m
[vagrant@kube2 ~]$ 
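
The NodeLost/Unknown states suggest the kubelet on one node (kube1, judging from the gluster-kube1-0 pod above) stopped reporting. A few standard commands that might narrow down what happened to that node; these are suggestions, not output captured from this run:

kubectl get nodes -o wide                                # which node is NotReady?
kubectl describe node kube1 | grep -A10 Conditions       # look for MemoryPressure / Ready=Unknown reasons
kubectl get events --all-namespaces --sort-by=.lastTimestamp | tail -n 20   # recent evictions, restarts, etc.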

ksandha, Jan 03 '19 15:01

Observations:

  1. The etcd pods are going into Completed state.
  2. The total number of bound PVCs is 716; the total number of PVCs issued was 1001 (a quick way to check these counts is shown below).
  3. PVCs are still not being deleted gracefully.
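
A quick way to reproduce the counts in observation 2 once the API server is reachable again (the PVCs live in the default namespace, per the PV listing above):

kubectl get pvc --no-headers | wc -l           # total PVCs issued (1001 here)
kubectl get pvc --no-headers | grep -c Bound   # PVCs that actually reached Bound (716 here)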

ksandha, Jan 03 '19 17:01

csiattcher.log

ksandha, Jan 03 '19 18:01

I have seen this in Vagrant as well. The problem I had was that I overloaded the nodes (VMs) and various processes were getting oom-killed. Is that a possibility here? (From the "NodeLost" and "Unknown" pod states, it looks like it may be.)
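
One way to confirm or rule out the oom-kill theory is to check the kernel log on each VM; again just a suggestion, this output is not part of the thread:

# run on each node VM (kube1, kube2, kube3)
dmesg -T | grep -iE 'out of memory|oom-kill|killed process'
journalctl -k | grep -i oom    # same information via the journal
free -m                        # remaining memory headroom on the VM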

JohnStrunk, Jan 04 '19 18:01