kubectl crashes while creating 1000 PVCs
- Create a GCS cluster.
- Start parallel PVC creation and wait for all the PVCs to become Bound (a reproduction sketch follows).
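For reference, a creation loop along the following lines reproduces this kind of load. The manifest fields (500Mi, RWX, glusterfs-csi, gcs-pvcN names) are taken from the watch output below; the exact script used for this report is an assumption.

# Hypothetical reproduction sketch: create 1000 PVCs roughly in parallel
# against the glusterfs-csi storage class, then watch them bind.
for i in $(seq 1 1000); do
  kubectl create -f - <<EOF &
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-pvc${i}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: glusterfs-csi
  resources:
    requests:
      storage: 500Mi
EOF
  # throttle: wait for every batch of 50 background kubectl clients
  (( i % 50 == 0 )) && wait
done
wait
kubectl get pv --watch

While the claims were being created, watching the PVs showed each one going Pending and then Bound: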
pvc-2d9eaa04-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Pending default/gcs-pvc711 glusterfs-csi 0s
pvc-2d9eaa04-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc711 glusterfs-csi 0s
pvc-2e1cdbef-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Pending default/gcs-pvc713 glusterfs-csi 0s
pvc-2e1cdbef-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc713 glusterfs-csi 0s
pvc-2e1cdbef-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc713 glusterfs-csi 0s
pvc-2e6c2866-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Pending default/gcs-pvc714 glusterfs-csi 0s
pvc-2e6c2866-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc714 glusterfs-csi 1s
pvc-2e6c2866-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc714 glusterfs-csi 1s
pvc-2eaf6432-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Pending default/gcs-pvc715 glusterfs-csi 0s
pvc-2eaf6432-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc715 glusterfs-csi 0s
pvc-2eaf6432-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc715 glusterfs-csi 0s
pvc-2ef34e4a-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Pending default/gcs-pvc716 glusterfs-csi 0s
pvc-2ef34e4a-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Pending default/gcs-pvc716 glusterfs-csi 0s
pvc-2ef34e4a-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc716 glusterfs-csi 1s
pvc-2de0ae97-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Pending default/gcs-pvc712 glusterfs-csi 0s
pvc-2de0ae97-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc712 glusterfs-csi 0s
pvc-2de0ae97-0f4c-11e9-8d2d-525400e329db 500Mi RWX Delete Bound default/gcs-pvc712 glusterfs-csi
- After a while kubectl stops working: the connection to the API server on localhost:8080 is refused (a few follow-up checks are sketched after the output below).
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl get pv
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[vagrant@kube1 ~]$ kubectl get pvc
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl get pods -n gcs
The connection to the server localhost:8080 was refused - did you specify the right host or port?
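Since kubectl on the node points at localhost:8080 and gets connection refused, the API server (or kubelet) on the control-plane node has most likely gone down. A few checks there would confirm it; these commands are an assumption (docker runtime, standard unit names), not taken from the report:

# On the control-plane node (kube1):
sudo systemctl status kubelet
sudo docker ps -a | grep kube-apiserver   # is the apiserver container still running, or exited?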
[root@gluster-kube3-0 /]# systemctl status gluster-exporter
● gluster-exporter.service - Gluster Prometheus Exporter
Loaded: loaded (/usr/lib/systemd/system/gluster-exporter.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/gluster-exporter.service.d
└─override.conf
Active: active (running) since Thu 2019-01-03 10:34:24 UTC; 40min ago
Main PID: 22 (gluster-exporte)
CGroup: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod8fe4cc3a_0f42_11e9_be36_525400d45ea4.slice/docker-71abc8de24c30e52ac2edb2d3e145d607499b042f17c90704dc1436a793aed0d.scope/system.slice/gluster-exporter.service
└─22 /usr/sbin/gluster-exporter --config=/etc/gluster-exporter/gluster-exporter....
Jan 03 10:34:24 gluster-kube3-0 systemd[1]: Started Gluster Prometheus Exporter.
[root@gluster-kube3-0 /]# systemctl stop gluster-exporter
[root@gluster-kube3-0 /]#
[root@gluster-kube3-0 /]#
[root@gluster-kube3-0 /]# systemctl status gluster-exporter
● gluster-exporter.service - Gluster Prometheus Exporter
Loaded: loaded (/usr/lib/systemd/system/gluster-exporter.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/gluster-exporter.service.d
└─override.conf
Active: inactive (dead) since Thu 2019-01-03 11:15:24 UTC; 1s ago
Process: 22 ExecStart=/usr/sbin/gluster-exporter --config=/etc/gluster-exporter/gluster-exporter.toml (code=killed, signal=TERM)
Main PID: 22 (code=killed, signal=TERM)
Jan 03 10:34:24 gluster-kube3-0 systemd[1]: Started Gluster Prometheus Exporter.
Jan 03 11:15:24 gluster-kube3-0 systemd[1]: Stopping Gluster Prometheus Exporter...
Jan 03 11:15:24 gluster-kube3-0 systemd[1]: Stopped Gluster Prometheus Exporter.
[vagrant@kube2 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
anthill-7f9dbb66c8-dx72h 1/1 Running 0 5h6m
csi-attacher-glusterfsplugin-0 2/2 Running 0 5h
csi-nodeplugin-glusterfsplugin-6msqc 2/2 NodeLost 0 5h
csi-nodeplugin-glusterfsplugin-7xbp5 2/2 Running 0 5h
csi-nodeplugin-glusterfsplugin-hz22z 2/2 Running 0 5h
csi-provisioner-glusterfsplugin-0 3/3 Unknown 0 5h
etcd-gjzts8xdvx 0/1 Running 0 5h6m
etcd-hv429kv7tp 0/1 Unknown 0 5h5m
etcd-operator-7cb5bd459b-d2c79 1/1 Running 5 5h7m
etcd-sslrc7n9bh 0/1 Completed 0 5h5m
gluster-kube1-0 1/1 Unknown 1 5h4m
gluster-kube2-0 1/1 Running 1 5h4m
gluster-kube3-0 1/1 Running 1 5h4m
[vagrant@kube2 ~]$
Observations:
- The etcd pods are going into Completed state.
- 716 PVCs are Bound out of the 1001 that were issued (a way to re-check these counts is sketched after this list).
- PVCs are still not being deleted gracefully.
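A way to re-check those counts, assuming all claims were created in the default namespace as the CLAIM column in the watch output suggests:

kubectl get pvc -n default --no-headers | wc -l                        # total PVCs issued
kubectl get pvc -n default --no-headers | awk '$2 == "Bound"' | wc -l  # PVCs actually Bound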
I have seen this in Vagrant as well. The problem I had was that I overloaded the nodes (VMs) and various processes were getting OOM-killed. Is that a possibility here? (From the "NodeLost" and "Unknown" pod states, it looks like it may be.)
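One way to confirm the OOM hypothesis on each VM; a hedged sketch, assuming kernel logs and systemd-journald are available on the nodes:

# Look for OOM-killer activity around the time of the PVC run
sudo dmesg -T | grep -i -E 'out of memory|oom-kill|killed process'
sudo journalctl -k --since "2 hours ago" | grep -i oom
# Check whether kubelet itself was affected (restarts, evictions)
sudo journalctl -u kubelet --since "2 hours ago" | grep -i -E 'oom|evict'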