csi-node going to Evicted state
One of the csi-node pods keeps going into the Evicted state.
Below is the describe output after the pod was recreated:
[vagrant@kube1 ~]$ kubectl -n gcs describe pods csi-nodeplugin-glusterfsplugin-zc9qm
Name: csi-nodeplugin-glusterfsplugin-zc9qm
Namespace: gcs
Priority: 0
PriorityClassName:
Controlled By: DaemonSet/csi-nodeplugin-glusterfsplugin
Containers:
csi-node-driver-registrar:
Image: quay.io/k8scsi/csi-node-driver-registrar:v1.0.1
Port:
Events:
  Type     Reason     Age    From               Message
  ----     ------     ----   ----               -------
  Normal   Scheduled  14m    default-scheduler  Successfully assigned gcs/csi-nodeplugin-glusterfsplugin-zc9qm to kube3
  Normal   Pulling    14m    kubelet, kube3     pulling image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1"
  Normal   Pulled     14m    kubelet, kube3     Successfully pulled image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1"
  Normal   Created    14m    kubelet, kube3     Created container
  Normal   Started    14m    kubelet, kube3     Started container
  Normal   Pulling    14m    kubelet, kube3     pulling image "docker.io/gluster/gluster-csi-driver:latest"
  Normal   Pulled     13m    kubelet, kube3     Successfully pulled image "docker.io/gluster/gluster-csi-driver:latest"
  Normal   Created    13m    kubelet, kube3     Created container
  Normal   Started    13m    kubelet, kube3     Started container
  Warning  Evicted    7m41s  kubelet, kube3     The node was low on resource: ephemeral-storage. Container gluster-nodeplugin was using 32Ki, which exceeds its request of 0. Container csi-node-driver-registrar was using 32Ki, which exceeds its request of 0.
  Normal   Killing    7m41s  kubelet, kube3     Killing container with id docker://gluster-nodeplugin:Need to kill Pod
  Normal   Killing    7m41s  kubelet, kube3     Killing container with id docker://csi-node-driver-registrar:Need to kill Pod
The message says the node was low on resources, but I still see free space on the root filesystem of the kube nodes and plenty of free memory.
[vagrant@kube1 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         25G        707M         27M        4.9G        4.3G
Swap:            0B          0B          0B
[vagrant@kube1 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   20G  7.6G  72% /

[vagrant@kube2 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         11G        3.4G         26M         16G         18G
Swap:            0B          0B          0B
[vagrant@kube2 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   22G  5.2G  81% /

[vagrant@kube3 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        1.3G         21G         23M        8.6G         28G
Swap:            0B          0B          0B
[vagrant@kube3 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   23G  4.2G  85% /
Looks like a couple of things are going on here...
- We should probably put a (low) storage reservation on the CSI components to avoid the eviction
- We need to figure out what's eating the storage on the node. My hunch is GD2 or fuse logs.
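To narrow down where the space is going, something along these lines should help (a sketch only; it assumes SSH access to the affected node, kube3 here, and default kubelet/log paths):

```sh
# Sketch only: run on the node that is hitting the evictions (kube3 in the
# describe output above).
sudo du -sh /var/log/* | sort -h | tail -n 10                 # biggest consumers under /var/log (GD2/fuse logs?)
sudo du -sh /var/lib/kubelet/pods/* | sort -h | tail -n 10    # per-pod ephemeral usage (container logs, emptyDir)
kubectl describe node kube3 | grep -i -A 3 ephemeral-storage  # what the kubelet reports as capacity/allocatable
```

If the GD2/fuse logs show up at the top, rotating or capping them is probably the quicker fix.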
I am also seeing csi-node evictions due to ephemeral storage, and I suspect it is triggering evictions of other pods on the same node that use the same shared storage.
If I understand correctly, this happens because something is trying to write to /var/log/. I'm not using gluster atm, but I happened to come across this issue while searching for solutions to my very similar problem.
So either request some ephemeral storage, or get rid of whatever is trying to log to /var/log/.
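For the first option, a minimal sketch of what that could look like on the node-plugin DaemonSet from this issue (the 16Mi values are placeholders I picked, not tested recommendations):

```sh
# Sketch only: add a small ephemeral-storage request to both containers so the
# kubelet accounts for their log usage instead of treating the request as 0.
kubectl -n gcs patch daemonset csi-nodeplugin-glusterfsplugin --patch '
spec:
  template:
    spec:
      containers:
      - name: csi-node-driver-registrar
        resources:
          requests:
            ephemeral-storage: 16Mi
      - name: gluster-nodeplugin
        resources:
          requests:
            ephemeral-storage: 16Mi
'
```

The same requests could of course be baked into the DaemonSet manifest instead of patching it at runtime.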