
csi-node going to Evicted state

Open PrasadDesala opened this issue 6 years ago • 3 comments

One of the csi-node pods keeps going to the Evicted state. Below is the describe output after the pod was recreated:

```
[vagrant@kube1 ~]$ kubectl -n gcs describe pods csi-nodeplugin-glusterfsplugin-zc9qm
Name:               csi-nodeplugin-glusterfsplugin-zc9qm
Namespace:          gcs
Priority:           0
PriorityClassName:
Node:               kube3/
Start Time:         Mon, 21 Jan 2019 07:44:47 +0000
Labels:             app.kubernetes.io/component=csi-driver
                    app.kubernetes.io/name=csi-nodeplugin
                    app.kubernetes.io/part-of=gcs
                    controller-revision-hash=89ffb78f5
                    pod-template-generation=1
Annotations:
Status:             Failed
Reason:             Evicted
Message:            The node was low on resource: ephemeral-storage. Container gluster-nodeplugin was using 32Ki, which exceeds its request of 0. Container csi-node-driver-registrar was using 32Ki, which exceeds its request of 0.
IP:
Controlled By:      DaemonSet/csi-nodeplugin-glusterfsplugin
Containers:
  csi-node-driver-registrar:
    Image:      quay.io/k8scsi/csi-node-driver-registrar:v1.0.1
    Port:
    Host Port:
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    Environment:
      ADDRESS:               /plugin/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins_registry/org.gluster.glusterfs/csi.sock
      KUBE_NODE_NAME:        (v1:spec.nodeName)
    Mounts:
      /plugin from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from csi-nodeplugin-token-k6k85 (ro)
  gluster-nodeplugin:
    Image:      docker.io/gluster/gluster-csi-driver:latest
    Port:
    Host Port:
    Args:
      --nodeid=$(NODE_ID)
      --v=5
      --endpoint=$(CSI_ENDPOINT)
      --resturl=$(REST_URL)
      --resttimeout=120
    Environment:
      NODE_ID:       (v1:spec.nodeName)
      CSI_ENDPOINT:  unix://plugin/csi.sock
      REST_URL:      http://glusterd2-client.gcs:24007
    Mounts:
      /plugin from plugin-dir (rw)
      /var/lib/kubelet/pods from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from csi-nodeplugin-token-k6k85 (ro)
Volumes:
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/org.gluster.glusterfs
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:  Directory
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  csi-nodeplugin-token-k6k85:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  csi-nodeplugin-token-k6k85
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age    From               Message
  Normal   Scheduled  14m    default-scheduler  Successfully assigned gcs/csi-nodeplugin-glusterfsplugin-zc9qm to kube3
  Normal   Pulling    14m    kubelet, kube3     pulling image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1"
  Normal   Pulled     14m    kubelet, kube3     Successfully pulled image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1"
  Normal   Created    14m    kubelet, kube3     Created container
  Normal   Started    14m    kubelet, kube3     Started container
  Normal   Pulling    14m    kubelet, kube3     pulling image "docker.io/gluster/gluster-csi-driver:latest"
  Normal   Pulled     13m    kubelet, kube3     Successfully pulled image "docker.io/gluster/gluster-csi-driver:latest"
  Normal   Created    13m    kubelet, kube3     Created container
  Normal   Started    13m    kubelet, kube3     Started container
  Warning  Evicted    7m41s  kubelet, kube3     The node was low on resource: ephemeral-storage. Container gluster-nodeplugin was using 32Ki, which exceeds its request of 0. Container csi-node-driver-registrar was using 32Ki, which exceeds its request of 0.
  Normal   Killing    7m41s  kubelet, kube3     Killing container with id docker://gluster-nodeplugin:Need to kill Pod
  Normal   Killing    7m41s  kubelet, kube3     Killing container with id docker://csi-node-driver-registrar:Need to kill Pod
```

The message says that the node was low on resources, but I can see space left on the root filesystem of each kube node, as well as free memory:

```
[vagrant@kube1 ~]$ free -h
              total   used   free   shared  buff/cache  available
Mem:            31G    25G   707M      27M        4.9G       4.3G
Swap:            0B     0B     0B
[vagrant@kube1 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   20G  7.6G  72% /
```

```
[vagrant@kube2 ~]$ free -h
              total   used   free   shared  buff/cache  available
Mem:            31G    11G   3.4G      26M         16G        18G
Swap:            0B     0B     0B
[vagrant@kube2 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   22G  5.2G  81% /
```

```
[vagrant@kube3 ~]$ free -h
              total   used   free   shared  buff/cache  available
Mem:            31G   1.3G    21G      23M        8.6G        28G
Swap:            0B     0B     0B
[vagrant@kube3 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   23G  4.2G  85% /
```

PrasadDesala avatar Jan 21 '19 08:01 PrasadDesala

Looks like a couple of things are going on here...

  • We should probably put a (low) storage reservation on the CSI components to avoid the eviction
  • We need to figure out what's eating the storage on the node. My hunch is GD2 or fuse logs.
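For the first point, a reservation might look something like this, added to each container in the DaemonSet spec. This is a minimal sketch; the sizes are illustrative assumptions, not tested numbers:

```yaml
# Sketch: per-container ephemeral-storage reservation for the CSI node plugin.
# Values below are assumptions -- size them from observed log/scratch usage.
resources:
  requests:
    ephemeral-storage: 5Mi    # reserve a little scratch/log space so the
                              # request is no longer 0
  limits:
    ephemeral-storage: 50Mi   # cap runaway usage; the pod is killed if it
                              # exceeds this, rather than pressuring the node
```

Setting a non-zero request also moves the pod out of the BestEffort QoS class, so under node pressure it is no longer among the first candidates for eviction.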

JohnStrunk avatar Jan 22 '19 18:01 JohnStrunk

I am also seeing csi-node evictions due to ephemeral storage, and I suspect it is causing pod evictions of other pods on the same node that are using the same shared storage.

bulldozier avatar May 16 '19 19:05 bulldozier

If I understand correctly, this happens because something is trying to write to /var/log/. I'm not using gluster atm, but I happened to come across this issue while searching for a solution to my very similar problem.

So either request some ephemeral storage, or get rid of whatever is trying to log to /var/log/.
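To find out which of those it is, something like the following can help; this is a diagnostic sketch to run on the affected node (as root for complete numbers), and the specific paths are the usual suspects rather than anything confirmed for this setup:

```shell
# Rank top-level consumers under /var (-x stays on the root filesystem).
du -xh --max-depth=1 /var 2>/dev/null | sort -rh | head -n 10

# Then check the common culprits for this kind of eviction directly:
# container logs, container runtime storage, and kubelet's per-pod dirs.
du -sh /var/log /var/lib/docker /var/lib/kubelet/pods 2>/dev/null
```

Whatever shows up largest tells you whether to rotate/redirect logs or to size the ephemeral-storage request accordingly.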

frederik-b avatar May 21 '19 10:05 frederik-b