csi-driver-iscsi
iSCSI CSI driver fails to mount LUN in the right location for a replaced pod
What happened:
We're using the Bitnami PostgreSQL Helm chart (15.1.4) to run PostgreSQL on a MicroK8s v1.29 cluster. I wanted to leverage this CSI driver for the database storage, backed by an iSCSI LUN and target that I created on a QNAP NAS connected over a 10GbE network.
To connect to the LUN, I created a PV + PVC as in the examples and passed the PVC as the `primary.persistence.existingClaim` value when deploying the Helm chart.
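For reference, the relevant values override looked something like this (a sketch; the claim name `postgresql-pvc` is an assumption, substitute the name of the PVC you created):

```yaml
# values.yaml fragment for the Bitnami PostgreSQL chart
# (claim name is hypothetical)
primary:
  persistence:
    existingClaim: postgresql-pvc
```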
This was working like a charm; at last we could move away from risky node-local storage or slower NFS. However, when I replaced the pods of the chart's StatefulSet to increase its resources, the csi-iscsi-node somehow didn't mount the target at the right location of the new pod's volume.
The outcome (and how we realised): the new location of the volume, /var/snap/microk8s/common/var/lib/kubelet/pods/b88fdaea-a22e-42ac-90ae-d71f927dc300/volumes/kubernetes.io~csi/postgresql/mount, wasn't actually a mount of the storage on the NAS, but the node's root filesystem itself! A parallel data-ingestion operation then consumed the node's storage, degrading the node and, to some extent, the whole cluster, as many key workloads were evicted with DiskPressure and a taint was added to the node.
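One quick way to confirm this state on the node (a sketch using `findmnt` from util-linux; the pod UID in the path is the one from this report, substitute the current pod's):

```shell
# Pod volume path from the report; substitute the current pod's UID.
VOL=/var/snap/microk8s/common/var/lib/kubelet/pods/b88fdaea-a22e-42ac-90ae-d71f927dc300/volumes/kubernetes.io~csi/postgresql/mount

# --mountpoint (-M) prints the mount entry only if VOL is itself a mount
# point; if it prints nothing and exits non-zero, writes to VOL are landing
# on the node's root filesystem instead of the iSCSI LUN.
findmnt -M "$VOL" || echo "NOT a mount point: data is going to the node's root fs"
```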
Logs that we encountered:
```
I0530 23:43:45.773799 1 utils.go:64] GRPC request: {"target_path":"/var/snap/microk8s/common/var/lib/kubelet/pods/b88fdaea-a22e-42ac-90ae-d71f927dc300/volumes/kubernetes.io~csi/postgresql/mount","volume_id":"iscsi-postgresql-id"}
I0530 23:43:45.773861 1 mount_linux.go:164] Detected OS without systemd
W0530 23:43:45.777225 1 iscsi_util.go:95] warning: Unmount skipped because path does not exist: /var/snap/microk8s/common/var/lib/kubelet/pods/b88fdaea-a22e-42ac-90ae-d71f927dc300/volumes/kubernetes.io~csi/postgresql/mount
```
The `Detected OS without systemd` message is equally puzzling, as the node runs Ubuntu 22.04 :thinking: ... (my guess is that mount-utils can't find `systemd-run` inside the driver container, but I haven't verified that).
What you expected to happen:
Say the original pod volume location was:
/var/snap/microk8s/common/var/lib/kubelet/pods/9cd76fee-cd41-4869-90d2-d46ffedddf68/volumes/kubernetes.io~csi/postgresql/mount
-> This was actually the mount point of the filesystem backed by the iSCSI target.
And the new pod volume location was
/var/snap/microk8s/common/var/lib/kubelet/pods/b88fdaea-a22e-42ac-90ae-d71f927dc300/volumes/kubernetes.io~csi/postgresql/mount
I would expect the iSCSI CSI driver node plugin to unmount the target from the first location and re-mount it at the second location, corresponding to the replacement pod, with no data loss.
How to reproduce it:
- Create an iSCSI target + LUN.
- Create a PV + PVC as in the driver's examples, e.g. the PersistentVolume manifest:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgresql-pv
  labels:
    name: postgresql
spec:
  storageClassName: postgresql-sc
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  csi:
    driver: iscsi.csi.k8s.io
    volumeHandle: iscsi-postgresql-id
    volumeAttributes:
      targetPortal: "X.X.X.X"
      portals: "[]"
      iqn: "iqn.<redacted>:iscsi.csi.8136ad"
      lun: "1"
      iscsiInterface: "default"
      discoveryCHAPAuth: "true"
      sessionCHAPAuth: "false"
```
- Customise and deploy the Bitnami PostgreSQL Helm chart, selecting the existing claim created in step 2 in values.yaml.
- Scale the StatefulSet to 0 and then back to 1, or kill the pod, which will instruct the StatefulSet controller to request a new pod from the kube API.
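For completeness, a PVC bound to the PV above might look like this (a sketch: the storageClassName, capacity, and label selector follow the PV manifest; the claim name is an assumption):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgresql-pvc  # hypothetical name; this is the value passed to existingClaim
spec:
  storageClassName: postgresql-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      name: postgresql
```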
Anything else we need to know?:
- I've since removed the PostgreSQL chart, but I can still see `warning: Unmount skipped because path does not exist` messages in the node logs.

Environment:
- CSI Driver version: commit hash 554efb1
- Kubernetes version (use `kubectl version`): v1.29.4
- OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS
- Kernel (e.g. `uname -a`): 5.15.0-105-generic
- Install tools: open-iscsi
- Others: microk8s v1.29