csi-driver-nfs
Driver registration appears broken
What happened:
I am deploying Prometheus with the NFS CSI driver providing persistent storage. The driver is installed via Helm, and all controller and node pods are up and READY. When I run my Prometheus pod, it fails to start with the following error:
Warning FailedMount 52s (x702 over 23h) kubelet MountVolume.MountDevice failed for volume "pvc-d37818b3-5bb4-49ea-99c7-984579fe6871" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name nfs.csi.k8s.io not found in the list of registered CSI drivers
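Side note: the "list of registered CSI drivers" that kubelet refers to is surfaced per node in the CSINode object, so one quick way to check whether registration ever reached the kubelet on the affected node (node name kbedge001 taken from the logs below) would be something like:
kubectl get csinode kbedge001 -o jsonpath='{.spec.drivers[*].name}'
If nfs.csi.k8s.io is missing from that list, the kubelet never completed the plugin registration, despite the registrar reporting success.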
Here are logs for the 3 containers in the node pod for that node.
❯ kubectl -n kube-system logs csi-nfs-node-nwpqj
Defaulted container "liveness-probe" out of: liveness-probe, node-driver-registrar, nfs
I0528 03:23:59.037639 1 main.go:135] "Calling CSI driver to discover driver name"
I0528 03:23:59.039000 1 main.go:143] "CSI driver name" driver="nfs.csi.k8s.io"
I0528 03:23:59.039049 1 main.go:172] "ServeMux listening" address="localhost:29653"
❯ kubectl -n kube-system logs csi-nfs-node-nwpqj -c node-driver-registrar
I0528 03:22:19.505312 1 main.go:150] "Version" version="v2.13.0"
I0528 03:22:19.505424 1 main.go:151] "Running node-driver-registrar" mode=""
I0528 03:22:19.505435 1 main.go:172] "Attempting to open a gRPC connection" csiAddress="/csi/csi.sock"
I0528 03:22:19.506264 1 main.go:180] "Calling CSI driver to discover driver name"
I0528 03:22:19.507908 1 main.go:189] "CSI driver name" csiDriverName="nfs.csi.k8s.io"
I0528 03:22:19.508006 1 node_register.go:56] "Starting Registration Server" socketPath="/registration/nfs.csi.k8s.io-reg.sock"
I0528 03:22:19.508201 1 node_register.go:66] "Registration Server started" socketPath="/registration/nfs.csi.k8s.io-reg.sock"
I0528 03:22:19.508413 1 node_register.go:96] "Skipping HTTP server"
I0528 03:22:20.728591 1 main.go:96] "Received GetInfo call" request="&InfoRequest{}"
I0528 03:22:21.392540 1 main.go:108] "Received NotifyRegistrationStatus call" status="&RegistrationStatus{PluginRegistered:true,Error:,}"
❯ kubectl -n kube-system logs csi-nfs-node-nwpqj -c nfs
I0528 03:22:19.602554 1 nfs.go:90] Driver: nfs.csi.k8s.io version: v4.11.0
I0528 03:22:19.602723 1 nfs.go:147]
DRIVER INFORMATION:
-------------------
Build Date: "2025-03-18T13:07:23Z"
Compiler: gc
Driver Name: nfs.csi.k8s.io
Driver Version: v4.11.0
Git Commit: ""
Go Version: go1.23.6
Platform: linux/amd64
Streaming logs below:
I0528 03:22:19.605856 1 mount_linux.go:334] Detected umount with safe 'not mounted' behavior
I0528 03:22:19.606155 1 server.go:117] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0528 03:22:20.599876 1 utils.go:111] GRPC call: /csi.v1.Identity/GetPluginInfo
I0528 03:22:20.599971 1 utils.go:112] GRPC request: {}
I0528 03:22:20.602465 1 utils.go:118] GRPC response: {"name":"nfs.csi.k8s.io","vendor_version":"v4.11.0"}
I0528 03:22:20.729465 1 utils.go:111] GRPC call: /csi.v1.Node/NodeGetInfo
I0528 03:22:20.729481 1 utils.go:112] GRPC request: {}
I0528 03:22:20.729512 1 utils.go:118] GRPC response: {"node_id":"kbedge001"}
I0528 03:22:38.028051 1 utils.go:111] GRPC call: /csi.v1.Identity/GetPluginInfo
I0528 03:22:38.028308 1 utils.go:112] GRPC request: {}
I0528 03:22:38.028480 1 utils.go:118] GRPC response: {"name":"nfs.csi.k8s.io","vendor_version":"v4.11.0"}
I0528 03:23:08.018751 1 utils.go:111] GRPC call: /csi.v1.Identity/GetPluginInfo
I0528 03:23:08.018762 1 utils.go:112] GRPC request: {}
I0528 03:23:08.018816 1 utils.go:118] GRPC response: {"name":"nfs.csi.k8s.io","vendor_version":"v4.11.0"}
I0528 03:23:59.038230 1 utils.go:111] GRPC call: /csi.v1.Identity/GetPluginInfo
I0528 03:23:59.038265 1 utils.go:112] GRPC request: {}
I0528 03:23:59.038285 1 utils.go:118] GRPC response: {"name":"nfs.csi.k8s.io","vendor_version":"v4.11.0"}
While I don't know exactly what the problem is, I did notice there is nothing for the NFS CSI driver in the plugins_registry directory, only sockets for other CSI drivers I have tried while troubleshooting this problem.
root@kbedge001:/var/lib/kubelet/plugins_registry# ls
container-image.csi.k8s.io-reg.sock org.democratic-csi.local-hostpath-reg.sock
I confirmed the directory looks the same from inside the container by inspecting the container filesystem under /proc/<pid>/root/registration. Note that this may not be the problem, but it's something that stuck out to me. Finally, the CSIDriver object is visible:
❯ kubectl get csidriver nfs.csi.k8s.io
NAME ATTACHREQUIRED PODINFOONMOUNT STORAGECAPACITY TOKENREQUESTS REQUIRESREPUBLISH MODES AGE
nfs.csi.k8s.io false false false <unset> false Persistent 2d8h
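Side note: the CSIDriver object only confirms the cluster-wide driver definition; per-node registration happens through the socket the registrar drops into the kubelet's plugins_registry directory, so the kubelet's own log is the other place worth checking. On k3s the kubelet logs through the k3s service, so a rough check (unit name depends on whether the node runs the server or the agent) would be:
# on the affected node; pick whichever unit exists
journalctl -u k3s -u k3s-agent | grep -i "nfs.csi.k8s.io"
Any plugin-registration or socket-validation errors from the kubelet should show up there.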
What you expected to happen: I expected my pod to come up and for driver registration to happen transparently, if it isn't happening already.
How to reproduce it:
- Install the NFS CSI driver on k3s (I used Helm; other install methods probably reproduce it too)
- Create a PVC
- Create a pod bound to the new PVC (a minimal sketch of these manifests follows below)
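A minimal sketch of such a PVC and pod, assuming a StorageClass named nfs-csi (that name is a placeholder, not taken from the report above):
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-nfs-pvc
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-csi   # placeholder; use the StorageClass backed by nfs.csi.k8s.io
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-nfs-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-nfs-pvc
EOF
If registration is broken on the node the pod lands on, this should reproduce the FailedMount event above.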
Anything else we need to know?:
My values.yaml file is blank, so the chart should be using defaults.
Environment:
- CSI Driver version: v4.11.0
- Kubernetes version (use kubectl version): 1.31.5+k3s1
- OS (e.g. from /etc/os-release): Ubuntu 24.04.1 LTS
- Kernel (e.g. uname -a): Linux kbedge001 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: Helm
- Others:
Is it related to the kubeletDir issue? https://github.com/kubernetes-csi/csi-driver-nfs/tree/master/charts#tips
@andyzhangx I've thought of that, but I am not sure, since /var/lib/kubelet and /var/lib/rancher/k3s both exist and have data in them. For example, here's the file and directory listing of /var/lib/kubelet:
root@kbedge001:/var/lib/kubelet# ls -l
total 32
drwx------ 2 root root 4096 Feb 12 14:30 checkpoints
-rw------- 1 root root 62 Feb 12 14:30 cpu_manager_state
drwxr-xr-x 2 root root 4096 May 22 18:33 device-plugins
-rw------- 1 root root 61 Feb 12 14:30 memory_manager_state
drwxr-x--- 5 root root 4096 May 30 03:02 plugins
drwxr-x--- 2 root root 4096 May 30 03:02 plugins_registry
drwxr-x--- 2 root root 4096 May 22 18:33 pod-resources
drwxr-x--- 28 root root 4096 May 30 03:02 pods
and the /var/lib/rancher/k3s/agent/ directory has a lot of kubeconfigs and certificate/key pairs.
root@kbedge001:/var/lib/rancher/k3s/agent# ls
client-ca.crt client-kubelet.crt client-kube-proxy.key k3scontroller.kubeconfig pod-manifests serving-kubelet.key
client-k3s-controller.crt client-kubelet.key containerd kubelet.kubeconfig server-ca.crt
client-k3s-controller.key client-kube-proxy.crt etc kubeproxy.kubeconfig serving-kubelet.crt
Regardless, do you think it's worth overriding the kubelet directory to point at the rancher/k3s one?
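For what it's worth, a rough way to rule the kubelet-dir question in or out, assuming the chart's kubeletDir value described in the tips page linked above (the release and repo names are the project's default install names, and the root-dir value is whatever the first command prints, if anything):
# on the node: check whether the k3s-embedded kubelet was started with a non-default root dir
ps -ef | grep kubelet | grep -o -- '--root-dir[= ][^ ]*'
# only if it differs from /var/lib/kubelet, point the chart at it
helm upgrade csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system \
  --reuse-values --set kubeletDir="<kubelet-root-dir>"
If the first command prints nothing, the kubelet is using the default /var/lib/kubelet and the override shouldn't be needed.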
I reverted to older versions of the images via Helm and that may have fixed it. Here are my versions. Are you aware of any known regressions that may have caused this?
csi-driver-nfs:
  image:
    livenessProbe:
      tag: v2.15.0
    nfs:
      tag: v4.10.0
    nodeDriverRegistrar:
      tag: v2.12.0
    csiProvisioner:
      tag: v5.2.0
@kbreit it mainly depends on the livenessProbe, nfs, and nodeDriverRegistrar versions; can you check which image upgrade fixed the issue?
Hello, I ran into an identical issue.
What's most surprising is that it only happens on 1 of 2 nodes.
I'm running a small rke2 cluster (v1.32.5+rke2r1) containing 3 master (control plane) nodes. One of the nodes has a CriticalAddons taint applied, since it is an old Pentium PC used for storage only.
The other 2 PCs are fully operable nodes, one of which is unable to run any workload that requires NFS storage due to the same issue:
Warning FailedMount 61s (x6 over 11m) kubelet MountVolume.MountDevice failed for volume "xxx" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name nfs.csi.k8s.io not found in the list of registered CSI drivers
I've tried to find anything relevant here: https://github.com/kubernetes-csi/csi-driver-nfs/tree/master/charts#tips, and I also tried the versions provided by @kbreit. Unfortunately that didn't fix anything. I see no errors in the controller pod, nor in the node pod. I've even tried increasing logLevel to 10 (I don't know what value would be equivalent to DEBUG mode).
The only difference I see between the working and non-working node is:
node-driver-registrar I0614 20:48:38.199258 1 main.go:96] "Received GetInfo call" request="&InfoRequest{}"
node-driver-registrar I0614 20:48:38.245209 1 main.go:108] "Received NotifyRegistrationStatus call" status="&RegistrationStatus{PluginRegistered:true,Error:,}"
These two log lines only appear on the node pod running on the properly working worker. Apart from that, there is no difference in initialization.
I forgot to add that I've tested the NFS connection from the worker nodes by running:
showmount -e <nfs-server>
I've even successfully mounted a test NFS share on both worker nodes.
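For anyone repeating that check, a sketch of such a manual verification (server and export path are placeholders; this only proves NFS reachability from the node, not CSI registration):
# run from each worker node
showmount -e <nfs-server>
mkdir -p /tmp/nfs-test
mount -t nfs <nfs-server>:/<export> /tmp/nfs-test
touch /tmp/nfs-test/probe && rm /tmp/nfs-test/probe
umount /tmp/nfs-test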
Hi all, I was seeing an image pull error when using the values from the latest version: https://github.com/kubernetes-csi/csi-driver-nfs/blob/3dceb4a88526b16c7e3d6f7bf613758e7303c673/charts/latest/csi-driver-nfs/values.yaml
kubectl version
Client Version: v1.33.2
Kustomize Version: v5.6.0
Server Version: v1.32.3
kubectl --namespace=kube-system get pods
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5947598c79-jj6dd 1/1 Running 7 (26h ago) 92d
calico-node-9t5zf 1/1 Running 4 (40h ago) 71d
calico-node-hq7wn 1/1 Running 2 (3d16h ago) 64d
calico-node-smfng 1/1 Running 6 (26h ago) 64d
csi-nfs-controller-55f6dc8854-v2nwx 1/5 CrashLoopBackOff 18 (3m41s ago) 12m
csi-nfs-node-46nht 3/3 Running 16 (26h ago) 58d
csi-nfs-node-b8wzv 1/3 ImagePullBackOff 6 (3m44s ago) 12m
csi-nfs-node-qhvgh 3/3 Running 12 (40h ago) 58d
When I reverted to my previous values, all of the CSI pods came up:
image:
  baseRepo: registry.k8s.io
  nfs:
    repository: registry.k8s.io/sig-storage/nfsplugin
    tag: v4.11.0
    pullPolicy: IfNotPresent
  csiProvisioner:
    repository: registry.k8s.io/sig-storage/csi-provisioner
    tag: v5.2.0
    pullPolicy: IfNotPresent
  csiResizer:
    repository: registry.k8s.io/sig-storage/csi-resizer
    tag: v1.13.1
    pullPolicy: IfNotPresent
  csiSnapshotter:
    repository: registry.k8s.io/sig-storage/csi-snapshotter
    tag: v8.2.0
    pullPolicy: IfNotPresent
  livenessProbe:
    repository: registry.k8s.io/sig-storage/livenessprobe
    tag: v2.15.0
    pullPolicy: IfNotPresent
  nodeDriverRegistrar:
    repository: registry.k8s.io/sig-storage/csi-node-driver-registrar
    tag: v2.13.0
    pullPolicy: IfNotPresent
  externalSnapshotter:
    repository: registry.k8s.io/sig-storage/snapshot-controller
    tag: v8.2.0
    pullPolicy: IfNotPresent
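For completeness: these are plain chart values, so (assuming the project's usual chart repo and a release in kube-system) they can be applied with something like:
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
  --namespace kube-system -f values.yaml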
This may not be related to this issue; I just wanted to share in case someone is looking for working values. Thanks.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.