csi-driver-smb
Mount point deletion issues causing CreateContainerError on pod
What happened:
Mount issues causing CreateContainerError on pod:
Failed to generate container 'xxyy' spec: failed to generate spec: failed to mkdir "/var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount": mkdir "/var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount" file exists
Looking at the driver logs, a mount already exists at the target path; the driver tries to delete it but fails, and the cycle keeps repeating.
I0528 19:11:34.336620 1 nodeserver.go:85] NodePublishVolume: mount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/globalmount at /var/lib/kubelet/pods/34a51e04-81fb-412e-85a8-9a343cedc6bb/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount volumeID(corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554) successfully
I0528 19:11:34.336644 1 utils.go:83] GRPC response: {}
I0528 19:11:40.734494 1 utils.go:76] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0528 19:11:40.734527 1 utils.go:77] GRPC request: {"target_path":"/var/lib/kubelet/pods/97094c31-59ad-4690-9359-53d9dea547d4/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount","volume_id":"corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554"}
I0528 19:11:40.734617 1 nodeserver.go:100] NodeUnpublishVolume: unmounting volume corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554 on /var/lib/kubelet/pods/97094c31-59ad-4690-9359-53d9dea547d4/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount
I0528 19:11:40.734638 1 mount_linux.go:294] Unmounting /var/lib/kubelet/pods/97094c31-59ad-4690-9359-53d9dea547d4/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount
W0528 19:11:40.804126 1 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/97094c31-59ad-4690-9359-53d9dea547d4/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount" is not a mountpoint, deleting
I0528 19:11:40.804244 1 nodeserver.go:105] NodeUnpublishVolume: unmount volume corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554 on /var/lib/kubelet/pods/97094c31-59ad-4690-9359-53d9dea547d4/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount successfully
I0528 19:11:40.804262 1 utils.go:83] GRPC response: {}
I0528 20:08:14.208478 1 utils.go:76] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0528 20:08:14.208506 1 utils.go:77] GRPC request: {"target_path":"/var/lib/kubelet/pods/34a51e04-81fb-412e-85a8-9a343cedc6bb/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount","volume_id":"corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554"}
I0528 20:08:14.208591 1 nodeserver.go:100] NodeUnpublishVolume: unmounting volume corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554 on /var/lib/kubelet/pods/34a51e04-81fb-412e-85a8-9a343cedc6bb/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount
I0528 20:08:14.208612 1 mount_linux.go:294] Unmounting /var/lib/kubelet/pods/34a51e04-81fb-412e-85a8-9a343cedc6bb/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount
W0528 20:08:14.265403 1 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/34a51e04-81fb-412e-85a8-9a343cedc6bb/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount" is not a mountpoint, deleting
I0528 20:08:14.266071 1 nodeserver.go:105] NodeUnpublishVolume: unmount volume corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554 on /var/lib/kubelet/pods/34a51e04-81fb-412e-85a8-9a343cedc6bb/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount successfully
I0528 20:08:14.266094 1 utils.go:83] GRPC response: {}
I0528 20:08:20.908469 1 utils.go:76] GRPC call: /csi.v1.Node/NodePublishVolume
I0528 20:08:20.908495 1 utils.go:77] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/globalmount","target_path":"/var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["vers=2.1","file_mode=0775","dir_mode=0775","uid=1000","gid=1000","domain=corp"]}},"access_mode":{"mode":5}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"demo-cc98b5bf8-x86xb","csi.storage.k8s.io/pod.namespace":"team-demo-prod","csi.storage.k8s.io/pod.uid":"ea38ea2f-03af-459e-8038-24b46e4aadc1","csi.storage.k8s.io/serviceAccount.name":"default","source":"//corp.example.com/dfs","storage.kubernetes.io/csiProvisionerIdentity":"1646688438914-8081-smb.csi.k8s.io"},"volume_id":"corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554"}
I0528 20:08:20.939141 1 nodeserver.go:78] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/globalmount at /var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount with mountOptions: [bind] volumeID(corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554)
I0528 20:08:20.939199 1 mount_linux.go:183] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/globalmount /var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount)
I0528 20:08:20.946339 1 mount_linux.go:183] Mounting cmd (mount) with arguments ( -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/globalmount /var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount)
I0528 20:08:20.950624 1 nodeserver.go:85] NodePublishVolume: mount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/globalmount at /var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount volumeID(corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554) successfully
I0528 20:08:20.950657 1 utils.go:83] GRPC response: {}
I0528 20:32:43.584571 1 utils.go:76] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0528 20:32:43.584595 1 utils.go:77] GRPC request: {"target_path":"/var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount","volume_id":"corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554"}
I0528 20:32:43.584950 1 nodeserver.go:100] NodeUnpublishVolume: unmounting volume corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554 on /var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount
I0528 20:32:43.585037 1 mount_linux.go:294] Unmounting /var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount
W0528 20:32:43.660202 1 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount" is not a mountpoint, deleting
I0528 20:32:43.660321 1 nodeserver.go:105] NodeUnpublishVolume: unmount volume corp.example.com/dfs#pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554 on /var/lib/kubelet/pods/ea38ea2f-03af-459e-8038-24b46e4aadc1/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount successfully
What you expected to happen:
- a log message explaining why the driver could not delete the mount point and recreate it
- possibly falling back to creating a separate mount after several failed attempts
How to reproduce it:
I have not been able to reproduce it so far, but the issue did not go away after recreating the pod or even deleting the deployment. The volume mounted successfully once the pod was scheduled on a different node.
Anything else we need to know?:
Environment:
- CSI Driver version: v1.6.0
- Kubernetes version (use kubectl version): v1.21.5
- OS (e.g. from /etc/os-release): Ubuntu 18.04
I think os.Remove(mountPath) returns success even though that directory is not actually deleted.
Could you check why removing /var/lib/kubelet/pods/97094c31-59ad-4690-9359-53d9dea547d4/volumes/kubernetes.io~csi/pvc-aa0842c8-daf2-4924-8fbd-d32f627ca554/mount failed silently on that node?
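To illustrate the kind of post-unmount verification being suggested here, below is a minimal Go sketch (not code from the driver): after the cleanup path that ends in os.Remove reports success, it re-stats the target and surfaces any discrepancy instead of letting the failure stay silent. The verifyRemoved helper and the placeholder target path are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// verifyRemoved is a hypothetical post-cleanup check: it returns an error if
// the target path still exists (or cannot be stat'ed) after a cleanup that
// claimed to have deleted it.
func verifyRemoved(targetPath string) error {
	_, err := os.Stat(targetPath)
	switch {
	case err == nil:
		return fmt.Errorf("cleanup reported success but %q still exists", targetPath)
	case os.IsNotExist(err):
		return nil // really gone
	default:
		// A broken CIFS mount can surface here as EIO or a stale-handle error.
		return fmt.Errorf("cannot stat %q after cleanup: %w", targetPath, err)
	}
}

func main() {
	// Placeholder path; in the driver this would be the NodeUnpublishVolume target_path.
	target := "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pvc-name>/mount"
	if err := verifyRemoved(target); err != nil {
		fmt.Println("post-unmount check:", err)
	} else {
		fmt.Println("target path removed cleanly")
	}
}
```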
Hi @andyzhangx thanks for the quick response! I don't have access to that node anymore since it got replaced, but I have set an alert to notify me when this happens again so I can troubleshoot it further. What steps do you suggest I take in order to check why it failed silently if it happens again?
Hi @andyzhangx! The issue has come up again, with a similar but not identical error message to the one posted previously. This time I do not see any error messages in the CSI driver logs. I am still getting the CreateContainerError on the pod and can see the errors in the kubelet logs. Any ideas on this?
CSI SMB driver DaemonSet pod logs
I0617 20:35:50.014716 1 main.go:90] set up prometheus server on [::]:29645
I0617 20:35:50.015356 1 smb.go:80]
DRIVER INFORMATION:
-------------------
Build Date: "2022-05-02T08:19:59Z"
Compiler: gc
Driver Name: smb.csi.k8s.io
Driver Version: v1.6.0
Git Commit: ""
Go Version: go1.17.3
Platform: linux/amd64
Streaming logs below:
I0617 20:35:50.015451 1 mount_linux.go:208] Detected OS without systemd
I0617 20:35:50.015458 1 driver.go:93] Enabling controller service capability: CREATE_DELETE_VOLUME
I0617 20:35:50.015462 1 driver.go:93] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0617 20:35:50.015465 1 driver.go:112] Enabling volume access mode: SINGLE_NODE_WRITER
I0617 20:35:50.015468 1 driver.go:112] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0617 20:35:50.015471 1 driver.go:112] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0617 20:35:50.015473 1 driver.go:112] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0617 20:35:50.015475 1 driver.go:112] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0617 20:35:50.015478 1 driver.go:112] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0617 20:35:50.015480 1 driver.go:112] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0617 20:35:50.015482 1 driver.go:103] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0617 20:35:50.015485 1 driver.go:103] Enabling node service capability: SINGLE_NODE_MULTI_WRITER
I0617 20:35:50.015488 1 driver.go:103] Enabling node service capability: VOLUME_MOUNT_GROUP
I0617 20:35:50.015611 1 server.go:118] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0617 20:35:50.220860 1 utils.go:76] GRPC call: /csi.v1.Identity/GetPluginInfo
I0617 20:35:50.220881 1 utils.go:77] GRPC request: {}
I0617 20:35:50.222195 1 utils.go:83] GRPC response: {"name":"smb.csi.k8s.io","vendor_version":"v1.6.0"}
I0617 20:35:50.547281 1 utils.go:76] GRPC call: /csi.v1.Identity/GetPluginInfo
I0617 20:35:50.547297 1 utils.go:77] GRPC request: {}
I0617 20:35:50.547370 1 utils.go:83] GRPC response: {"name":"smb.csi.k8s.io","vendor_version":"v1.6.0"}
I0617 20:35:50.940214 1 utils.go:76] GRPC call: /csi.v1.Node/NodeGetInfo
I0617 20:35:50.940231 1 utils.go:77] GRPC request: {}
I0617 20:35:50.940275 1 utils.go:83] GRPC response: {"node_id":"kon-shared-w2-core-nonprod-a2-u1-p2-658d885695-2jjw9"}
I0617 21:11:07.889950 1 utils.go:76] GRPC call: /csi.v1.Node/NodeStageVolume
I0617 21:11:07.889971 1 utils.go:77] GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["vers=3","file_mode=0775","dir_mode=0775","uid=1000","gid=1000","domain=corp"]}},"access_mode":{"mode":5}},"volume_context":{"source":"//corp.demo.com/dfs","storage.kubernetes.io/csiProvisionerIdentity":"1654548771536-8081-smb.csi.k8s.io"},"volume_id":"corp.demo.com/dfs#pvc-91da69c7-a5e0-406e-93ed-1575abf5565f#"}
I0617 21:11:07.890261 1 nodeserver.go:187] NodeStageVolume: targetPath(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount) volumeID(corp.demo.com/dfs#pvc-91da69c7-a5e0-406e-93ed-1575abf5565f#) context(map[source://corp.demo.com/dfs storage.kubernetes.io/csiProvisionerIdentity:1654548771536-8081-smb.csi.k8s.io]) mountflags([vers=3 file_mode=0775 dir_mode=0775 uid=1000 gid=1000 domain=corp]) mountOptions([vers=3 file_mode=0775 dir_mode=0775 uid=1000 gid=1000 domain=corp])
I0617 21:11:07.891369 1 mount_linux.go:183] Mounting cmd (mount) with arguments (-t cifs -o vers=3,file_mode=0775,dir_mode=0775,uid=1000,gid=1000,domain=corp,<masked> //corp.demo.com/dfs /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount)
I0617 21:11:08.053410 1 nodeserver.go:216] volume(corp.demo.com/dfs#pvc-91da69c7-a5e0-406e-93ed-1575abf5565f#) mount "//corp.demo.com/dfs" on "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount" succeeded
I0617 21:11:08.053439 1 utils.go:83] GRPC response: {}
I0617 21:11:08.297000 1 utils.go:76] GRPC call: /csi.v1.Node/NodePublishVolume
I0617 21:11:08.297035 1 utils.go:77] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount","target_path":"/var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["vers=3","file_mode=0775","dir_mode=0775","uid=1000","gid=1000","domain=corp"]}},"access_mode":{"mode":5}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"demo-8f6698cdb-rkj9r","csi.storage.k8s.io/pod.namespace":"team-demo-uat","csi.storage.k8s.io/pod.uid":"f3b83910-f1ed-48f7-8024-058be1efdea6","csi.storage.k8s.io/serviceAccount.name":"default","source":"//corp.demo.com/dfs","storage.kubernetes.io/csiProvisionerIdentity":"1654548771536-8081-smb.csi.k8s.io"},"volume_id":"corp.demo.com/dfs#pvc-91da69c7-a5e0-406e-93ed-1575abf5565f#"}
I0617 21:11:08.300370 1 nodeserver.go:78] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount at /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount with mountOptions: [bind] volumeID(corp.demo.com/dfs#pvc-91da69c7-a5e0-406e-93ed-1575abf5565f#)
I0617 21:11:08.300399 1 mount_linux.go:183] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount)
I0617 21:11:08.301843 1 mount_linux.go:183] Mounting cmd (mount) with arguments ( -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount)
I0617 21:11:08.303692 1 nodeserver.go:85] NodePublishVolume: mount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount at /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount volumeID(corp.demo.com/dfs#pvc-91da69c7-a5e0-406e-93ed-1575abf5565f#) successfully
I0617 21:11:08.303711 1 utils.go:83] GRPC response: {}
kubelet error logs
Jul 03 18:02:21 node-p2-658d885695-2jjw9 kubelet[2760]: E0703 18:02:21.161869 2760 remote_runtime.go:228] "CreateContainer in sandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to generate container \"2636dac4304fad5b43b30b4ac012a0ad0553362e90b07f6763ff554a19ca3e5f\" spec: failed to generate spec: failed to mkdir \"/var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount\": mkdir /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount: file exists" podSandboxID="333430c68e9db4c5032ac0d497e72d89f2a125767538bd05be09eea44145e17a"
Jul 03 18:02:21 node-p2-658d885695-2jjw9 kubelet[2760]: E0703 18:02:21.162163 2760 kuberuntime_manager.go:870] container &Container{...} start failed in pod demo-8f6698cdb-rkj9r_team-demo-uat(f3b83910-f1ed-48f7-8024-058be1efdea6): CreateContainerError: failed to generate container "2636dac4304fad5b43b30b4ac012a0ad0553362e90b07f6763ff554a19ca3e5f" spec: failed to generate spec: failed to mkdir "/var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount": mkdir /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount: file exists
Jul 03 18:02:21 node-p2-658d885695-2jjw9 kubelet[2760]: E0703 18:02:21.162235 2760 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"demo\" with CreateContainerError: \"failed to generate container \\\"2636dac4304fad5b43b30b4ac012a0ad0553362e90b07f6763ff554a19ca3e5f\\\" spec: failed to generate spec: failed to mkdir \\\"/var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount\\\": mkdir /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount: file exists\"" pod="team-demo-uat/demo-8f6698cdb-rkj9r" podUID=f3b83910-f1ed-48f7-8024-058be1efdea6
The mount path /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount was mounted successfully on 6/17, so why would kubelet try to mount that path again on 7/3?
Confused about that as well. Here's some info from the node:
root@node-p2-658d885695-2jjw9:~# ls /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount
ls: cannot access '/var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount': No such file or directory
root@node-p2-658d885695-2jjw9:~# ls -lah /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f
ls: cannot access '/var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount': No such file or directory
total 12K
drwxr-x--- 3 root root 4.0K Jun 17 21:11 .
drwxr-x--- 3 root root 4.0K Jun 17 21:11 ..
d????????? ? ? ? ? ? mount
-rw-r--r-- 1 root root 350 Jun 17 21:11 vol_data.json
root@node-p2-658d885695-2jjw9:~# stat /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount
stat: cannot stat '/var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount': No such file or directory
root@node-p2-658d885695-2jjw9:~# mount | grep cifs
//corp.demo.com/dfs on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount type cifs (rw,relatime,vers=3,cache=strict,username=***stripped***,domain=corp,uid=1000,forceuid,gid=1000,forcegid,addr=***stripped***,file_mode=0775,dir_mode=0775,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
//corp.demo.com/dfs on /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount type cifs (rw,relatime,vers=3,cache=strict,username=***stripped***,domain=corp,uid=1000,forceuid,gid=1000,forcegid,addr=***stripped***,file_mode=0775,dir_mode=0775,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
Could this be a stale file handle @andyzhangx ?
@seif-raafat that means the SMB connection is broken; does a new mount work?
Hi @andyzhangx ! Yeah, I was able to create a new mount on the same node successfully. How could we mitigate this issue, as it does not resolve on its own? We have to manually recreate the pod to cause it to create a new mount.
root@node-p2-658d885695-2jjw9:~# ls -lah /var/lib/kubelet/pods/67f04126-4f32-41e2-b308-a51a0b7b80da/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/
total 20K
drwxr-x--- 3 root root 4.0K Jul 7 14:36 .
drwxr-x--- 3 root root 4.0K Jul 7 14:36 ..
drwxrwxr-x 2 anthos anthos 8.0K Apr 29 04:19 mount
-rw-r--r-- 1 root root 350 Jul 7 14:36 vol_data.json
root@node-p2-658d885695-2jjw9:~# stat /var/lib/kubelet/pods/67f04126-4f32-41e2-b308-a51a0b7b80da/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/
File: /var/lib/kubelet/pods/67f04126-4f32-41e2-b308-a51a0b7b80da/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 801h/2049d Inode: 4144384 Links: 3
Access: (0750/drwxr-x---) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-07-07 14:39:09.180301559 +0000
Modify: 2022-07-07 14:36:35.856362809 +0000
Change: 2022-07-07 14:36:35.856362809 +0000
Birth: -
root@node-p2-658d885695-2jjw9:~# mount | grep cifs
//corp.demo.com/dfs on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/globalmount type cifs (rw,relatime,vers=3,cache=strict,username=***stripped***,domain=corp,uid=1000,forceuid,gid=1000,forcegid,addr=***stripped***,file_mode=0775,dir_mode=0775,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
//corp.demo.com/dfs on /var/lib/kubelet/pods/f3b83910-f1ed-48f7-8024-058be1efdea6/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount type cifs (rw,relatime,vers=3,cache=strict,username=***stripped***,domain=corp,uid=1000,forceuid,gid=1000,forcegid,addr=***stripped***,file_mode=0775,dir_mode=0775,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
//corp.demo.com/dfs on /var/lib/kubelet/pods/67f04126-4f32-41e2-b308-a51a0b7b80da/volumes/kubernetes.io~csi/pvc-91da69c7-a5e0-406e-93ed-1575abf5565f/mount type cifs (rw,relatime,vers=3,cache=strict,username=***stripped***,domain=corp,uid=1000,forceuid,gid=1000,forcegid,addr=***stripped***,file_mode=0775,dir_mode=0775,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
Hi @andyzhangx, following up on this: is there a way to periodically monitor the health of the mounts and log when a faulty one is found? This would help us detect issues early on. Taking it a step further, is it possible to eventually recover from the broken SMB connection automatically?
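As a rough illustration of the periodic mount health check being asked about, here is a Linux-only Go sketch that walks the CIFS entries in /proc/mounts and calls statfs on each target, logging stale or broken shares (ESTALE/ENOTCONN) before a pod trips over them. This is not an existing csi-driver-smb feature; the five-minute interval and the idea of running it as a node-level helper are assumptions.

```go
package main

import (
	"bufio"
	"errors"
	"log"
	"os"
	"strings"
	"syscall"
	"time"
)

// cifsMounts returns the mount targets of all CIFS filesystems on the node,
// parsed from /proc/mounts.
func cifsMounts() ([]string, error) {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var targets []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) >= 3 && fields[2] == "cifs" {
			targets = append(targets, fields[1])
		}
	}
	return targets, s.Err()
}

func main() {
	for {
		targets, err := cifsMounts()
		if err != nil {
			log.Printf("reading /proc/mounts: %v", err)
		}
		for _, t := range targets {
			var st syscall.Statfs_t
			if err := syscall.Statfs(t, &st); err != nil {
				// ESTALE / ENOTCONN typically mean the SMB session is broken.
				if errors.Is(err, syscall.ESTALE) || errors.Is(err, syscall.ENOTCONN) {
					log.Printf("stale CIFS mount detected at %s: %v", t, err)
				} else {
					log.Printf("unhealthy CIFS mount at %s: %v", t, err)
				}
			}
		}
		time.Sleep(5 * time.Minute) // assumed check interval
	}
}
```

Something like this could run as a host-level helper or sidecar; automatic remediation (force-unmounting and remounting the broken target) would be a further step beyond what is sketched here.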
Hey guys,
this is what we have been facing for some time as well. I think it started when we switched from on-prem SMB shares to DFS shares.
testcronjob 0/1 ContainerCannotRun 0 12h 10.180.64.18 ip-xx-xxx-xxx-xx.ec2.internal
On affected nodes we see the same output for the globalmount directories:
[ec2-user@ip-xx-xxx-xxx-xx ~]$ sudo ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/pv/smb-pv
ls: cannot access /var/lib/kubelet/plugins/kubernetes.io/csi/pv/smb-pv/globalmount: No such file or directory
total 4
d????????? ? ? ? ? ? globalmount
-rw-r--r-- 1 root root 63 Jul 11 08:35 vol_data.json
It happens at random and we don't know the reason, but our cronjobs (we have around 300 of them) keep getting blocked and we need to drain the nodes.
In the kernel logs, presumably from around the time it happens, we find: [Tue Jul 12 21:30:29 2022] CIFS VFS: \share_server\share_name BAD_NETWORK_NAME: \share_server\share_name\path
In the CIFS stats file we only see: TreeConnects: 2223 total 2199 failed
Since the controller is in an infinite loop, manually unmounting and remounting does not help newly created pods. We have to drain the whole node and move our workloads to a newly created one.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
I think this is caused by Linux not playing well with DFS - I've seen it too and just switched to targeting the downstream server directly.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.