csi-driver
Random volume mount error
This error has been following me for the last year.
tax-mysql-0 0/2 Init:0/1 0 3m5s
Warning FailedAttachVolume 119s attachdetach-controller AttachVolume.Attach failed for volume "pvc-b269c782-6c14-41d8-a043-9b8c890e485a" : rpc error: code = Internal desc = failed to publish volume: Volume assigning step has failed due to an unknown error. (unknown_error)
The pod randomly can't mount its volume and stays in the Init state forever. Sometimes the cloud node the pod was running on goes offline, but not always; I then need to go to the Hetzner console and turn the node on manually. My workaround is to delete the pod and wait for another try. csi-driver version 1.6.0, Kubernetes 1.19.9.
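For reference, a minimal sketch of that workaround; the pod name is the one from the report above, the node name is a placeholder, and the hcloud call assumes the CLI is configured for the affected project:
kubectl describe pod tax-mysql-0        # confirm the FailedAttachVolume event
kubectl delete pod tax-mysql-0          # let the StatefulSet controller recreate the pod
hcloud server poweron <node-name>       # if the node went offline, power it back on (placeholder name)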
I am getting the same error. Restarting the pod does not help.
The volume is being bound and created but cannot be attached. This only affects one pod in a StatefulSet of 4.
@nmass-betpawa Hard to tell. Can you provide logs of the relevant pods (hcloud-csi-node-xxx and hcloud-csi-controller-xxx)?
kubectl -n kube-system logs hcloud-csi-controller-0 -c hcloud-csi-driver
level=info ts=2022-03-04T18:27:07.770537039Z component=api-volume-service msg="attaching volume" volume-id=17427653 server-id=13700044
level=info ts=2022-03-04T18:27:07.771223989Z component=api-volume-service msg="attaching volume" volume-id=17427671 server-id=13700044
level=info ts=2022-03-04T18:27:12.805624266Z component=api-volume-service msg="failed to attach volume" volume-id=17427671 server-id=13700044 err="cannot perform operation because server is locked (locked)"
level=error ts=2022-03-04T18:27:12.863666053Z component=grpc-server msg="handler failed" err="rpc error: code = Unavailable desc = failed to publish volume: server is locked"
level=info ts=2022-03-04T18:27:17.336704814Z component=api-volume-service msg="failed to attach volume" volume-id=17427653 server-id=13700044 err="Volume assigning step has failed due to an unknown error. (unknown_error)"
level=error ts=2022-03-04T18:27:17.390385673Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to publish volume: Volume assigning step has failed due to an unknown error. (unknown_error)"
kubectl -n kube-system logs hcloud-csi-controller-0 -c csi-provisioner | grep pvc-61295e91-e439-4423-a91d-f1794c1c3f3c
I0304 17:49:15.441626 1 controller.go:1439] provision "pr-104/admin-managment-database" class "hcloud-volumes": volume "pvc-61295e91-e439-4423-a91d-f1794c1c3f3c" provisioned
I0304 17:49:16.347715 1 controller.go:1341] provision "pr-104/admin-managment-database" class "hcloud-volumes": persistentvolume "pvc-61295e91-e439-4423-a91d-f1794c1c3f3c" already exists, skipping
I0304 17:49:16.359257 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"pr-104", Name:"admin-managment-database", UID:"61295e91-e439-4423-a91d-f1794c1c3f3c", APIVersion:"v1", ResourceVersion:"608162803", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-61295e91-e439-4423-a91d-f1794c1c3f3c
no logs here
kubectl -n kube-system logs hcloud-csi-node-q85tq -c hcloud-csi-driver
level=info ts=2022-03-04T18:17:31.125716915Z msg="fetched server" server-name=kube-node-40-qa
kubectl -n kube-system logs hcloud-csi-node-q85tq -c csi-node-driver-registrar
I0304 18:17:29.334885 1 main.go:113] Version: v2.2.0
I0304 18:17:31.242022 1 node_register.go:52] Starting Registration Server at: /registration/csi.hetzner.cloud-reg.sock
I0304 18:17:31.242615 1 node_register.go:61] Registration Server started at: /registration/csi.hetzner.cloud-reg.sock
I0304 18:17:31.242856 1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0304 18:17:31.908776 1 main.go:80] Received GetInfo call: &InfoRequest{}
I0304 18:17:31.958822 1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

After rebooting the node through the Hetzner web interface everything works (but a reboot from the Linux console doesn't help).
Hetzner said it's a known error :)
Can we assume that this mounting error is not csi-driver related but a general volume attach problem of Hetzner Cloud? Has anyone tried, when csi-driver is unable to attach the volume, to manually attach the volume via the Hetzner Console or using the hcloud CLI tool?
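For anyone who wants to test the API path, a rough sketch with the hcloud CLI; the names are placeholders and this assumes the CLI is configured with a token for the affected project:
hcloud volume list                                   # find the volume that failed to attach (csi-driver names it after the PV, e.g. pvc-...)
hcloud volume attach --server <node-name> <volume>   # try the manual attach through the API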
Has anyone tried, when csi-driver is unable to attach the volume, to manually attach the volume via the Hetzner Console
Yes, tried, works
Has anyone tried, when csi-driver is unable to attach the volume, to manually attach the volume via the Hetzner Console
Yes, tried, works
@scasei Via the hcloud CLI tool as well? The CLI tool (as it also talks via the API) should do what csi-driver does, while changes via the web console might work differently internally. I just want to find out whether the issue is in the API itself or in how csi-driver uses the API.
@choffmeister Only via console.hetzner.cloud; I had to detach all volumes from the VM, restart the server and attach them manually again.
Sometimes the node gets turned off after a failed volume mount; from the logs it looks like a power loss, with no error in syslog at all. So I need to go to the Hetzner web console and turn it on manually. Maybe it is indeed related to virtualization errors somehow, but this volume mount problem happens very often.
Seems to be the same problem as #6, which shouldn't have been closed.
We experience the same issue on our clusters here. Are there any updates to fix this? Rebooting the nodes is only a workaround.
I also get random errors when I try to mount Hetzner volumes. Only reboots work around it. This situation is very critical for us.
Found another workaround (example commands below):
- Delete the volumeattachments.storage.k8s.io objects that are not attached and match the PVC that can't be attached to your pod
- Delete the related pod and wait for it to be recreated.
This fixed the issue for me temporarily.
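A rough sketch of those steps; the VolumeAttachment and pod names are placeholders you would substitute from your own cluster:
kubectl get volumeattachments.storage.k8s.io           # look for ATTACHED=false entries referencing the stuck PV
kubectl delete volumeattachment csi-0123456789abcdef   # placeholder name of the stale attachment
kubectl delete pod tax-mysql-0                         # placeholder for the pod stuck in Init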
Or try killing/deleting the csi-node pods, wait until they start again, then kill/delete the pod which has the mount error.
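For example (a sketch, assuming the node plugin pods are the hcloud-csi-node-* pods in kube-system shown in the logs above):
kubectl -n kube-system delete pod hcloud-csi-node-q85tq   # restart the node plugin on the affected node, then delete the stuck workload pod as described above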
none of the soft workarounds seem to work for me :(
Lots of workarounds here but no fixes. I'm also getting this issue.
My issue was related to using rke2; after switching to a different flavour (I used kubeone to set it up for me), I have no more issues.
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.
I would like to take a look at this, but I will need some debug logs. It would be great if you could activate debug logging and then send the sections where this bug happens to my email: julian.toelle <at> hetzner-cloud.de
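A sketch of how that could be done; the assumption here (please double-check against the README) is that the driver containers read a LOG_LEVEL environment variable and accept the value debug:
kubectl -n kube-system set env statefulset/hcloud-csi-controller -c hcloud-csi-driver LOG_LEVEL=debug   # assumed env var, verify in the docs
kubectl -n kube-system set env daemonset/hcloud-csi-node -c hcloud-csi-driver LOG_LEVEL=debug           # same assumption for the node DaemonSet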
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.
@apricote thanks for offering to help!
We have similar issues. Sometimes volumes take forever to mount into the pod, which results in startup times of 10min+:
Normal Scheduled 11m default-scheduler Successfully assigned pod/pod-54dc234c3-t3t54 to node-name
Warning FailedMount 5m20s (x2 over 7m34s) kubelet Unable to attach or mount volumes: unmounted volumes=[pod-cache-volume], unattached volumes=[pod-config-volume pod-cache-volume kube-api-access-t441t]: timed out waiting for the condition
Warning FailedMount 51s (x3 over 9m49s) kubelet Unable to attach or mount volumes: unmounted volumes=[pod-cache-volume], unattached volumes=[pod-cache-volume kube-api-access-t441t pod-config-volume]: timed out waiting for the condition
Normal Pulling 5s kubelet Pulling image "image"
Normal Pulled 4s kubelet Successfully pulled image "image" in 570.451336ms (570.476615ms including waiting)
Normal Created 4s kubelet Created container pod
Normal Started 4s kubelet Started container pod
The volume cannot be mounted, but according to the Hetzner CSI controller the disk is attached, opened (LUKS) and ready.
Sometimes we also get errors stating that the server is locked.
Warning FailedAttachVolume 2m59s (x2 over 3m1s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-233f32ff-78cd-c884-234c-f9898a9f8989" : rpc error: code = Unavailable desc = failed to publish volume: server is locked
Normal SuccessfulAttachVolume 2m58s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-233f32ff-78cd-c884-234c-f9898a9f8989"
Warning FailedMount 40s (x2 over 2m54s) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data kube-api-access-t441t]: timed out waiting for the condition
This is very annoying and we would be very happy to assist in finding the issue!
We already had a look into our debug logs and were not able to find any issues there. These logs are generated close to pod scheduling:
hcloud-csi-driver level=debug component=grpc-server msg="handling request" method=/csi.v1.Node/NodeUnpublishVolume req="volume_id:\"123456789\" target_path:\"/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount\" "
hcloud-csi-driver level=info component=linux-mount-service msg="unpublishing volume" target-path=/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount device-path=/dev/mapper/scsi-0HC_Volume_123456789
hcloud-csi-driver 1 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount" is not a mountpoint, deleting
hcloud-csi-driver level=info component=linux-mount-service msg="closing LUKS device" luksDeviceName=scsi-0HC_Volume_123456789
hcloud-csi-driver level=debug component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeUnpublishVolume err=null
hcloud-csi-driver level=debug grpc-server msg="handling request" method=/csi.v1.Node/NodePublishVolume req="volume_id:\"123456789\" publish_context:<key:\"devicePath\" value:\"/dev/disk/by-id/scsi-0HC_Volume_123456789\" > target_path:\"/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:\"encryption-passphrase\" value:\"XXXXXXXXXXXXXX\" > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"false\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"podName\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"namespace\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"podID\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"serviceAccountName\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"id-csi.hetzner.cloud\" > "
hcloud-csi-driver level=info linux-mount-service msg="opening LUKS device" devicePath=/dev/disk/by-id/scsi-0HC_Volume_123456789 luksDeviceName=scsi-0HC_Volume_123456789
hcloud-csi-driver level=info linux-mount-service msg="publishing volume" target-path=/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount device-path=/dev/mapper/scsi-0HC_Volume_123456789 fs-type=ext4 block-volume=false readonly=false mount-options= encrypted=true
hcloud-csi-driver level=debug grpc-server msg="finished handling request" method=/csi.v1.Node/NodePublishVolume err=null
Then we see nothing for a long time (that's probably the time during which the pod isn't able to mount the volume). At some point, when the volume was successfully mounted, we see these logs:
hcloud-csi-node-j5vzl hcloud-csi-driver level=debug component=grpc-server msg="handling request" method=/csi.v1.Node/NodeGetVolumeStats req="volume_id:\"0HC_Volume_123456789\" volume_path:\"/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount\" "
hcloud-csi-node-j5vzl hcloud-csi-driver level=debug component=linux-mount-service msg="checking path existence" path=/var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount
hcloud-csi-node-j5vzl hcloud-csi-driver level=debug component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeGetVolumeStats err=null
We don't know what changed or what made the volume suddenly mountable.
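One thing that might help narrow this down the next time it hangs is checking the device state directly on the node while the pod is stuck; a sketch, reusing the placeholder IDs from the logs above:
lsblk | grep 123456789                                                      # is the attached disk visible to the kernel at all?
cryptsetup status scsi-0HC_Volume_123456789                                 # is the LUKS mapping open under /dev/mapper?
findmnt /var/lib/kubelet/pods/podID/volumes/kubernetes.io~csi/pvcID/mount   # is anything mounted at the target path yet?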
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.