VolumeFailedDelete events
TL;DR
CSI controller is generating frequent 'VolumeFailedDelete' warning events with the message 'persistentvolume pvc-… is still attached to node …'.
Expected behavior
Persistent volumes attach/detach smoothly
Observed behavior
When attaching/detaching persistent volumes, particularly in high-churn use cases, both the CSI controller and the pods that depend on the volumes produce a stream of warning events, and the retries add latency to the cluster.
Minimal working example
Try setting up ARC (the Actions Runner Controller for GitHub Actions) in Kubernetes container mode and triggering a few workflows:
---
apiVersion: v1
kind: Namespace
metadata:
  name: arc-runners
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: arc-runner-scale-set
  namespace: arc-runners
spec:
  interval: 30m
  chart:
    spec:
      chart: gha-runner-scale-set
      sourceRef:
        kind: HelmRepository
        name: arc
        namespace: arc-systems
      interval: 12h
      version: 0.13.0
  values:
    maxRunners: 2
    minRunners: 1
    githubConfigUrl: https://github.com/<YOUR REPO>
    githubConfigSecret: <YOUR REPO SECRET>
    runnerScaleSetName: self-hosted-runner
    containerMode:
      type: kubernetes
      kubernetesModeWorkVolumeClaim:
        accessModes: ["ReadWriteOnce"]
        storageClassName: hcloud-volumes
        resources:
          requests:
            storage: 1Gi
    template:
      spec:
        securityContext:
          fsGroup: 1001
        containers:
          - name: runner
            image: ghcr.io/actions/actions-runner:latest
            command: ["/home/runner/run.sh"]
Log output
apiVersion: v1
kind: Event
metadata:
  name: pvc-efa81f1e-869e-49dc-99f4-9390cd00758a.187a66094e8c405a
  namespace: default
  uid: 323dca32-2ebb-4f30-9877-a2ebf07dff8b
  resourceVersion: '55312424'
  creationTimestamp: '2025-11-22T17:59:54Z'
  selfLink: >-
    /api/v1/namespaces/default/events/pvc-efa81f1e-869e-49dc-99f4-9390cd00758a.187a66094e8c405a
spec: {}
involvedObject:
  kind: PersistentVolume
  name: pvc-efa81f1e-869e-49dc-99f4-9390cd00758a
  uid: ad3c9a95-0843-4e7f-a6c6-3a80a2255d4a
  apiVersion: v1
  resourceVersion: '55312391'
reason: VolumeFailedDelete
message: >-
  persistentvolume pvc-efa81f1e-869e-49dc-99f4-9390cd00758a is still attached to
  node cloud-02
source:
  component: >-
    csi.hetzner.cloud_hcloud-csi-controller-567f89966f-4x2w5_2950c7a3-f602-4696-95f9-d014acc08356
firstTimestamp: '2025-11-22T17:59:54Z'
lastTimestamp: '2025-11-22T18:00:01Z'
count: 4
type: Warning
eventTime: null
reportingComponent: >-
  csi.hetzner.cloud_hcloud-csi-controller-567f89966f-4x2w5_2950c7a3-f602-4696-95f9-d014acc08356
reportingInstance: ''
---
apiVersion: v1
kind: Event
metadata:
  name: my-cloud-tp9sg-runner-hhjpx.187a65295d0af25f
  namespace: arc-runners
  uid: e11e7370-37cb-4860-886c-8180d3616367
  resourceVersion: '55308459'
  creationTimestamp: '2025-11-22T17:43:52Z'
  selfLink: >-
    /api/v1/namespaces/arc-runners/events/my-cloud-tp9sg-runner-hhjpx.187a65295d0af25f
spec: {}
involvedObject:
  kind: Pod
  namespace: arc-runners
  name: my-cloud-tp9sg-runner-hhjpx
  uid: 9a871964-3534-4f2f-b9c2-97668b05fdf5
  apiVersion: v1
  resourceVersion: '55308341'
reason: FailedMount
message: >-
  MountVolume.SetUp failed for volume "pvc-efa81f1e-869e-49dc-99f4-9390cd00758a"
  : rpc error: code = Internal desc = failed to publish volume: format of disk
  "/dev/disk/by-id/scsi-0HC_Volume_104028078" failed: type:("ext4")
  target:("/var/lib/kubelet/pods/9a871964-3534-4f2f-b9c2-97668b05fdf5/volumes/kubernetes.io~csi/pvc-efa81f1e-869e-49dc-99f4-9390cd00758a/mount")
  options:("defaults") errcode:(exit status 1) output:(mke2fs 1.47.1
  (20-May-2024)
  The file /dev/disk/by-id/scsi-0HC_Volume_104028078 does not exist and no size
  was specified.
  )
source:
  component: kubelet
  host: cloud-02
firstTimestamp: '2025-11-22T17:43:52Z'
lastTimestamp: '2025-11-22T17:43:56Z'
count: 4
type: Warning
eventTime: null
reportingComponent: kubelet
reportingInstance: cloud-02
---
apiVersion: v1
kind: Event
metadata:
  name: my-cloud-tp9sg-runner-hhjpx-work.187a652518b33d76
  namespace: arc-runners
  uid: c05af831-89e3-4380-91cf-4f7ac3718f9f
  resourceVersion: '55308317'
  creationTimestamp: '2025-11-22T17:43:34Z'
  selfLink: >-
    /api/v1/namespaces/arc-runners/events/my-cloud-tp9sg-runner-hhjpx-work.187a652518b33d76
spec: {}
involvedObject:
  kind: PersistentVolumeClaim
  namespace: arc-runners
  name: my-cloud-tp9sg-runner-hhjpx-work
  uid: efa81f1e-869e-49dc-99f4-9390cd00758a
  apiVersion: v1
  resourceVersion: '55308263'
reason: ProvisioningFailed
message: >-
  failed to provision volume with StorageClass "hcloud-volumes": rpc error: code
  = DeadlineExceeded desc = context deadline exceeded
source:
  component: >-
    csi.hetzner.cloud_hcloud-csi-controller-567f89966f-4x2w5_2950c7a3-f602-4696-95f9-d014acc08356
firstTimestamp: '2025-11-22T17:43:34Z'
lastTimestamp: '2025-11-22T17:43:34Z'
count: 1
type: Warning
eventTime: null
reportingComponent: >-
  csi.hetzner.cloud_hcloud-csi-controller-567f89966f-4x2w5_2950c7a3-f602-4696-95f9-d014acc08356
reportingInstance: ''
---
apiVersion: v1
kind: Event
metadata:
  name: my-cloud-tp9sg-runner-r7728.187a65228e26c5c3
  namespace: arc-runners
  uid: 6696bc08-f201-433e-af85-6207aa913412
  resourceVersion: '55308253'
  creationTimestamp: '2025-11-22T17:43:23Z'
  selfLink: >-
    /api/v1/namespaces/arc-runners/events/my-cloud-tp9sg-runner-r7728.187a65228e26c5c3
spec: {}
involvedObject:
  kind: Pod
  namespace: arc-runners
  name: my-cloud-tp9sg-runner-r7728
  uid: 67965bc8-169f-4229-b15d-dd065a687968
  apiVersion: v1
  resourceVersion: '55308196'
reason: FailedScheduling
message: >-
  running PreBind plugin "VolumeBinding": binding volumes: pod does not exist
  any more: pod "my-cloud-tp9sg-runner-r7728" not found
source:
  component: default-scheduler
firstTimestamp: '2025-11-22T17:43:23Z'
lastTimestamp: '2025-11-22T17:43:23Z'
count: 1
type: Warning
eventTime: null
reportingComponent: default-scheduler
reportingInstance: ''
---
apiVersion: v1
kind: Event
metadata:
  name: my-cloud-tp9sg-runner-hhjpx.187a6522647ba8dc
  namespace: arc-runners
  uid: 2b1b99f7-5e0c-4032-80e3-0115f50b1c13
  resourceVersion: '55308215'
  creationTimestamp: '2025-11-22T17:43:22Z'
  selfLink: >-
    /api/v1/namespaces/arc-runners/events/my-cloud-tp9sg-runner-hhjpx.187a6522647ba8dc
spec: {}
involvedObject:
  kind: Pod
  namespace: arc-runners
  name: my-cloud-tp9sg-runner-hhjpx
  uid: 9a871964-3534-4f2f-b9c2-97668b05fdf5
  apiVersion: v1
  resourceVersion: '55308214'
reason: FailedScheduling
message: >-
  0/3 nodes are available: waiting for ephemeral volume controller to create the
  persistentvolumeclaim "my-cloud-tp9sg-runner-hhjpx-work". preemption: 0/3
  nodes are available: 3 Preemption is not helpful for scheduling.
source:
  component: default-scheduler
firstTimestamp: '2025-11-22T17:43:22Z'
lastTimestamp: '2025-11-22T17:43:22Z'
count: 1
type: Warning
eventTime: null
reportingComponent: default-scheduler
reportingInstance: ''
---
apiVersion: v1
kind: Event
metadata:
  name: pvc-a4919e2d-32ac-4aed-9f3d-a9c189e5e60c.187a650de6d88b55
  namespace: default
  uid: 3be65217-10dd-4c8b-aaa0-faf4155c4174
  resourceVersion: '55307693'
  creationTimestamp: '2025-11-22T17:41:54Z'
  selfLink: >-
    /api/v1/namespaces/default/events/pvc-a4919e2d-32ac-4aed-9f3d-a9c189e5e60c.187a650de6d88b55
spec: {}
involvedObject:
  kind: PersistentVolume
  name: pvc-a4919e2d-32ac-4aed-9f3d-a9c189e5e60c
  uid: 5e2fd3f0-f078-4586-98d1-9318c94fda83
  apiVersion: v1
  resourceVersion: '55307658'
reason: VolumeFailedDelete
message: >-
  persistentvolume pvc-a4919e2d-32ac-4aed-9f3d-a9c189e5e60c is still attached to
  node cloud-02
source:
  component: >-
    csi.hetzner.cloud_hcloud-csi-controller-567f89966f-4x2w5_2950c7a3-f602-4696-95f9-d014acc08356
firstTimestamp: '2025-11-22T17:41:54Z'
lastTimestamp: '2025-11-22T17:42:01Z'
count: 4
type: Warning
eventTime: null
reportingComponent: >-
  csi.hetzner.cloud_hcloud-csi-controller-567f89966f-4x2w5_2950c7a3-f602-4696-95f9-d014acc08356
reportingInstance: ''
---
apiVersion: v1
kind: Event
metadata:
  name: my-cloud-jf8z8-runner-dt6kp-work.187a650834cdd775
  namespace: arc-runners
  uid: 23fd02b0-1b22-428a-a57a-ef8468c47ec0
  resourceVersion: '55307476'
  creationTimestamp: '2025-11-22T17:41:29Z'
  selfLink: >-
    /api/v1/namespaces/arc-runners/events/my-cloud-jf8z8-runner-dt6kp-work.187a650834cdd775
spec: {}
involvedObject:
  kind: PersistentVolumeClaim
  namespace: arc-runners
  name: my-cloud-jf8z8-runner-dt6kp-work
  uid: bf39e236-c4aa-452d-873b-7539accc100f
  apiVersion: v1
  resourceVersion: '55307423'
reason: ProvisioningFailed
message: >-
  failed to provision volume with StorageClass "hcloud-volumes": rpc error: code
  = DeadlineExceeded desc = context deadline exceeded
source:
  component: >-
    csi.hetzner.cloud_hcloud-csi-controller-567f89966f-4x2w5_2950c7a3-f602-4696-95f9-d014acc08356
firstTimestamp: '2025-11-22T17:41:29Z'
lastTimestamp: '2025-11-22T17:41:29Z'
count: 1
type: Warning
eventTime: null
reportingComponent: >-
  csi.hetzner.cloud_hcloud-csi-controller-567f89966f-4x2w5_2950c7a3-f602-4696-95f9-d014acc08356
reportingInstance: ''
Additional information
No response
Hey, I am not familiar with the Actions Runner Controller, but I can take a look at it.
It feels like ARC is expecting the csi-driver to be faster here. Have you already experimented with any config parameters that allow for longer pod startup times? My gut feeling tells me this might be a limitation of the Volumes product/API and we can't do a lot about it in the csi-driver. Nonetheless, I will see if I can reproduce this and whether there is something on our end.
Would you have any logs from the hcloud-csi-controller and hcloud-csi-node drivers? If you can provide some, please use the --all-containers flag, as the CSI Pods consist of multiple containers.
Best Regards, Lukas
Hey, I took a look at this.
The hcloud-csi-driver consists of multiple containers. One of them is the external-provisioner, which is responsible for watching PVC objects and emitting gRPC calls to the hcloud-csi-controller to Create/Delete/... Hetzner Cloud Volumes. Before deleting, the provisioner pre-checks whether the Volume is still attached to another node, and emits the VolumeFailedDelete event if it is. So this is not in our control and works as intended.
That said, you can see some additional events like this:
reason: ProvisioningFailed
message: >-
  failed to provision volume with StorageClass "hcloud-volumes": rpc error: code
  = DeadlineExceeded desc = context deadline exceeded
You might be able to get rid of (or at least reduce) these events by setting the external-provisioner --timeout flag to something like 45s to 60s. Currently we don't offer a way to do this in the Helm chart, but if it works for you, we might add it. Could you try this out and see if it helps in your case?
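In the meantime, a minimal sketch of how that could look if you edit the controller Deployment directly. The sidecar container name and the existing args below are assumptions; keep whatever args your Deployment already has and only add/adjust --timeout:

# Sketch only: names/namespace depend on how you installed the driver.
# kubectl -n kube-system edit deployment hcloud-csi-controller
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hcloud-csi-controller
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: csi-provisioner   # external-provisioner sidecar; name is an assumption
          args:
            # ...keep the existing args unchanged...
            - --timeout=60s       # raise the CSI gRPC call timeout (45s-60s suggested above)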
Hi @lukasmetzner, thanks for taking a look at this. Since writing up the issue I've moved to the OpenEBS Local PV provisioner, which has cleared up the errors completely. For ARC it's likely a better fit anyway.
That said, I've noticed these errors appearing frequently on hcloud volumes, not just the ones used by ARC; particularly when an app that depends on them rolls out a new version and restarts a bunch of pods in series, for example. I think the volume churn of the ARC controller just makes the issue harder to miss, but I'll try to capture any new logs that come up and attach them to this issue.
I've attached a copy of my setup of the Local PV Helm repo/chart values that behaves with ARC as I'd expect. I don't have any detailed knowledge of how CSI drivers work under the hood, but perhaps it might help to identify differences in the attach/detach behaviour.
EDIT: With ARC I'm actually using the 'openebs-hostpath' storage class, which is installed by the chart, so you can ignore the one in the yaml.
containerMode:
  type: kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    storageClassName: openebs-hostpath
    resources:
      requests:
        storage: 1Gi
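For reference, a minimal sketch of the Flux side of that setup (the repository URL, chart name, and namespaces here are assumptions based on the OpenEBS docs, so double-check them against your install):

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: openebs-localpv
  namespace: openebs
spec:
  interval: 12h
  url: https://openebs.github.io/dynamic-localpv-provisioner   # assumed chart repo
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: localpv-provisioner
  namespace: openebs
spec:
  interval: 30m
  chart:
    spec:
      chart: localpv-provisioner   # assumed chart name
      sourceRef:
        kind: HelmRepository
        name: openebs-localpv
      interval: 12h

With default values this chart should create the 'openebs-hostpath' StorageClass referenced above.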