aws-ebs-csi-driver
Persistent Volumes with reclaimPolicy set to delete are not getting deleted
/kind bug
What happened?
Persistent Volumes with reclaimPolicy set to delete are not getting deleted
What you expected to happen?
Persistent Volumes with reclaimPolicy set to delete should get deleted
How to reproduce it (as minimally and precisely as possible)?
1) Create a RunnerSet (actions-runner-controller), which is based on a StatefulSet (a minimal sketch follows these steps)
2) Observe that the PVs are not getting reclaimed
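For illustration, a minimal sketch of this kind of setup is shown below, using a plain StatefulSet in place of an actual RunnerSet (which wraps one) together with an EBS StorageClass whose PVs get the Delete reclaim policy. The names, image, and storage size are placeholders, not taken from the original report.
# Hypothetical minimal reproduction: a StorageClass backed by the EBS CSI
# driver with reclaimPolicy Delete, consumed through a StatefulSet
# volumeClaimTemplate. Deleting the PVC created from this template should
# also delete the dynamically provisioned PV and its EBS volume.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-delete            # placeholder name
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete         # PV (and EBS volume) removed when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: repro                 # stand-in for the RunnerSet-managed StatefulSet
spec:
  serviceName: repro
  replicas: 1
  selector:
    matchLabels:
      app: repro
  template:
    metadata:
      labels:
        app: repro
    spec:
      containers:
        - name: app
          image: public.ecr.aws/docker/library/busybox:1.36   # placeholder workload
          command: ["sh", "-c", "sleep 3600"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-delete
        resources:
          requests:
            storage: 1Gi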
Anything else we need to know?:
Errors from the csi-provisioner container:
E0213 14:24:47.649540 1 controller.go:1481] delete "pvc-fb37fa53-9249-44ed-9d69-19e00767e3ae": volume deletion failed: persistentvolume pvc-fb37fa53-9249-44ed-9d69-19e00767e3ae is still attached to node ip-10-10-2-245.eu-central-1.compute.internal
Environment
- Kubernetes version (use kubectl version): 1.23
- Driver version: v1.15.0-eksbuild.1
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Same issue here 👍
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Same issue here 👍
/remove-lifecycle rotten
Seeing the same with Kubernetes v1.27
I had to add the following IAM actions to get the creation + deletion to work:
ec2:CreateVolume // creation
ec2:CreateTags // creation
ec2:AttachVolume // creation
ec2:DetachVolume // deletion
ec2:DeleteVolume // deletion
You are probably missing the last two actions.
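For reference, one way to grant those actions is an inline IAM policy attached to the driver's controller service account via IRSA; the eksctl ClusterConfig fragment below is only a hypothetical sketch (the cluster name, region, service-account name, and the wildcard Resource are assumptions, and the managed AmazonEBSCSIDriverPolicy discussed further down is the usual alternative).
# Hypothetical eksctl ClusterConfig fragment (assumption: IRSA configured via
# eksctl's inline attachPolicy field). It grants the EBS CSI controller
# service account the create/attach/detach/delete actions listed above.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster        # placeholder
  region: eu-central-1         # placeholder
iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: ebs-csi-controller-sa   # typical controller SA name; adjust if customized
        namespace: kube-system
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - ec2:CreateVolume   # creation
              - ec2:CreateTags     # creation
              - ec2:AttachVolume   # creation
              - ec2:DetachVolume   # deletion
              - ec2:DeleteVolume   # deletion
            Resource: "*"          # scope down in real environments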
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I had to add the following to get the creation + deletion to work
ec2:CreateVolume // creation
ec2:CreateTags // creation
ec2:AttachVolume // creation
ec2:DetachVolume // deletion
ec2:DeleteVolume // deletion
You are probably missing the last two actions.
It does not work with "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy" either.
Driver image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/aws-ebs-csi-driver:v1.26.1
Kubernetes version: v1.28.6-eks-508b6b3
We are experiencing the same issue; it appears to be a race condition.
Here are the logs of the csi-provisioner:
I0517 11:22:47.641553 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-6a2a97fe-ded6-48c9-9f57-9fa7624c72e7", UID:"5aaae63a-4cca-4043-aea8-bcb53d578704", APIVersion:"v1", ResourceVersion:"143899969", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' persistentvolume pvc-6a2a97fe-ded6-48c9-9f57-9fa7624c72e7 is still attached to node ip-10-156-74-114.eu-central-1.compute.internal
and the logs of the ebs-plugin:
I0517 11:22:52.094812 1 controller.go:471] "ControllerUnpublishVolume: detaching" volumeID="vol-0a570aeae7332b2e7" nodeID="i-075b7bf75732e92f7"
I0517 11:22:53.578965 1 cloud.go:862] "Waiting for volume state" volumeID="vol-0a570aeae7332b2e7" actual="detaching" desired="detached"
I0517 11:22:54.699412 1 cloud.go:862] "Waiting for volume state" volumeID="vol-0a570aeae7332b2e7" actual="detaching" desired="detached"
I0517 11:22:57.610889 1 cloud.go:862] "Waiting for volume state" volumeID="vol-0a570aeae7332b2e7" actual="detaching" desired="detached"
I0517 11:23:01.976813 1 controller.go:479] "ControllerUnpublishVolume: detached" volumeID="vol-0a570aeae7332b2e7" nodeID="i-075b7bf75732e92f7"
where pvc-6a2a97fe-ded6-48c9-9f57-9fa7624c72e7 corresponds to vol-0a570aeae7332b2e7
The deletion fails before detaching is finished.
Kubernetes version: 1.26.1
EBS driver version: 1.26.1
/close
The original issue appears to be due to a bug in the GitHub Actions Kubernetes runner: https://github.com/actions/actions-runner-controller/issues/2266
Volumes must not be attached to any node prior to deletion - this means they cannot be in use by any pod. If you delete a volume before it is detached, you will receive "is still attached to node" events until the corresponding pod(s) are deleted and the volume finishes detaching.
Note also that if the volume is deleted immediately after a pod terminates, you may see this event for a short period of time while the volume is detaching. This is because volumes are not detached until after the pod terminates, and a detach typically takes EBS several seconds to perform. This is expected, and the volume should proceed with deletion shortly after the detach succeeds.
If you can reproduce this error when no pod is using the volume and the volume gets stuck in the deleting state with an "is still attached to node" event for an extended period of time, please open a new issue with the reproduction steps (strongly preferred if possible) or relevant log entries.
@ConnorJC3: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.