aws-ebs-csi-driver

Volumes are not deleted in AWS

Open shinji62 opened this issue 2 years ago • 6 comments

/kind bug

What happened? Deleting persistent volumes in EKS does not delete them in AWS.

What you expected to happen? The volumes to be deleted in AWS.

How to reproduce it (as minimally and precisely as possible)? Create a PVC with a dynamically provisioned volume, then delete it in Kubernetes.

Anything else we need to know?:

When I check the logs, the attacher complains about the finalizers. When I check the volume before deleting it, I see these finalizers:

Finalizers:        [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
ebs-csi-controller-7f7d8649dd-bq2gp csi-attacher E0707 02:20:43.007535       1 csi_handler.go:698] Failed to remove finalizer from PV "pvc-9b8fd5d9-a0cb-4792-ac5f-c4e96cdfa431": PersistentVolume "pvc-9b8fd5d9-a0cb-4792-ac5f-c4e96cdfa431" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"kubernetes.io/pv-protection"}
ebs-csi-controller-7f7d8649dd-bq2gp csi-attacher I0707 02:20:43.024891       1 csi_handler.go:703] Removed finalizer from PV "pvc-9b8fd5d9-a0cb-4792-ac5f-c4e96cdfa431"
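
For reference, the finalizers above can be checked directly on the PV object (PV name taken from the log line; the jsonpath query is a generic kubectl feature, not anything driver-specific):

kubectl get pv pvc-9b8fd5d9-a0cb-4792-ac5f-c4e96cdfa431 -o jsonpath='{.metadata.finalizers}{"\n"}'
kubectl describe pv pvc-9b8fd5d9-a0cb-4792-ac5f-c4e96cdfa431 | grep Finalizers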

Environment

  • Kubernetes version (use kubectl version): v1.21.12-eks-a64ea69
  • Driver version:
app.kubernetes.io/version=1.8.0                                                                                                                                                              
helm.sh/chart=aws-ebs-csi-driver-2.8.0

shinji62 avatar Jul 07 '22 02:07 shinji62

This is intended behavior if reclaimPolicy: Retain is specified in your StorageClass manifest. See #1071
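
As a quick check, the effective policy on both the StorageClass and an existing PV can be inspected, and a PV that was provisioned while the class still had Retain can be switched over. This is a sketch; <sc-name> and <pv-name> are placeholders:

kubectl get storageclass <sc-name> -o jsonpath='{.reclaimPolicy}{"\n"}'
kubectl get pv <pv-name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}{"\n"}'
# A PV keeps the reclaim policy it was created with, even if the class changes later:
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'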

torredil avatar Jul 11 '22 17:07 torredil

@torredil Thanks for the answer, but the StorageClass reclaimPolicy is Delete, and so is the reclaim policy on the PV/PVC.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: aws-ebs-csi-driver
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2022-07-06T04:44:49Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: gp3
  resourceVersion: "6576174"
  uid: 2c17babc-818f-46f3-bc69-83f82b07d6f6
parameters:
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

shinji62 avatar Jul 12 '22 02:07 shinji62

@shinji62 Thanks for providing the StorageClass manifest. I just ran through the dynamic provisioning example and was not able to reproduce this. The deletion of a PVC object bound to a PV corresponding to this driver with a delete reclaim policy causes the external-provisioner sidecar container to trigger a DeleteVolume operation. Once the volume is successfully deleted, the sidecar container also deletes the PV object representing the volume.

You might be able to get more insight into what's going on if you take a look at the csi-provisioner container logs: kubectl logs -n kube-system $(kubectl get lease -n kube-system ebs-csi-aws-com -o=jsonpath="{.spec.holderIdentity}") -c csi-provisioner.
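
For completeness, the manifests used in the reproduction below follow the dynamic provisioning example; a minimal sketch along those lines (the image and mount path are illustrative, and reclaimPolicy defaults to Delete when omitted) is:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: public.ecr.aws/amazonlinux/amazonlinux:2  # illustrative image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: ebs-claim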

  • kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.12", GitCommit:"696a9fdd2a58340e61e0d815c5769d266fca0802", GitTreeState:"clean", BuildDate:"2022-04-13T19:07:00Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.12", GitCommit:"696a9fdd2a58340e61e0d815c5769d266fca0802", GitTreeState:"clean", BuildDate:"2022-04-13T19:01:10Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
  • kubectl describe deployment ebs-csi-controller -n kube-system
Name:                   ebs-csi-controller
Namespace:              kube-system
CreationTimestamp:      Tue, 12 Jul 2022 19:26:13 +0000
Labels:                 app.kubernetes.io/component=csi-driver
                        app.kubernetes.io/instance=aws-ebs-csi-driver
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=aws-ebs-csi-driver
                        app.kubernetes.io/version=1.8.0
                        helm.sh/chart=aws-ebs-csi-driver-2.8.0
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: aws-ebs-csi-driver
                        meta.helm.sh/release-namespace: kube-system
Selector:               app=ebs-csi-controller,app.kubernetes.io/instance=aws-ebs-csi-driver,app.kubernetes.io/name=aws-ebs-csi-driver
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=ebs-csi-controller
                    app.kubernetes.io/component=csi-driver
                    app.kubernetes.io/instance=aws-ebs-csi-driver
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=aws-ebs-csi-driver
                    app.kubernetes.io/version=1.8.0
                    helm.sh/chart=aws-ebs-csi-driver-2.8.0
  Service Account:  ebs-csi-controller-sa
  Containers:
   ebs-plugin:
    Image:      public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver:v1.8.0
    Port:       9808/TCP
    Host Port:  0/TCP
    Args:
      controller
      --endpoint=$(CSI_ENDPOINT)
      --logtostderr
      --v=2
    Liveness:   http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Readiness:  http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:           unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      CSI_NODE_NAME:           (v1:spec.nodeName)
      AWS_ACCESS_KEY_ID:      <set to the key 'key_id' in secret 'aws-secret'>      Optional: true
      AWS_SECRET_ACCESS_KEY:  <set to the key 'access_key' in secret 'aws-secret'>  Optional: true
      AWS_EC2_ENDPOINT:       <set to the key 'endpoint' of config map 'aws-meta'>  Optional: true
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-provisioner:
    Image:      k8s.gcr.io/sig-storage/csi-provisioner:v3.1.0
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=2
      --feature-gates=Topology=true
      --extra-create-metadata
      --leader-election=true
      --default-fstype=ext4
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-attacher:
    Image:      k8s.gcr.io/sig-storage/csi-attacher:v3.4.0
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=2
      --leader-election=true
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-resizer:
    Image:      k8s.gcr.io/sig-storage/csi-resizer:v1.4.0
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=2
      --handle-volume-inuse-error=false
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   liveness-probe:
    Image:      k8s.gcr.io/sig-storage/livenessprobe:v2.5.0
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=/csi/csi.sock
    Environment:  <none>
    Mounts:
      /csi from socket-dir (rw)
  Volumes:
   socket-dir:
    Type:               EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:          <unset>
  Priority Class Name:  system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   ebs-csi-controller-7f7d8649dd (2/2 replicas created)
Events:          <none>
  • kubectl apply -f manifests
persistentvolumeclaim/ebs-claim created
pod/app created
storageclass.storage.k8s.io/ebs-sc created
  • kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
pvc-59083709-395e-4afa-a9f4-d6489aee1149   4Gi        RWO            Delete           Bound    default/ebs-claim   ebs-sc                  15m
  • kubectl get pvc
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ebs-claim   Bound    pvc-59083709-395e-4afa-a9f4-d6489aee1149   4Gi        RWO            ebs-sc         16m
  • kubectl get volumeattachment
NAME                                                                   ATTACHER          PV                                         NODE                                         ATTACHED   AGE
csi-a43fc22116483e48edf11653e570cd0809e5af646e284f4fed5e866f20227d22   ebs.csi.aws.com   pvc-59083709-395e-4afa-a9f4-d6489aee1149   ip-172-20-47-48.us-east-2.compute.internal   true       16m
  • kubectl delete -f manifests
persistentvolumeclaim "ebs-claim" deleted
pod "app" deleted
storageclass.storage.k8s.io "ebs-sc" deleted
  • kubectl logs -n kube-system $(kubectl get lease -n kube-system ebs-csi-aws-com -o=jsonpath="{.spec.holderIdentity}") -c csi-provisioner
W0712 19:26:25.429909       1 feature_gate.go:237] Setting GA feature gate Topology=true. It will be removed in a future release.
I0712 19:26:25.430059       1 feature_gate.go:245] feature gates: &{map[Topology:true]}
I0712 19:26:25.430078       1 csi-provisioner.go:139] Version: v3.1.0
I0712 19:26:25.430084       1 csi-provisioner.go:162] Building kube configs for running in cluster...
I0712 19:26:25.432523       1 common.go:111] Probing CSI driver for readiness
I0712 19:26:25.434924       1 csi-provisioner.go:206] Detected CSI driver ebs.csi.aws.com
I0712 19:26:25.434944       1 csi-provisioner.go:216] Supports migration from in-tree plugin: kubernetes.io/aws-ebs
I0712 19:26:25.435808       1 common.go:111] Probing CSI driver for readiness
I0712 19:26:25.439656       1 csi-provisioner.go:275] CSI driver supports PUBLISH_UNPUBLISH_VOLUME, watching VolumeAttachments
I0712 19:26:25.441043       1 controller.go:732] Using saving PVs to API server in background
I0712 19:26:25.443283       1 leaderelection.go:248] attempting to acquire leader lease kube-system/ebs-csi-aws-com...
I0712 19:26:25.456948       1 leaderelection.go:258] successfully acquired lease kube-system/ebs-csi-aws-com
I0712 19:26:25.458888       1 leader_election.go:205] became leader, starting
I0712 19:26:25.559678       1 controller.go:811] Starting provisioner controller ebs.csi.aws.com_ebs-csi-controller-7f7d8649dd-4cjrp_7172a4bb-96a0-4147-a171-0070ced7d485!
I0712 19:26:25.559732       1 volume_store.go:97] Starting save volume queue
I0712 19:26:25.660739       1 controller.go:860] Started provisioner controller ebs.csi.aws.com_ebs-csi-controller-7f7d8649dd-4cjrp_7172a4bb-96a0-4147-a171-0070ced7d485!
I0712 19:28:01.775333       1 controller.go:1337] provision "default/ebs-claim" class "ebs-sc": started
I0712 20:14:36.803689       1 controller.go:1337] provision "default/ebs-claim" class "ebs-sc": started
I0712 20:14:36.804278       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ebs-claim", UID:"59083709-395e-4afa-a9f4-d6489aee1149", APIVersion:"v1", ResourceVersion:"8668", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/ebs-claim"
I0712 20:14:40.306663       1 controller.go:858] successfully created PV pvc-59083709-395e-4afa-a9f4-d6489aee1149 for PVC ebs-claim and csi volume name vol-005fa8f076a819d23
I0712 20:14:40.306693       1 controller.go:1442] provision "default/ebs-claim" class "ebs-sc": volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149" provisioned
I0712 20:14:40.306701       1 controller.go:1455] provision "default/ebs-claim" class "ebs-sc": succeeded
I0712 20:14:40.315319       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ebs-claim", UID:"59083709-395e-4afa-a9f4-d6489aee1149", APIVersion:"v1", ResourceVersion:"8668", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-59083709-395e-4afa-a9f4-d6489aee1149
I0713 00:08:10.562673       1 controller.go:1471] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": started
W0713 00:08:10.562876       1 controller.go:1192] failed to get storageclass: ebs-sc, proceeding to delete without secrets. storageclass.storage.k8s.io "ebs-sc" not found
E0713 00:08:10.562921       1 controller.go:1481] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": volume deletion failed: persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
W0713 00:08:10.562956       1 controller.go:989] Retrying syncing volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149", failure 0
E0713 00:08:10.562975       1 controller.go:1007] error syncing volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149": persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
I0713 00:08:10.563100       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-59083709-395e-4afa-a9f4-d6489aee1149", UID:"f77a250c-c9eb-4512-b561-20cc1defb0f2", APIVersion:"v1", ResourceVersion:"46295", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
I0713 00:08:11.563592       1 controller.go:1471] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": started
W0713 00:08:11.563697       1 controller.go:1192] failed to get storageclass: ebs-sc, proceeding to delete without secrets. storageclass.storage.k8s.io "ebs-sc" not found
E0713 00:08:11.563738       1 controller.go:1481] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": volume deletion failed: persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
W0713 00:08:11.563762       1 controller.go:989] Retrying syncing volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149", failure 1
E0713 00:08:11.563777       1 controller.go:1007] error syncing volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149": persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
I0713 00:08:11.563967       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-59083709-395e-4afa-a9f4-d6489aee1149", UID:"f77a250c-c9eb-4512-b561-20cc1defb0f2", APIVersion:"v1", ResourceVersion:"46295", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
I0713 00:08:13.564511       1 controller.go:1471] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": started
W0713 00:08:13.564578       1 controller.go:1192] failed to get storageclass: ebs-sc, proceeding to delete without secrets. storageclass.storage.k8s.io "ebs-sc" not found
E0713 00:08:13.564609       1 controller.go:1481] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": volume deletion failed: persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
W0713 00:08:13.564639       1 controller.go:989] Retrying syncing volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149", failure 2
E0713 00:08:13.564655       1 controller.go:1007] error syncing volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149": persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
I0713 00:08:13.564877       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-59083709-395e-4afa-a9f4-d6489aee1149", UID:"f77a250c-c9eb-4512-b561-20cc1defb0f2", APIVersion:"v1", ResourceVersion:"46295", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
I0713 00:08:17.564822       1 controller.go:1471] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": started
W0713 00:08:17.564956       1 controller.go:1192] failed to get storageclass: ebs-sc, proceeding to delete without secrets. storageclass.storage.k8s.io "ebs-sc" not found
E0713 00:08:17.565035       1 controller.go:1481] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": volume deletion failed: persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
W0713 00:08:17.565109       1 controller.go:989] Retrying syncing volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149", failure 3
E0713 00:08:17.565168       1 controller.go:1007] error syncing volume "pvc-59083709-395e-4afa-a9f4-d6489aee1149": persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
I0713 00:08:17.565228       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-59083709-395e-4afa-a9f4-d6489aee1149", UID:"f77a250c-c9eb-4512-b561-20cc1defb0f2", APIVersion:"v1", ResourceVersion:"46295", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' persistentvolume pvc-59083709-395e-4afa-a9f4-d6489aee1149 is still attached to node ip-172-20-47-48.us-east-2.compute.internal
I0713 00:08:25.565987       1 controller.go:1471] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": started
W0713 00:08:25.566076       1 controller.go:1192] failed to get storageclass: ebs-sc, proceeding to delete without secrets. storageclass.storage.k8s.io "ebs-sc" not found
I0713 00:08:25.694325       1 controller.go:1486] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": volume deleted
I0713 00:08:25.702747       1 controller.go:1531] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": persistentvolume deleted
I0713 00:08:25.702773       1 controller.go:1536] delete "pvc-59083709-395e-4afa-a9f4-d6489aee1149": succeeded

Confirmed that the volume was deleted in AWS.

torredil avatar Jul 13 '22 00:07 torredil

The CSI driver also just panics; see the attached log-aws-csi.log

shinji62 avatar Jul 13 '22 04:07 shinji62

Thanks for providing the logs. From looking at the stack trace, the csi-provisioner sidecar seems to crash due to an unhandled exception here when deleting the volume. I suggest reporting this issue in the external-provisioner repository.

torredil avatar Jul 14 '22 01:07 torredil

@torredil So I updated k8s.gcr.io/sig-storage/csi-provisioner to v3.2.1 and it no longer crashes, but I think I know why the volumes are not deleted. I get a lot of:

ebs-plugin W0720 08:54:37.255402       1 cloud.go:551] Ignoring error from describe volume for volume "vol-0d79b1d46dfecb41f"; will retry: "RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
ebs-plugin W0720 08:54:37.334582       1 cloud.go:551] Ignoring error from describe volume for volume "vol-03c72c14a36ab0660"; will retry: "RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
ebs-plugin W0720 08:54:37.445780       1 cloud.go:551] Ignoring error from describe volume for volume "vol-06a04d149cffc7891"; will retry: "RequestCanceled: request context canceled\ncaused by: context deadline exceeded"
ebs-plugin W0720 08:54:40.450876       1 cloud.go:551] Ignoring error from describe volume for volume "vol-003d765830114deba"; will retry: "RequestCanceled: request context canceled\ncaused by: context canceled"
ebs-plugin W0720 08:54:43.978414       1 cloud.go:551] Ignoring error from describe volume for volume "vol-0b96dcb1662f14af2"; will retry: "RequestCanceled: request context canceled\ncaused by: context canceled"
ebs-plugin W0720 08:54:44.084632       1 cloud.go:551] Ignoring error from describe volume for volume "vol-09b06af7f49ecb3ee"; will retry: "RequestCanceled: request context canceled\ncaused by: context canceled"

I think that because the driver could not get the volume information, it still believes the volume is attached to the node. For example, for vol-0b96dcb1662f14af2:

Name:              pvc-7b24f08d-b263-4786-851f-c3f305bd3716
Labels:            <none>
Annotations:       pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
                   volume.kubernetes.io/provisioner-deletion-secret-name:
                   volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:        [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
StorageClass:      gp3
Status:            Available
Claim:             
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          187Gi
Node Affinity:     
  Required Terms:  
    Term 0:        topology.ebs.csi.aws.com/zone in [ap-northeast-1a]
Message:           
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            ebs.csi.aws.com
    FSType:            ext4
    VolumeHandle:      vol-0b96dcb1662f14af2
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1658305777549-8081-ebs.csi.aws.com
Events:
  Type     Reason              Age                From                                                                                      Message
  ----     ------              ----               ----                                                                                      -------
  Warning  VolumeFailedDelete  14m (x2 over 14m)  ebs.csi.aws.com_ebs-csi-controller-5bb57ddd6b-gcl4f_2f176fbb-618d-494b-90cf-ea9d9aefbfe3  persistentvolume pvc-7b24f08d-b263-4786-851f-c3f305bd3716 is still attached to node ip-10-19-168-209.ap-northeast-1.compute.internal

Plus other errors such as:

csi-attacher I0720 08:41:50.213405       1 csi_handler.go:286] Failed to save detach error to "csi-2d37cf057e0c8dff8fa4a8825352750e08aff292eea8f11afc4c70e23776816d": volumeattachments.storage.k8s.io "csi-2d37cf057e0c8dff8fa4a8825352750e08aff292eea8f11afc4c70e23776816d" not found
csi-attacher I0720 08:41:50.213426       1 csi_handler.go:231] Error processing "csi-2d37cf057e0c8dff8fa4a8825352750e08aff292eea8f11afc4c70e23776816d": failed to detach: could not mark as detached: volumeattachments.storage.k8s.io "csi-2d37cf057e0c8dff8fa4a8825352750e08aff292eea8f11afc4c70e23776816d" not found
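
For reference, one way to confirm whether the attachment state is actually stale (the PV and volume IDs below are the ones from this comment; removing a finalizer by hand is a last resort and should only be done once the volume is confirmed detached in EC2):

# Is there still a VolumeAttachment object referencing the PV?
kubectl get volumeattachment | grep pvc-7b24f08d-b263-4786-851f-c3f305bd3716

# What does EC2 itself report for the volume?
aws ec2 describe-volumes --volume-ids vol-0b96dcb1662f14af2 \
  --query 'Volumes[0].{State:State,Attachments:Attachments}'

# Only if the volume shows as "available" (detached) in EC2 and no
# VolumeAttachment remains, drop the stale finalizer so the PV can be
# cleaned up:
kubectl patch pv pvc-7b24f08d-b263-4786-851f-c3f305bd3716 \
  --type json -p '[{"op":"remove","path":"/metadata/finalizers"}]'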

shinji62 avatar Jul 20 '22 08:07 shinji62

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 18 '22 09:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 17 '22 09:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Dec 17 '22 10:12 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 17 '22 10:12 k8s-ci-robot

@shinji62 I'm hitting the same error, have you figured out a solution/workaround?

woehrl01 avatar Mar 23 '23 15:03 woehrl01

Not an ideal solution, but a very safe one to execute (a command-level sketch follows the list):

  1. Drain the Kubernetes pods from the node reporting the problem with the AWS EBS disk.
  2. Mark the node as unschedulable.
  3. Reboot the node; better yet, shut it down from the OS.
  4. Detach the disk manually from the AWS Console.
  5. Boot up the node.
  6. If the EBS CSI controller has still not deleted the PV, attempt to delete it.
  7. Mark the node schedulable again.
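
A command-level sketch of the steps above (node, instance, volume, and PV identifiers are placeholders for your environment; this is a sketch, not a verified procedure):

# 1-2. Drain the node (drain also cordons it, i.e. marks it unschedulable)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# 3. Shut the instance down (or reboot it)
aws ec2 stop-instances --instance-ids <instance-id>

# 4. Detach the stuck volume manually
aws ec2 detach-volume --volume-id <volume-id>

# 5. Start the instance again
aws ec2 start-instances --instance-ids <instance-id>

# 6. If the controller still has not cleaned up the PV, delete it
kubectl delete pv <pv-name>

# 7. Make the node schedulable again
kubectl uncordon <node-name>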

amej avatar Jul 14 '23 06:07 amej

Hey @woehrl01 @shinji62, you are probably missing some actions in the IAM policy for your nodes; I had to add the following to get creation + deletion to work:

ec2:CreateVolume // creation
ec2:CreateTags // creation
ec2:AttachVolume // creation
ec2:DetachVolume // deletion
ec2:DeleteVolume // deletion

You are probably missing the last two actions.
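
If you prefer not to hand-maintain these actions, one option (assuming an EKS-style setup; the role name is a placeholder) is to attach the AWS managed policy for the driver, which covers all of the actions listed above:

aws iam attach-role-policy \
  --role-name <ebs-csi-driver-node-or-irsa-role> \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy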

h4ck3rk3y avatar Dec 14 '23 07:12 h4ck3rk3y

@h4ck3rk3y Thanks, I just verified that the permissions are correct. At least I use the ones from the EKS Terraform blueprint:

https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/8a06a6e7006e4bed5630bd49c7434d76c59e0b5e/modules/kubernetes-addons/aws-ebs-csi-driver/data.tf#L96-L133

woehrl01 avatar Dec 14 '23 08:12 woehrl01