sig-storage-lib-external-provisioner

Delete volume requests are sent twice with new sig-storage-lib-external-provisioner v10.0.1 because of the finalizer "external-provisioner.volume.kubernetes.io/finalizer"

Open zjuliuf opened this issue 1 year ago • 9 comments

What happened:

When we try to delete a volume using the new sig-storage-lib-external-provisioner v10.0.1, we still notice 2 or more volume deletion requests in the logs. As a result, deleting the PV takes more time.

What you expected to happen:

Only one volume deletion request during PV deletion.

Simple analysis:

When the volume backing the PV is deleted successfully, sig-storage-lib-external-provisioner proceeds to delete the PV. The PV object is then updated with a deletionTimestamp, which adds the PV to the volumeQueue again, and deleteVolumeOperation executes once more because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" still exists. As a result, two or more volume deletion requests are sent during PV deletion.

Code analysis (first deletion): 'shouldDelete' returns true ---> ctrl.provisioner.Delete deletes the volume backing the PV ---> ctrl.client.CoreV1().PersistentVolumes().Delete deletes the PV ---> the PV is updated with a deletionTimestamp ---> the controller tries to remove the finalizer "external-provisioner.volume.kubernetes.io/finalizer"

(second deletion) During the first deletion, when the PV was updated with a deletionTimestamp (before the finalizer was removed), the PV was added to the volumeQueue again, and the second deletion began. Because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" still existed and the PV status was Released, 'shouldDelete' still returned 'true' and ctrl.provisioner.Delete was executed again. The volume event handler that re-enqueues the PV on that update is:

volumeHandler := cache.ResourceEventHandlerFuncs{
	AddFunc:    func(obj interface{}) { controller.enqueueVolume(obj) },
	UpdateFunc: func(oldObj, newObj interface{}) { controller.enqueueVolume(newObj) },
	DeleteFunc: func(obj interface{}) { controller.forgetVolume(obj) },
}
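
To make the sequence concrete, here is a condensed sketch of the delete path as I understand it (the deleter interface, the removeString helper, and the function name are stand-ins of mine, not the library's actual deleteVolumeOperation):

package deletionsketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleter stands in for the library's Provisioner; only Delete matters here.
type deleter interface {
	Delete(ctx context.Context, pv *v1.PersistentVolume) error
}

// deleteFlowSketch condenses the ordering described above; it is an
// illustration, not the library's actual deleteVolumeOperation.
func deleteFlowSketch(ctx context.Context, client kubernetes.Interface, p deleter, pv *v1.PersistentVolume) error {
	// 1. shouldDelete has already returned true: the PV is Released, its reclaim
	//    policy is Delete, and the provisioner's finalizer is still present.

	// 2. Delete the backing volume through the provisioner plugin.
	if err := p.Delete(ctx, pv); err != nil {
		return err
	}

	// 3. Delete the PV object. Because finalizers are still set, the API server
	//    only adds a deletionTimestamp; that update event re-enqueues the PV via
	//    the UpdateFunc shown above.
	if err := client.CoreV1().PersistentVolumes().Delete(ctx, pv.Name, metav1.DeleteOptions{}); err != nil {
		return err
	}

	// 4. Remove "external-provisioner.volume.kubernetes.io/finalizer". If the
	//    re-enqueued PV is processed before this step completes, shouldDelete
	//    still returns true and step 2 runs a second time.
	pvCopy := pv.DeepCopy()
	pvCopy.Finalizers = removeString(pvCopy.Finalizers, "external-provisioner.volume.kubernetes.io/finalizer")
	_, err := client.CoreV1().PersistentVolumes().Update(ctx, pvCopy, metav1.UpdateOptions{})
	return err
}

func removeString(ss []string, s string) []string {
	var out []string
	for _, v := range ss {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}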

How to reproduce it:

  1. Call NewProvisionController with the option AddFinalizer(true)
  2. Create a StorageClass and a PVC
  3. Delete the PVC
  4. Watch the status of the PV and the logs of the provisioner
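
Roughly, steps 2-4 can be scripted like this (a sketch only; the names and the provisioner string are placeholders, and the PVC types match the v1.28-era API assumed in this issue):

package reprosketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// reproduce creates a StorageClass and a PVC bound to the provisioner under
// test, then deletes the PVC, which is enough to drive the delete path
// discussed in this issue.
func reproduce(ctx context.Context, client kubernetes.Interface) error {
	reclaim := v1.PersistentVolumeReclaimDelete
	sc := &storagev1.StorageClass{
		ObjectMeta:    metav1.ObjectMeta{Name: "repro-sc"},
		Provisioner:   "example.com/test-provisioner", // placeholder provisioner name
		ReclaimPolicy: &reclaim,
	}
	if _, err := client.StorageV1().StorageClasses().Create(ctx, sc, metav1.CreateOptions{}); err != nil {
		return err
	}

	scName := sc.Name
	pvc := &v1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "repro-pvc", Namespace: "default"},
		Spec: v1.PersistentVolumeClaimSpec{
			AccessModes:      []v1.PersistentVolumeAccessMode{v1.ReadWriteOnce},
			StorageClassName: &scName,
			Resources: v1.ResourceRequirements{
				Requests: v1.ResourceList{v1.ResourceStorage: resource.MustParse("1Gi")},
			},
		},
	}
	if _, err := client.CoreV1().PersistentVolumeClaims(pvc.Namespace).Create(ctx, pvc, metav1.CreateOptions{}); err != nil {
		return err
	}

	// Wait for the PVC to be bound (omitted here), then delete it and watch the
	// PV status and the provisioner logs for duplicate DeleteVolume calls.
	return client.CoreV1().PersistentVolumeClaims(pvc.Namespace).Delete(ctx, pvc.Name, metav1.DeleteOptions{})
}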

Environment:

  • Driver version:
  • Kubernetes version (use kubectl version): v1.28
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others: sig-storage-lib-external-provisioner v10.0.1

zjuliuf avatar Aug 14 '24 10:08 zjuliuf

@zjuliuf you would need the latest k8s release to verify the fix, such as 1.28.12.

Jainbrt avatar Aug 14 '24 11:08 Jainbrt

@Jainbrt Thanks a lot, I will try. By the way, which PR of k8s fixed this issue?

zjuliuf avatar Aug 14 '24 14:08 zjuliuf

https://github.com/kubernetes/kubernetes/pull/125767

Jainbrt avatar Aug 14 '24 14:08 Jainbrt

@Jainbrt I watched the status of the PV; it changed from 'Bound' to 'Released' without ever becoming 'Failed'.

zjuliuf avatar Aug 14 '24 14:08 zjuliuf

Is this the issue you are talking about: https://github.com/kubernetes-csi/external-provisioner/issues/1184?

Jainbrt avatar Aug 14 '24 14:08 Jainbrt

@Jainbrt Looks like the same problem. I checked the 'syncVolume' code in sig-storage-lib-external-provisioner v10.0.1 and watched the status of the PV. Because the PV object keeps changing, sig-storage-lib-external-provisioner receives multiple update events, so 'syncVolume' executes multiple times, and 'shouldDelete' returns 'true' because "external-provisioner.volume.kubernetes.io/finalizer" still exists on the PV (the finalizer is eventually removed successfully), so another volume deletion request is sent to the volume provisioner. The volume and PV are deleted successfully in the end; only the deletion process runs two or more times. (This doesn't happen every time; I guess the PV events sometimes get merged.)

zjuliuf avatar Aug 15 '24 02:08 zjuliuf

In order to reduce redundant deletion processes when 'addFinalizer' is true, can we modify the code as below? With this change, if only the DeletionTimestamp and Finalizers of the PV change, the PV will not be added to the queue. @jsafrane @xing-yang @Jainbrt

volumeHandler := cache.ResourceEventHandlerFuncs{
	AddFunc:    func(obj interface{}) { controller.enqueueVolume(obj) },
	UpdateFunc: func(oldObj, newObj interface{}) {
		if controller.addFinalizer {
			oldPv, ok := oldObj.(*v1.PersistentVolume)
			if !ok {
				return
			}
			newPv, ok := newObj.(*v1.PersistentVolume)
			if !ok {
				return
			}
			// Compare deep copies so the shared informer cache objects are never mutated.
			oldCopy := oldPv.DeepCopy()
			newCopy := newPv.DeepCopy()
			// Normalize the fields that change while a PV is being deleted, so an
			// update touching only these fields compares as equal below.
			newCopy.ResourceVersion = oldCopy.ResourceVersion
			newCopy.ManagedFields = oldCopy.ManagedFields
			newCopy.DeletionTimestamp = oldCopy.DeletionTimestamp
			newCopy.DeletionGracePeriodSeconds = oldCopy.DeletionGracePeriodSeconds
			if len(newCopy.Finalizers) <= len(oldCopy.Finalizers) {
				newCopy.Finalizers = oldCopy.Finalizers
			}
			if equality.Semantic.DeepEqual(oldCopy, newCopy) {
				klog.V(2).Infof("only deletionTimestamp/finalizers changed, skipping enqueue")
				return
			}
		}
		controller.enqueueVolume(newObj)
	},
	DeleteFunc: func(obj interface{}) { controller.forgetVolume(obj) },
}

zjuliuf avatar Aug 15 '24 07:08 zjuliuf

The same problem occurs in my production environment.

Chaunceyctx avatar Aug 15 '24 08:08 Chaunceyctx

I checked the 'syncVolume' code in sig-storage-lib-external-provisioner v10.0.1 and watched the status of the PV. Because the PV object keeps changing, sig-storage-lib-external-provisioner receives multiple update events, so 'syncVolume' executes multiple times, and 'shouldDelete' returns 'true' because "external-provisioner.volume.kubernetes.io/finalizer" still exists on the PV

What events did the provisioner receive and who / what caused them? You can either get a diff in UpdateFunc in the provisioner or get it from the API server audit logs. It looks like something is racy here.

I would prefer to see what causes these events before we start ignoring them.
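
For example, a minimal sketch of a handler that logs the fields relevant here on every PV update (the wrapper name and the enqueue/forget parameters are illustrative stand-ins for the controller's enqueueVolume/forgetVolume):

package handlersketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/klog/v2"
)

// debugVolumeHandler logs, on every PV update, the fields relevant to this
// issue (resourceVersion, finalizers, deletionTimestamp, phase) before
// enqueueing the PV as usual.
func debugVolumeHandler(enqueue, forget func(obj interface{})) cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { enqueue(obj) },
		UpdateFunc: func(oldObj, newObj interface{}) {
			oldPV, okOld := oldObj.(*v1.PersistentVolume)
			newPV, okNew := newObj.(*v1.PersistentVolume)
			if okOld && okNew {
				klog.Infof("PV %s update: rv %s -> %s, finalizers %v -> %v, deletionTimestamp %v -> %v, phase %s -> %s",
					newPV.Name,
					oldPV.ResourceVersion, newPV.ResourceVersion,
					oldPV.Finalizers, newPV.Finalizers,
					oldPV.DeletionTimestamp, newPV.DeletionTimestamp,
					oldPV.Status.Phase, newPV.Status.Phase)
			}
			enqueue(newObj)
		},
		DeleteFunc: func(obj interface{}) { forget(obj) },
	}
}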

jsafrane avatar Aug 16 '24 09:08 jsafrane

Hi - I've been investigating this issue because we've also been seeing it in the EBS CSI Driver. Our driver is correctly idempotent, but these requests result in a lot of unnecessary extra API traffic when deleting volumes; based on our testing, this issue occurs on roughly one third of volume deletions.

This issue is caused by the HonorPVReclaimPolicy feature (or, to be more specific to this repo, it occurs when the controller is initialized with AddFinalizer): https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/a6d1dc518aca508e2452724e86ca775003bcd23e/controller/controller.go#L565C6-L565C18

When the PV finalizer is enabled, we first remove the finalizer and then delete the PV object (after the volume has been successfully deleted through the provisioner). That first change, removing the finalizer, triggers an additional update to the volume that goes through syncVolume, resulting in multiple DeleteVolume calls.
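
The re-enqueued PV still passes the delete-eligibility check; roughly (a simplified sketch of only the conditions relevant to this issue, not the library's actual shouldDelete, which has additional guards):

package shoulddeletesketch

import v1 "k8s.io/api/core/v1"

const finalizerPV = "external-provisioner.volume.kubernetes.io/finalizer"

// shouldDeleteSketch keeps only the conditions discussed in this issue.
func shouldDeleteSketch(pv *v1.PersistentVolume, addFinalizer bool) bool {
	if addFinalizer {
		// With the finalizer enabled, a PV that already carries a
		// deletionTimestamp is still considered deletable as long as the
		// provisioner's finalizer is present -- exactly the state of the PV
		// re-enqueued by the update described above.
		if pv.DeletionTimestamp != nil && !hasFinalizer(pv, finalizerPV) {
			return false
		}
	} else if pv.DeletionTimestamp != nil {
		return false
	}
	if pv.Status.Phase != v1.VolumeReleased {
		return false
	}
	return pv.Spec.PersistentVolumeReclaimPolicy == v1.PersistentVolumeReclaimDelete
}

func hasFinalizer(pv *v1.PersistentVolume, finalizer string) bool {
	for _, f := range pv.Finalizers {
		if f == finalizer {
			return true
		}
	}
	return false
}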

ConnorJC3 avatar Nov 07 '24 15:11 ConnorJC3

I checked the 'syncVolume' code in sig-storage-lib-external-provisioner v10.0.1 and watched the status of the PV. Because the PV object keeps changing, sig-storage-lib-external-provisioner receives multiple update events, so 'syncVolume' executes multiple times, and 'shouldDelete' returns 'true' because "external-provisioner.volume.kubernetes.io/finalizer" still exists on the PV

What events did the provisioner receive and who / what caused them? You can either get a diff in UpdateFunc in the provisioner or get it from the API server audit logs. It looks like something is racy here.

I would prefer to see what causes these events before we start ignoring them.

@jsafrane The PV has 2 finalizers: external-provisioner.volume.kubernetes.io/finalizer and kubernetes.io/pv-protection.

  1. After the volume was deleted successfully, the provisioner called ctrl.client.CoreV1().PersistentVolumes().Delete to delete the PV, which generated the following audit event:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"RequestResponse","auditID":"65a49e9a-f9b4-4bfb-856b-4f0700dfdea4","stage":"ResponseComplete","requestURI":"/api/v1/persistentvolumes/pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","verb":"delete","user":{"username":"system:serviceaccount:kube-system:csiaddon-csi-controller","uid":"4f8bc8ab-2833-4660-ad30-258d6753a4b2","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["csiaddon-csi-controller-57bbc79b79-6zkrp"],"authentication.kubernetes.io/pod-uid":["0222bfdd-d5e9-416b-b0c5-e7530712aa36"]}},"sourceIPs":["192.168.227.211"],"userAgent":"csiaddon-csi-controller/v0.0.0 (linux/amd64) kubernetes/$Format/csiaddon-csi-provisioner","objectRef":{"resource":"persistentvolumes","name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestObject":{"kind":"DeleteOptions","apiVersion":"meta.k8s.io/__internal"},"responseObject":{"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","uid":"5d4cbfe5-7376-4823-b2d2-4cbcabdcf95a","resourceVersion":"1538004","creationTimestamp":"2024-11-13T12:26:16Z","deletionTimestamp":"2024-11-13T12:32:18Z","deletionGracePeriodSeconds":0,"labels":{"failure-domain.beta.kubernetes.io/region":"fake-az","failure-domain.beta.kubernetes.io/zone":"fake-region"},"annotations":{"pv.kubernetes.io/provisioned-by":"csiaddon-csi-provisioner"},"finalizers":["external-provisioner.volume.kubernetes.io/finalizer","kubernetes.io/pv-protection"],"managedFields":[{"manager":"csiaddon-csi-controller","operation":"Update","apiVersion":"v1","time":"2024-11-13T12:26:16Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/provisioned-by":{}},"f:finalizers":{".":{},"v:"external-provisioner.volume.kubernetes.io/finalizer"":{}},"f:labels":{".":{},"f:failure-domain.beta.kubernetes.io/region":{},"f:failure-domain.beta.kubernetes.io/zone":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}},"f:csi":{".":{},"f:driver":{},"f:fsType":{},"f:volumeAttributes":{".":{},"f:csiaddon.io/disk-mode":{},"f:csiaddon.io/disk-volume-type":{},"f:csiaddon.io/enterprise-project-id":{},"f:storage.kubernetes.io/csiProvisionerIdentity":{}},"f:volumeHandle":{}},"f:nodeAffinity":{".":{},"f:required":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2024-11-13T12:32:18Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:phase":{}}},"subresource":"status"}]},"spec":{"capacity":{"storage":"10Gi"},"csi":{"driver":"disk.csi.csiaddon.io","volumeHandle":"1ec53b26-a8d6-4ab9-8e29-97d3f8172952","fsType":"ext4","volumeAttributes":{"csiaddon.io/disk-mode":"SCSI","csiaddon.io/disk-volume-type":"SAS","csiaddon.io/enterprise-project-id":"0","storage.kubernetes.io/csiProvisionerIdentity":"csiaddon-csi-provisioner"}},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"disk-18","uid":"7109fd69-f0c9-41c5-812d-6b2f391b43e6","apiVersion":"v1","resourceVersion":"1536629"},"persistentVolumeReclaimPolicy":"Delete","storageClassName":"csi-disk","volumeMode":"Filesystem","nodeAffinity":{"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"failure-do
main.beta.kubernetes.io/zone","operator":"In","values":["fake-region"]}]}]}}},"status":{"phase":"Released"}},"requestReceivedTimestamp":"2024-11-13T12:32:18.597415Z","stageTimestamp":"2024-11-13T12:32:18.601982Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding "system:csiaddon-csi-controller" of ClusterRole "system:csiaddon-csi-controller" to ServiceAccount "csiaddon-csi-controller/kube-system""}}

  2. After the volume was deleted successfully, the provisioner removed the finalizer "external-provisioner.volume.kubernetes.io/finalizer" from the PV, which generated the following audit event:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"RequestResponse","auditID":"6fdcc98d-c924-499b-911b-9b2113944a22","stage":"ResponseComplete","requestURI":"/api/v1/persistentvolumes/pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","verb":"patch","user":{"username":"system:serviceaccount:kube-system:csiaddon-csi-controller","uid":"4f8bc8ab-2833-4660-ad30-258d6753a4b2","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["csiaddon-csi-controller-57bbc79b79-6zkrp"],"authentication.kubernetes.io/pod-uid":["0222bfdd-d5e9-416b-b0c5-e7530712aa36"]}},"sourceIPs":["192.168.227.211"],"userAgent":"csiaddon-csi-controller/v0.0.0 (linux/amd64) kubernetes/$Format/csiaddon-csi-provisioner","objectRef":{"resource":"persistentvolumes","name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestObject":{"metadata":{"$deleteFromPrimitiveList/finalizers":["external-provisioner.volume.kubernetes.io/finalizer"],"$setElementOrder/finalizers":["kubernetes.io/pv-protection"]}},"responseObject":{"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","uid":"5d4cbfe5-7376-4823-b2d2-4cbcabdcf95a","resourceVersion":"1538006","creationTimestamp":"2024-11-13T12:26:16Z","deletionTimestamp":"2024-11-13T12:32:18Z","deletionGracePeriodSeconds":0,"labels":{"failure-domain.beta.kubernetes.io/region":"fake-az","failure-domain.beta.kubernetes.io/zone":"fake-region"},"annotations":{"pv.kubernetes.io/provisioned-by":"csiaddon-csi-provisioner"},"finalizers":["kubernetes.io/pv-protection"],"managedFields":[{"manager":"csiaddon-csi-controller","operation":"Update","apiVersion":"v1","time":"2024-11-13T12:26:16Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/provisioned-by":{}},"f:finalizers":{},"f:labels":{".":{},"f:failure-domain.beta.kubernetes.io/region":{},"f:failure-domain.beta.kubernetes.io/zone":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}},"f:csi":{".":{},"f:driver":{},"f:fsType":{},"f:volumeAttributes":{".":{},"f:csiaddon.io/disk-mode":{},"f:csiaddon.io/disk-volume-type":{},"f:csiaddon.io/enterprise-project-id":{},"f:storage.kubernetes.io/csiProvisionerIdentity":{}},"f:volumeHandle":{}},"f:nodeAffinity":{".":{},"f:required":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2024-11-13T12:32:18Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:phase":{}}},"subresource":"status"}]},"spec":{"capacity":{"storage":"10Gi"},"csi":{"driver":"disk.csi.csiaddon.io","volumeHandle":"1ec53b26-a8d6-4ab9-8e29-97d3f8172952","fsType":"ext4","volumeAttributes":{"csiaddon.io/disk-mode":"SCSI","csiaddon.io/disk-volume-type":"SAS","csiaddon.io/enterprise-project-id":"0","storage.kubernetes.io/csiProvisionerIdentity":"csiaddon-csi-provisioner"}},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"disk-18","uid":"7109fd69-f0c9-41c5-812d-6b2f391b43e6","apiVersion":"v1","resourceVersion":"1536629"},"persistentVolumeReclaimPolicy":"Delete","storageClassName":"csi-disk","volumeMode":"Filesystem","nodeAffinity":{"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"failure-domain.beta.kube
rnetes.io/zone","operator":"In","values":["fake-region"]}]}]}}},"status":{"phase":"Released"}},"requestReceivedTimestamp":"2024-11-13T12:32:18.603383Z","stageTimestamp":"2024-11-13T12:32:18.609903Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding "system:csiaddon-csi-controller" of ClusterRole "system:csiaddon-csi-controller" to ServiceAccount "csiaddon-csi-controller/kube-system""}}

  3. The pv-protection-controller removed the finalizer "kubernetes.io/pv-protection" after the PV received the DELETE request, which generated the following audit event:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"RequestResponse","auditID":"03f766a9-31b7-487b-ae98-2895f38e0611","stage":"ResponseComplete","requestURI":"/api/v1/persistentvolumes/pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","verb":"update","user":{"username":"system:serviceaccount:kube-system:pv-protection-controller","uid":"5dde667f-551d-4530-859f-c0086439e06a","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"]},"sourceIPs":["192.168.99.16"],"userAgent":"kube-controller-manager/v1.28.15 (linux/amd64) kubernetes/79252b8/system:serviceaccount:kube-system:pv-protection-controller","objectRef":{"resource":"persistentvolumes","name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","uid":"5d4cbfe5-7376-4823-b2d2-4cbcabdcf95a","apiVersion":"v1","resourceVersion":"1538006"},"responseStatus":{"metadata":{},"code":200},"requestObject":{"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","uid":"5d4cbfe5-7376-4823-b2d2-4cbcabdcf95a","resourceVersion":"1538006","creationTimestamp":"2024-11-13T12:26:16Z","deletionTimestamp":"2024-11-13T12:32:18Z","deletionGracePeriodSeconds":0,"labels":{"failure-domain.beta.kubernetes.io/region":"fake-az","failure-domain.beta.kubernetes.io/zone":"fake-region"},"annotations":{"pv.kubernetes.io/provisioned-by":"csiaddon-csi-provisioner"},"managedFields":[{"manager":"csiaddon-csi-controller","operation":"Update","apiVersion":"v1","time":"2024-11-13T12:26:16Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/provisioned-by":{}},"f:finalizers":{},"f:labels":{".":{},"f:failure-domain.beta.kubernetes.io/region":{},"f:failure-domain.beta.kubernetes.io/zone":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}},"f:csi":{".":{},"f:driver":{},"f:fsType":{},"f:volumeAttributes":{".":{},"f:csiaddon.io/disk-mode":{},"f:csiaddon.io/disk-volume-type":{},"f:csiaddon.io/enterprise-project-id":{},"f:storage.kubernetes.io/csiProvisionerIdentity":{}},"f:volumeHandle":{}},"f:nodeAffinity":{".":{},"f:required":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2024-11-13T12:32:18Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:phase":{}}},"subresource":"status"}]},"spec":{"capacity":{"storage":"10Gi"},"csi":{"driver":"disk.csi.csiaddon.io","volumeHandle":"1ec53b26-a8d6-4ab9-8e29-97d3f8172952","fsType":"ext4","volumeAttributes":{"csiaddon.io/disk-mode":"SCSI","csiaddon.io/disk-volume-type":"SAS","csiaddon.io/enterprise-project-id":"0","storage.kubernetes.io/csiProvisionerIdentity":"csiaddon-csi-provisioner"}},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"disk-18","uid":"7109fd69-f0c9-41c5-812d-6b2f391b43e6","apiVersion":"v1","resourceVersion":"1536629"},"persistentVolumeReclaimPolicy":"Delete","storageClassName":"csi-disk","volumeMode":"Filesystem","nodeAffinity":{"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"failure-domain.beta.kubernetes.io/zone","operator":"In","values":["fake-region"]}]}]}}},"status":{"phase":"Released"}},"responseObject":{"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","uid":"5d4cbfe5-7376-4823-b2d2-4cbcabdcf95a","resourceVersion":"1538006","creati
onTimestamp":"2024-11-13T12:26:16Z","deletionTimestamp":"2024-11-13T12:32:18Z","deletionGracePeriodSeconds":0,"labels":{"failure-domain.beta.kubernetes.io/region":"fake-az","failure-domain.beta.kubernetes.io/zone":"fake-region"},"annotations":{"pv.kubernetes.io/provisioned-by":"csiaddon-csi-provisioner"},"managedFields":[{"manager":"csiaddon-csi-controller","operation":"Update","apiVersion":"v1","time":"2024-11-13T12:26:16Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/provisioned-by":{}},"f:labels":{".":{},"f:failure-domain.beta.kubernetes.io/region":{},"f:failure-domain.beta.kubernetes.io/zone":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}},"f:csi":{".":{},"f:driver":{},"f:fsType":{},"f:volumeAttributes":{".":{},"f:csiaddon.io/disk-mode":{},"f:csiaddon.io/disk-volume-type":{},"f:csiaddon.io/enterprise-project-id":{},"f:storage.kubernetes.io/csiProvisionerIdentity":{}},"f:volumeHandle":{}},"f:nodeAffinity":{".":{},"f:required":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2024-11-13T12:32:18Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:phase":{}}},"subresource":"status"}]},"spec":{"capacity":{"storage":"10Gi"},"csi":{"driver":"disk.csi.csiaddon.io","volumeHandle":"1ec53b26-a8d6-4ab9-8e29-97d3f8172952","fsType":"ext4","volumeAttributes":{"csiaddon.io/disk-mode":"SCSI","csiaddon.io/disk-volume-type":"SAS","csiaddon.io/enterprise-project-id":"0","storage.kubernetes.io/csiProvisionerIdentity":"csiaddon-csi-provisioner"}},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"disk-18","uid":"7109fd69-f0c9-41c5-812d-6b2f391b43e6","apiVersion":"v1","resourceVersion":"1536629"},"persistentVolumeReclaimPolicy":"Delete","storageClassName":"csi-disk","volumeMode":"Filesystem","nodeAffinity":{"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"failure-domain.beta.kubernetes.io/zone","operator":"In","values":["fake-region"]}]}]}}},"status":{"phase":"Released"}},"requestReceivedTimestamp":"2024-11-13T12:32:18.641532Z","stageTimestamp":"2024-11-13T12:32:18.645094Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding "system:controller:pv-protection-controller" of ClusterRole "system:controller:pv-protection-controller" to ServiceAccount "pv-protection-controller/kube-system""}}

I also added some print logs to the provisioner.

Because of these PV events, UpdateFunc executed two more times after the volume was successfully deleted.

first time: new pv: {"metadata":{"name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","uid":"5d4cbfe5-7376-4823-b2d2-4cbcabdcf95a","resourceVersion":"1538004","creationTimestamp":"2024-11-13T12:26:16Z","deletionTimestamp":"2024-11-13T12:32:18Z","deletionGracePeriodSeconds":0,"labels":{"failure-domain.beta.kubernetes.io/region":"fake-region","failure-domain.beta.kubernetes.io/zone":"fake-az"},"annotations":{"pv.kubernetes.io/provisioned-by":"csiaddon-csi-provisioner"},"finalizers":["external-provisioner.volume.kubernetes.io/finalizer","kubernetes.io/pv-protection"],

===> this update caused the second volume deletion, because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" was still present

second time: new pv: {"metadata":{"name":"pvc-7109fd69-f0c9-41c5-812d-6b2f391b43e6","uid":"5d4cbfe5-7376-4823-b2d2-4cbcabdcf95a","resourceVersion":"1538006","creationTimestamp":"2024-11-13T12:26:16Z","deletionTimestamp":"2024-11-13T12:32:18Z","deletionGracePeriodSeconds":0,"labels":{"failure-domain.beta.kubernetes.io/region":"fake-region","failure-domain.beta.kubernetes.io/zone":"fake-az"},"annotations":{"pv.kubernetes.io/provisioned-by":"csiaddon-csi-provisioner"},"finalizers":["kubernetes.io/pv-protection"],

zjuliuf avatar Nov 13 '24 13:11 zjuliuf

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 11 '25 13:02 k8s-triage-robot

/remove-lifecycle stale

ElijahQuinones avatar Mar 12 '25 19:03 ElijahQuinones

My guess at the reason: when ctrl.client.CoreV1().PersistentVolumes().Delete is called to delete the PV, the finalizer kubernetes.io/pv-protection has not yet been removed from the object read into newVolume, but it does get removed before the provisioner removes the finalizer external-provisioner.volume.kubernetes.io/finalizer in https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/62e0dbeb815bdc7e22d904a621d86d99218f2ffc/controller/controller.go#L1594. The first deletion attempt therefore reports an error:

I0325 09:32:01.231828       1 controller.go:1570] delete "pvc-34170186-687f-4d5a-a6cb-4dd7cf8cf660": failed to remove finalizer for persistentvolume: Operation cannot be fulfilled on persistentvolumes "pvc-34170186-687f-4d5a-a6cb-4dd7cf8cf660": the object has been modified; please apply your changes to the latest version and try again

I added a debug message and built a private image:

                        newVolume, ok := volumeObj.(*v1.PersistentVolume)
                        if !ok {
                                return fmt.Errorf("expected volume but got %+v", volumeObj)
                        }
                        // debug: print the finalizers on the cached volume object
                        klog.Info(logOperation(operation, "epwiann debug -  newVolume"))
                        klog.Info(logOperation(operation, fmt.Sprintf("%+v", newVolume.ObjectMeta.Finalizers)))
                        finalizers, modified := removeFinalizer(newVolume.ObjectMeta.Finalizers, finalizerPV)

                        // Only update the finalizers if we actually removed something
                        if modified {
                                newVolume.ObjectMeta.Finalizers = finalizers

                                // debug: re-read the PV from the API server and print its finalizers
                                latestPV, err := ctrl.client.CoreV1().PersistentVolumes().Get(ctx, volume.Name, metav1.GetOptions{})
                                if err == nil {
                                        klog.Info(logOperation(operation, "epwiann debug -  latestPV"))
                                        klog.Info(logOperation(operation, fmt.Sprintf("%+v", latestPV.Finalizers)))
                                }
                                if _, err = ctrl.client.CoreV1().PersistentVolumes().Update(ctx, newVolume, metav1.UpdateOptions{}); err != nil {
The corresponding log is below: newVolume has two finalizers [external-provisioner.volume.kubernetes.io/finalizer kubernetes.io/pv-protection], while latestPV has only one finalizer [external-provisioner.volume.kubernetes.io/finalizer].

I0325 11:35:46.172837       1 controller.go:1509] delete "pvc-39f542f1-3e08-4bdf-badf-47443d693796": started
I0325 11:35:48.075995       1 controller.go:1524] delete "pvc-39f542f1-3e08-4bdf-badf-47443d693796": volume deleted
I0325 11:35:48.085084       1 controller.go:1550] delete "pvc-39f542f1-3e08-4bdf-badf-47443d693796": epwiann debug -  newVolume
I0325 11:35:48.088972       1 controller.go:1553] delete "pvc-39f542f1-3e08-4bdf-badf-47443d693796": [external-provisioner.volume.kubernetes.io/finalizer kubernetes.io/pv-protection]

I0325 11:35:48.093342       1 controller.go:1565] delete "pvc-39f542f1-3e08-4bdf-badf-47443d693796": epwiann debug - latestPV
I0325 11:35:48.093571       1 controller.go:1567] delete "pvc-39f542f1-3e08-4bdf-badf-47443d693796": 2025-03-25 11:35:48 +0000 UTC
I0325 11:35:48.093597       1 controller.go:1568] delete "pvc-39f542f1-3e08-4bdf-badf-47443d693796": [external-provisioner.volume.kubernetes.io/finalizer]
I0325 11:35:48.098936       1 controller.go:1573] delete "pvc-39f542f1-3e08-4bdf-badf-47443d693796": failed to remove finalizer for persistentvolume: Operation cannot be fulfilled on persistentvolumes "pvc-39f542f1-3e08-4bdf-badf-47443d693796": the object has been modified; please apply your changes to the latest version and try again

If I just add a sleep after ctrl.client.CoreV1().PersistentVolumes().Delete to make sure kubernetes.io/pv-protection has already been removed:

        if err = ctrl.client.CoreV1().PersistentVolumes().Delete(ctx, volume.Name, metav1.DeleteOptions{}); err != nil {
                // Oops, could not delete the volume and therefore the controller will
                // try to delete the volume again on next update.
                klog.Info(logOperation(operation, "failed to delete persistentvolume: %v", err))
                return err
        }

        klog.Info(logOperation(operation, "epwiann sleep 1s"))
        time.Sleep(1 * time.Second)

Then there is only a single delete volume request.
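
A less timing-dependent alternative would be to re-read the PV and retry the finalizer removal on conflict instead of sleeping; a sketch using client-go's retry helper (the function name is illustrative, and this is not the library's actual fix):

package retrysketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// removeFinalizerWithRetry re-reads the PV and retries the update whenever the
// API server reports a conflict, instead of relying on a fixed sleep.
func removeFinalizerWithRetry(ctx context.Context, client kubernetes.Interface, pvName, finalizer string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		pv, err := client.CoreV1().PersistentVolumes().Get(ctx, pvName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return nil // PV is already gone, nothing left to do
		}
		if err != nil {
			return err
		}
		var kept []string
		removed := false
		for _, f := range pv.Finalizers {
			if f == finalizer {
				removed = true
				continue
			}
			kept = append(kept, f)
		}
		if !removed {
			return nil // someone else already removed the finalizer
		}
		pv.Finalizers = kept
		_, err = client.CoreV1().PersistentVolumes().Update(ctx, pv, metav1.UpdateOptions{})
		return err
	})
}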

celiawa123 avatar Mar 25 '25 10:03 celiawa123

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 25 '25 01:06 k8s-triage-robot

/remove-lifecycle stale

ConnorJC3 avatar Jul 01 '25 20:07 ConnorJC3

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 29 '25 21:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Oct 29 '25 22:10 k8s-triage-robot