cloud-provider-openstack
[cinder-csi-plugin] Add finalizer to VolumeSnapshot when creating a PVC from the VolumeSnapshot
What this PR does / why we need it:
This PR implements a PVC watcher. When the watcher receives a PVC ADDED event and the PVC has a dataSource of kind VolumeSnapshot, a finalizer is added to that VolumeSnapshot; the finalizer is removed again when the watcher receives the PVC's DELETED event. This protects the VolumeSnapshot from being deleted while any PersistentVolume created from it still exists.
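For readers unfamiliar with the mechanism, here is a minimal, hedged sketch of what such a watcher could look like. This is not the PR's actual code: the finalizer name `vsFinalizer`, the function name, and the client wiring are illustrative assumptions.

```go
package snapshotprotect

import (
	"context"

	snapclient "github.com/kubernetes-csi/external-snapshotter/client/v6/clientset/versioned"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// Illustrative finalizer name; the flag-gated implementation in the PR may differ.
const vsFinalizer = "cinder.csi.openstack.org/volumesnapshot-in-use-protection"

// watchPVCs adds the finalizer to a VolumeSnapshot whenever a PVC that uses it
// as its dataSource appears, and (not shown) removes it again on PVC deletion.
func watchPVCs(ctx context.Context, kube kubernetes.Interface, snaps snapclient.Interface) {
	factory := informers.NewSharedInformerFactory(kube, 0)
	pvcInformer := factory.Core().V1().PersistentVolumeClaims().Informer()

	pvcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pvc := obj.(*corev1.PersistentVolumeClaim)
			ds := pvc.Spec.DataSource
			if ds == nil || ds.Kind != "VolumeSnapshot" {
				return
			}
			vs, err := snaps.SnapshotV1().VolumeSnapshots(pvc.Namespace).Get(ctx, ds.Name, metav1.GetOptions{})
			if err != nil {
				return // real code would log and retry
			}
			for _, f := range vs.Finalizers {
				if f == vsFinalizer {
					return // already protected
				}
			}
			vs.Finalizers = append(vs.Finalizers, vsFinalizer)
			// Real code must handle update conflicts (e.g. retry.RetryOnConflict).
			_, _ = snaps.SnapshotV1().VolumeSnapshots(pvc.Namespace).Update(ctx, vs, metav1.UpdateOptions{})
		},
		DeleteFunc: func(obj interface{}) {
			// Symmetric path: look up the dataSource again and remove the
			// finalizer once nothing still depends on the snapshot (omitted).
		},
	})

	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())
}
```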
Which issue this PR fixes (if applicable): fixes #2294
Special notes for reviewers:
Adding the finalizer asynchronously from a watcher seems unsafe and might cause some potential issues. Once PR [1] is merged and released, we should refactor this and add the finalizer synchronously.
[1] https://github.com/kubernetes-csi/external-provisioner/pull/1070
Release note:
Add a command-line flag `--enable-vs-deletion-protection`. If set, a finalizer is added to a VolumeSnapshot whenever a PVC is created from it, ensuring the VolumeSnapshot cannot be deleted before the PVC is deleted.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign jichenjc for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Hi @jeffyjf. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks very WIP right now so probably not worth actually running tests, but
/ok-to-test
Note that tests won't run automatically while the PR is marked WIP, but you can run them manually on demand when you're ready with /test all.
Hi @mdbooth, thanks very much for your comments, but I have a question.
However, I don't think we can safely add the finalizer asynchronously. A finalizer would typically be added synchronously before resource creation. It's also worth noting that the resource in this case would be the PV rather than the PVC.
I am a newbie. In my opinion, this cannot be implemented with just a watcher; we would need to add a webhook to do it, right? If that's not right, do you have any other suggestions?
As mentioned in the issue, I do like the concept of adding a finalizer to the snapshot for each volume which requires it. However, I can't think of another instance where finalizers are used this way, so it might be worth some effort to discover whether there are any problems with potentially very large lists of finalizers. Imagine, for example, a CI system which uses a 'golden image' extremely heavily and has 1,000 volumes using it. Would that be ok?
However, I don't think we can safely add the finalizer asynchronously. A finalizer would typically be added synchronously before resource creation. It's also worth noting that the resource in this case would be the PV rather than the PVC.
Could we add code which adds and persists the snapshot finalizer immediately before creating the PV?
We would also need to add code to remove the finalizer in the deletion flow of the PV strictly before removing the PV finalizer.
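A minimal sketch of the ordering being suggested here; every helper below is a purely hypothetical placeholder (none of them exist in the codebase), the point is only the sequencing of the finalizer relative to the PV lifecycle.

```go
package snapshotprotect

import "context"

// Hypothetical placeholders for the real provisioning and finalizer logic.
var (
	addSnapshotFinalizer    func(ctx context.Context, ns, name string) error
	removeSnapshotFinalizer func(ctx context.Context, ns, name string) error
	provisionPV             func(ctx context.Context, ns, name string) error
	deletePV                func(ctx context.Context, pvName string) error
)

func provisionFromSnapshot(ctx context.Context, vsNamespace, vsName string) error {
	// Persist the snapshot finalizer first, so the snapshot can never be
	// deleted between "decided to create the PV" and "PV exists".
	if err := addSnapshotFinalizer(ctx, vsNamespace, vsName); err != nil {
		return err
	}
	return provisionPV(ctx, vsNamespace, vsName)
}

func teardownProvisionedVolume(ctx context.Context, pvName, vsNamespace, vsName string) error {
	// Delete the backing volume first ...
	if err := deletePV(ctx, pvName); err != nil {
		return err
	}
	// ... then release the snapshot, strictly before the PV finalizer is
	// removed, so the snapshot outlives every PV that was created from it.
	return removeSnapshotFinalizer(ctx, vsNamespace, vsName)
}
```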
Over the past few days, I tried a few approaches to adding the finalizer synchronously before creating the PV:
- I tried implementing this in the CreateVolume function, but I couldn't obtain enough context in a graceful way. For example, I can't get the Kubernetes VolumeSnapshot's name and namespace, so I can't add a finalizer to it (see the sketch after this list).
- I tried adding a webhook to intercept PV creation, which can add the finalizer before the PV is created. However, this increases the difficulty of deployment and maintenance. Moreover, if a cluster has multiple webhooks for PVs and another webhook rejects the PV's creation after this one runs, I don't know how to resolve that situation. So I don't like this approach.
I think the first approach is more graceful and appropriate, but it requires the external-provisioner sidecar to provide more information. So I think we should submit a KEP to the CSI spec so that we can get enough information about the VolumeSnapshot in the CreateVolume function to resolve this issue properly.
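To illustrate the first point: inside CreateVolume, the only snapshot-related information a CSI driver receives is the backend snapshot handle, roughly as below. This is a hedged sketch using the CSI spec Go bindings; the helper function is illustrative.

```go
package snapshotprotect

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
)

// sourceSnapshotID extracts the CSI snapshot handle from a CreateVolume
// request. This is a backend (Cinder) snapshot ID: the Kubernetes
// VolumeSnapshot object's name and namespace are not part of the request, so
// the driver has no graceful way to address that object and patch a finalizer
// onto it from here.
func sourceSnapshotID(req *csi.CreateVolumeRequest) (string, bool) {
	src := req.GetVolumeContentSource()
	if src == nil {
		return "", false
	}
	snap := src.GetSnapshot()
	if snap == nil {
		return "", false
	}
	return snap.GetSnapshotId(), true
}
```

Even with the external-provisioner's `--extra-create-metadata` flag, the extra parameters identify the PVC and PV rather than the source VolumeSnapshot, which is why a change in external-provisioner is being pursued.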
I've already submitted a PR to external-provisioner so that we can add the finalizer to the VolumeSnapshot synchronously in the future. For now, let's continue with this imperfect solution to fix issue #2294 and stop the resource leak in production environments as soon as possible.
Hey, what is the problem with adding a finalizer asynchronously? We did that all the time in Kuryr and it worked perfectly well, as long as we made sure to do it by patching the resource properly.
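For context, "patching the resource properly" could look something like the following: append the finalizer with a JSON patch whose `test` operation guards against concurrent modification of the finalizer list, instead of rewriting the whole list. A hedged sketch; the function name and error handling are illustrative.

```go
package snapshotprotect

import (
	"context"
	"encoding/json"
	"fmt"

	snapclient "github.com/kubernetes-csi/external-snapshotter/client/v6/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// addFinalizerWithPatch appends a finalizer to a VolumeSnapshot via a JSON
// patch. The "test" op makes the API server reject the patch if the finalizer
// list changed since we read it, so we never clobber someone else's finalizer.
func addFinalizerWithPatch(ctx context.Context, snaps snapclient.Interface, ns, name, finalizer string) error {
	vs, err := snaps.SnapshotV1().VolumeSnapshots(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, f := range vs.Finalizers {
		if f == finalizer {
			return nil // already present
		}
	}

	var patch []byte
	if len(vs.Finalizers) == 0 {
		// No finalizer list yet: create it with a single entry.
		patch = []byte(fmt.Sprintf(`[{"op":"add","path":"/metadata/finalizers","value":[%q]}]`, finalizer))
	} else {
		current, err := json.Marshal(vs.Finalizers)
		if err != nil {
			return err
		}
		patch = []byte(fmt.Sprintf(
			`[{"op":"test","path":"/metadata/finalizers","value":%s},`+
				`{"op":"add","path":"/metadata/finalizers/-","value":%q}]`,
			current, finalizer))
	}

	_, err = snaps.SnapshotV1().VolumeSnapshots(ns).
		Patch(ctx, name, types.JSONPatchType, patch, metav1.PatchOptions{})
	return err
}
```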
/hold cancel
Hi CPO Co-Leads @dulek @jichenjc @kayrus @zetaab
This PR is ready for review. Please take a look when you have a chance; I'm eager to get your advice.
Thanks for the PR. I thought this PR relied on https://github.com/kubernetes-csi/external-provisioner/pull/1070, but reading it again it seems you are proposing a long-term solution in external-provisioner, and after that's done we will do another round of fixes to the current one? So this PR is a temporary stop-gap solution?
Yep. I'm not sure when kubernetes-csi/external-provisioner#1070 will be merged and released, so this PR can be used temporarily to hold back the resource leak.
There may just be some potential problems with adding the finalizer asynchronously, but I have no concrete example. Maybe in an extremely busy environment the resource could be deleted before the finalizer is successfully added.
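One way to at least detect that race, sketched under the assumption that the watcher re-checks the snapshot right after attempting to add the finalizer; the helper name and error handling are illustrative.

```go
package snapshotprotect

import (
	"context"
	"fmt"

	snapclient "github.com/kubernetes-csi/external-snapshotter/client/v6/clientset/versioned"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// verifyProtected re-reads the VolumeSnapshot after the finalizer was added
// and reports whether deletion had already started, i.e. whether the
// asynchronous protection came too late for this snapshot.
func verifyProtected(ctx context.Context, snaps snapclient.Interface, ns, name string) error {
	vs, err := snaps.SnapshotV1().VolumeSnapshots(ns).Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return fmt.Errorf("volumesnapshot %s/%s was deleted before it could be protected", ns, name)
	}
	if err != nil {
		return err
	}
	if vs.DeletionTimestamp != nil {
		// Deletion won the race: even if the finalizer was accepted, the
		// snapshot is already on its way out and should not be relied upon.
		return fmt.Errorf("volumesnapshot %s/%s is already being deleted", ns, name)
	}
	return nil
}
```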
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@jeffyjf can you rebase on latest master?
Of course, I will rebase it later on.
@jeffyjf: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| openstack-cloud-keystone-authentication-authorization-test | ee04af8ef62ebcb7c9f9f30321cfab09d22a91b6 | link | true | /test openstack-cloud-keystone-authentication-authorization-test |
| openstack-cloud-controller-manager-e2e-test | ee04af8ef62ebcb7c9f9f30321cfab09d22a91b6 | link | true | /test openstack-cloud-controller-manager-e2e-test |
| openstack-cloud-csi-manila-e2e-test | ee04af8ef62ebcb7c9f9f30321cfab09d22a91b6 | link | true | /test openstack-cloud-csi-manila-e2e-test |
| openstack-cloud-csi-manila-sanity-test | ee04af8ef62ebcb7c9f9f30321cfab09d22a91b6 | link | true | /test openstack-cloud-csi-manila-sanity-test |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@jeffyjf can you fix unit tests?
Of course, I'll fix it asap.
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Reopen this PR with `/reopen`
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.