Allow PVC of completed PipelineRun to be deleted
Feature request
For PVCs specified as part of a PipelineRun with volumeClaimTemplate:
- Once a PipelineRun has completed, remove the pvc-protection finalizer from the PVC to allow it to be deleted by the user (similar to https://github.com/argoproj/argo-workflows/issues/6629); see the sketch after this list
- Provide an option to automatically delete the PVC on completion of the PipelineRun
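To make the first option concrete, here is a minimal client-go sketch of what stripping the finalizer could look like. The helper name and client wiring are assumptions for illustration, not Tekton code, and (as discussed further down this thread) the ordering relative to the delete call matters.

```go
package pvcrelease

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const pvcProtectionFinalizer = "kubernetes.io/pvc-protection"

// removePVCProtection strips the pvc-protection finalizer from a PVC so that
// a pending delete can complete. Hypothetical helper for illustration only.
func removePVCProtection(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	pvc, err := client.CoreV1().PersistentVolumeClaims(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Keep every finalizer except kubernetes.io/pvc-protection.
	kept := make([]string, 0, len(pvc.Finalizers))
	for _, f := range pvc.Finalizers {
		if f != pvcProtectionFinalizer {
			kept = append(kept, f)
		}
	}
	pvc.Finalizers = kept
	_, err = client.CoreV1().PersistentVolumeClaims(ns).Update(ctx, pvc, metav1.UpdateOptions{})
	return err
}
```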
Use case
If a PVC is specified as part of a PipelineRun with volumeClaimTemplate, the PVC's lifecycle is tied to that of the PipelineRun. Where PipelineRuns are kept around for longer periods this wastes storage resources, and some users may want to delete the associated PVC before the PipelineRun itself, perhaps because they keep runs for their logs, or because the PipelineRun is managed by a GitOps controller.
See #1784 for when this was first noticed, after PVC protection had been implemented in Kubernetes.
If I remember it correctly from implementing volumeClaimTemplate, the PVC cannot be deleted because the Pod has the volume mounted, so the Pod needs to be deleted first.
Hey @jlpettersson - a completed pod no longer has the PVC mounted; this is easily demonstrated, as a completed pod doesn't prevent a RWO PVC from being mounted on another node. However, the PVC protection feature prevents PVCs from being deleted while they are referenced by pods, even pods in a completed state.
The linked Argo Workflows PR removes the PVC protection finalizer once it knows all the pods are completed and it's safe to do so.
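For context, a minimal sketch, assuming a plain client-go clientset, of the kind of safety check that approach implies: only treat the finalizer as removable once no active pod still mounts the claim. The function and names are illustrative, not the Argo or Tekton implementation.

```go
package pvccheck

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pvcSafeToRelease reports whether no active pod in the namespace still
// references the claim. Completed (Succeeded/Failed) pods are skipped,
// since a completed pod no longer holds the volume mounted.
func pvcSafeToRelease(ctx context.Context, client kubernetes.Interface, ns, claim string) (bool, error) {
	pods, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
			continue
		}
		for _, vol := range pod.Spec.Volumes {
			if vol.PersistentVolumeClaim != nil && vol.PersistentVolumeClaim.ClaimName == claim {
				return false, nil // still referenced by an active pod
			}
		}
	}
	return true, nil
}
```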
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
/remove-lifecycle stale Still an issue
/remove-lifecycle stale
Just came across this issue this morning; I didn't expect the PVC to still exist after the PipelineRun was completed.
I am trying to understand... what is the use of such PVCs? Aren't they unnecessarily reserving resources? Shouldn't this be a bug rather than a feature?
I think the PVC created with volumeClaimTemplate should be deleted once the PipelineRun is completed.
/cc @skaegi
Thanks @skaegi for the references: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations
Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
The StatefulSet is deleted when a PipelineRun is completed, but the PVC created with volumeClaimTemplate is not cleaned up. We could at least provide an option, either cluster-wide or per PipelineRun, to clean up PVCs so that operators or users can choose how they would like to configure their deployments.
@pritidesai An option to clean up PVCs per PipelineRun would be awesome! I suppose in some cases people might want the option to keep them around for debugging?
For various reasons I have to keep completed PipelineRuns around, and if I were using volumeClaimTemplates each would consume its own PVC, which would be wasteful and expensive (I'm building 50+ apps). As a workaround I have to use a shared RWX PVC as the workspace with a subPath :vomiting_face:
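For anyone weighing the same workaround, here is roughly what that binding looks like using Tekton's v1beta1 Go API types; the workspace name, claim name, and subPath value are placeholders.

```go
package workspaces

import (
	pipelinev1beta1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	corev1 "k8s.io/api/core/v1"
)

// sharedWorkspace binds a run's workspace to one pre-provisioned RWX claim,
// isolating each run under its own subPath instead of a per-run PVC.
func sharedWorkspace(runID string) pipelinev1beta1.WorkspaceBinding {
	return pipelinev1beta1.WorkspaceBinding{
		Name: "source", // must match the workspace the Pipeline declares
		PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
			ClaimName: "shared-build-workspace", // placeholder shared RWX claim
		},
		SubPath: runID, // per-run directory inside the shared claim
	}
}
```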
Related: https://github.com/tektoncd/pipeline/issues/5412
Hey @jimmyjones2, I have created a PoC here: https://github.com/tektoncd/pipeline/pull/6635. Please let me know if this works for you. Thanks! Once we finalize the design, I will update the PR to include tests. The changes in the PR are tested locally.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
@QuanZhang-William Just tried v0.51.0 with coschedule: workspaces and was expecting the PVC to be deleted as per #6940 - however there seems to be a race condition: sometimes the PVC is deleted, but other times I was left with a PVC stuck in Terminating state because the kubernetes.io/pvc-protection finalizer wasn't removed.
Hi @jimmyjones2, https://github.com/tektoncd/pipeline/pull/6940 only supports purging PVCs in coschedule: pipelineruns mode. The change is not applied to coschedule: workspaces mode due to backward compatibility concerns.
You can see more detail in this discussion.
Sorry @QuanZhang-William, I wrote that incorrectly - when I tried coschedule: pipelineruns I sometimes ended up with terminating PVCs. With coschedule: workspaces I got the previous behavior of the PVC remaining after completion.
@jimmyjones2, thanks for reporting the bug! With some investigation, the flake turns out to be caused by purging the finalizer and deleting the PVC in the wrong order.
It seems like finalizers cannot be patched/purged while the PVC is not in terminating status, so we need to delete the PVC first and then purge the finalizer.
I'm working on a PR to fix it.
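For readers following along, a minimal client-go sketch of the corrected ordering described above; the helper name is illustrative and this is not the actual fix.

```go
package pvcfix

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// deleteThenPurge requests deletion first, so the PVC enters Terminating,
// and only then clears the finalizers so the deletion can actually complete.
func deleteThenPurge(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	if err := client.CoreV1().PersistentVolumeClaims(ns).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
		return err
	}
	// Per the observation above, the finalizer can only be purged once the
	// PVC is already in terminating status.
	patch := []byte(`{"metadata":{"finalizers":null}}`)
	_, err := client.CoreV1().PersistentVolumeClaims(ns).Patch(
		ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```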
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen /remove-lifecycle rotten
@vdemeester: Reopened this issue.
In response to this:
/reopen /remove-lifecycle rotten
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.