Ability to override pod GC strategy on a per-template basis

Open • oliverdain opened this issue 9 months ago • 1 comment

Summary

Currently the podGC policy is set once, globally, for the entire workflow. Sometimes it would be helpful to override it for a single step (template).
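
For reference, the existing workflow-level setting looks roughly like the sketch below. The workflow name, image, and command are placeholders chosen for illustration; only the podGC field under spec is the part being discussed.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pod-gc-example-   # placeholder name
spec:
  entrypoint: main
  # Applies to every pod in the workflow; there is no per-template equivalent today.
  podGC:
    strategy: OnWorkflowCompletion
  templates:
    - name: main
      container:
        image: alpine:3.19          # placeholder image
        command: [sh, -c]
        args: ["echo hello"]
```

With a strategy like OnWorkflowCompletion, every completed pod is kept until the whole workflow finishes, which is what causes the problem described under Use Cases.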

Use Cases

We have a use case where one step of a workflow creates a persistent volume using a resource template. That volume is big and backed by fast storage, so it's expensive. We then populate it with data. Finally, we clone it into a ReadOnlyMany persistent volume so that we can run many long-running ML jobs that all use the same data. We do the clone step because GCP doesn't let you change a volume from ReadWriteOnce to ReadOnlyMany, and the fast storage types don't support ReadWriteMany.

Once we've created the read-only, shareable volume, we no longer need the original, writable volume. That volume holds many TBs of data, and the ML jobs run for several days, so we don't want to keep paying for it when it isn't needed. I therefore have a workflow step that deletes the persistent volume as soon as the read-only volume is ready.

That step does immediately change the status of the volume claim to Terminating, but the claim is still bound to the pod that populated it. If my global GC policy keeps that pod around until the workflow completes, the volume never gets cleaned up until then. I'd love to be able to override the GC policy for just that one task.
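
To make the request concrete, here is a sketch of what a per-template override might look like. This is purely hypothetical: the template-level podGC field shown here is the proposed feature and does not exist in Argo Workflows as of this issue, and the template name, claim name, image, and command are invented for illustration.

```yaml
# HYPOTHETICAL: template-level podGC is the feature being requested, not an existing API.
templates:
  - name: populate-volume           # illustrative template name
    podGC:                          # proposed field; would override spec.podGC for this template only
      strategy: OnPodCompletion
    volumes:
      - name: data
        persistentVolumeClaim:
          claimName: staging-data   # illustrative claim name
    container:
      image: alpine:3.19            # placeholder image
      command: [sh, -c]
      args: ["echo populating /data"]
      volumeMounts:
        - name: data
          mountPath: /data
```

With something like this, the populating pod would be deleted as soon as it completes, so the claim would no longer be held by a pod and a later delete step could actually release the volume, while the rest of the workflow keeps the global policy.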


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

oliverdain • May 22 '24 21:05