argo-workflows icon indicating copy to clipboard operation
argo-workflows copied to clipboard

feat: handle interruptions as a transient error

Open rwong2888 opened this issue 5 months ago • 1 comments

Motivation

I have had this in my TRANSIENT_ERROR_PATTERN to handle automatically retrying pods that have been removed for one reason or another. This is to add it to the upstream argo-workflows. I feel it would be useful for users, especially those running workflows on preemptive or spot instances.

Modifications

Added below as a transient error. (OOMKilled|imminent node shutdown|node was low on resource|pod deleted|node is shutting down|Deadline exceeded)

Verification

Used the TRANSIENT_ERROR_PATTERN at Forbes.

Documentation

This file is already referenced in the documentation.

rwong2888 avatar Jun 05 '25 20:06 rwong2888

/retest

blkperl avatar Jun 06 '25 18:06 blkperl

/retest

rwong2888 avatar Jun 09 '25 13:06 rwong2888

/retest

rwong2888 avatar Jun 10 '25 18:06 rwong2888