
fix: skip clearing message when node transitions from pending to fail. Fixes #13200

Open tczhao opened this issue 8 months ago • 4 comments

Fixes #13200

Motivation

Allow retries to use the pod message when a pod transitions from pending to fail.

Normally we have:

timestamp0, status: pending, reason: ""
timestamp1, status: fail,    reason: "e.g. containerd issue"

But for PodInitializing, the transitions are the following:

timestamp0,                    status: pending, reason: ""
timestamp1,                    status: pending, reason: "PodInitializing"
timestamp2(immediately after), status: fail,    reason: ""

Modifications

This PR fixes the issue: we no longer overwrite the message with "" when a pod transitions from the pending phase to the fail phase.

timestamp0,                    status: pending, reason: ""
timestamp1,                    status: pending, reason: "PodInitializing"
timestamp2(immediately after), status: fail,    reason: "PodInitializing"
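
The behavior in the timeline above can be illustrated with a minimal sketch. It is not the code from this PR: `Phase`, `Node`, and `applyUpdate` are hypothetical names, and the node is reduced to the two fields that matter here. The only rule it models is that an empty incoming message does not replace the existing one on a pending-to-fail transition.

```go
package main

import "fmt"

// Phase is a simplified stand-in for a node phase (hypothetical, not the Argo Workflows type).
type Phase string

const (
	Pending Phase = "Pending"
	Failed  Phase = "Failed"
)

// Node keeps only the fields relevant to this illustration (hypothetical).
type Node struct {
	Phase   Phase
	Message string
}

// applyUpdate merges a newly observed phase and message into the node.
// On a Pending -> Failed transition with an empty incoming message,
// the previous message is kept instead of being overwritten with "".
func applyUpdate(old Node, newPhase Phase, newMessage string) Node {
	updated := Node{Phase: newPhase, Message: newMessage}
	if old.Phase == Pending && newPhase == Failed && newMessage == "" {
		updated.Message = old.Message
	}
	return updated
}

func main() {
	n := Node{Phase: Pending}                      // timestamp0
	n = applyUpdate(n, Pending, "PodInitializing") // timestamp1
	n = applyUpdate(n, Failed, "")                 // timestamp2
	fmt.Printf("%s %q\n", n.Phase, n.Message)      // Failed "PodInitializing"
}
```

A non-empty message on the failing update (the "normal" case, e.g. a containerd issue) still replaces the old one in this sketch, so only the empty-message case changes.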

Verification

Added a unit test: it failed before the change and passed after it. We also ran the change in our production environment for a week. Pods were able to retry on every PodInitializing message when it was configured in TRANSIENT_ERROR_PATTERN, and we saw no other issues.
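
To show why the preserved message matters for retries, here is a rough sketch that assumes TRANSIENT_ERROR_PATTERN is read as a regular expression and matched against the failure message. The helper `matchesTransientPattern` and the fallback pattern value are hypothetical and are not taken from the Argo Workflows source.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// matchesTransientPattern reports whether message matches the regular
// expression in pattern. An empty or invalid pattern never matches.
// (Hypothetical helper for illustration only.)
func matchesTransientPattern(pattern, message string) bool {
	if pattern == "" {
		return false
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false
	}
	return re.MatchString(message)
}

func main() {
	pattern := os.Getenv("TRANSIENT_ERROR_PATTERN")
	if pattern == "" {
		pattern = "PodInitializing" // assumed example value
	}
	// With the fix, the failed node still carries "PodInitializing".
	fmt.Println(matchesTransientPattern(pattern, "PodInitializing"))
	// Before the fix, the message was "", which gives the pattern nothing to match.
	fmt.Println(matchesTransientPattern(pattern, ""))
}
```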

tczhao, Jun 17 '24 14:06