argo-workflows
fix: skip clearing message when node transitions from pending to fail. Fixes #13200
Fixes #13200
Motivation
Allow retries to use the pod message when a pod transitions from pending to fail.
Normally the transitions are:
timestamp0, status: pending, reason: ""
timestamp1, status: fail, reason: "e.g. containerd issue"
But for PodInitializing, the transitions are the following:
timestamp0, status: pending, reason: ""
timestamp1, status: pending, reason: "PodInitializing"
timestamp2(immediately after), status: fail, reason: ""
Modifications
This PR fixes the issue: the message is no longer overwritten with "" when a pod transitions from the pending phase to the fail phase. The transitions become:
timestamp0, status: pending, reason: ""
timestamp1, status: pending, reason: "PodInitializing"
timestamp2(immediately after), status: fail, reason: "PodInitializing"
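The rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual controller code: `NodePhase`, `nodeStatus`, and `applyPodUpdate` are hypothetical names invented for this example, and the real change may sit elsewhere in the controller and handle more cases.

```go
package main

// NodePhase is a hypothetical stand-in for the node phase type.
type NodePhase string

const (
	NodePending NodePhase = "Pending"
	NodeFailed  NodePhase = "Failed"
)

// nodeStatus is a simplified, hypothetical node record holding only the
// two fields relevant to this fix.
type nodeStatus struct {
	Phase   NodePhase
	Message string
}

// applyPodUpdate merges a newly observed phase and message into the old
// status. On a Pending -> Failed transition with an empty new message it
// keeps the previous message (e.g. "PodInitializing") instead of clearing it.
func applyPodUpdate(old nodeStatus, newPhase NodePhase, newMessage string) nodeStatus {
	updated := old
	updated.Phase = newPhase
	if old.Phase == NodePending && newPhase == NodeFailed && newMessage == "" {
		// Skip the overwrite so the retry logic can still see the last message.
		return updated
	}
	updated.Message = newMessage
	return updated
}
```

Replaying the three timestamps from the PodInitializing case through `applyPodUpdate` leaves the node in the fail phase with the message "PodInitializing", while a failure that carries its own message (the containerd case) still replaces the old one.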
Verification
Added a unit test; it failed without the change.
Applied the change; the test succeeds.
Also released to our production environment for a week: pods are able to retry on every PodInitializing message when it is configured in TRANSIENT_ERROR_PATTERN, and we have not seen any other issues.