fix: Allow users to selectively retry specific failed nodes. Fixes #12543

Open mio4kon opened this issue 1 year ago • 2 comments

Fixes #12543

Motivation

Allow users to selectively retry specific failed nodes instead of retrying all failed nodes at once.

Modifications

Removed the restriction that required the simultaneous use of --node-field-selector and --restart-successful. Now, using --node-field-selector alone allows for individual retries of specific failed nodes, instead of retrying all failures.
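
As a rough illustration of the intended behavior (not the actual code in this PR), the selection can be sketched in Go as follows. The node type, the failedNodesToRetry helper, and the node names other than fail-24ptx.BB are hypothetical and exist only for this example; the real implementation works on workflow node statuses and full field selectors rather than a single name.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a workflow node status.
type node struct {
	Name   string
	Failed bool
}

// failedNodesToRetry returns the names of the failed nodes a retry would reset.
// An empty selectorName models "no --node-field-selector given": every failed
// node is retried. A non-empty selectorName narrows the retry to the failed
// node with that exact name. Successful nodes are never returned here because
// this sketch models the case where --restart-successful is not set.
func failedNodesToRetry(nodes []node, selectorName string) []string {
	var out []string
	for _, n := range nodes {
		if !n.Failed {
			continue
		}
		if selectorName != "" && n.Name != selectorName {
			continue
		}
		out = append(out, n.Name)
	}
	return out
}

func main() {
	// Assumed workflow state: one succeeded node and two failed nodes.
	nodes := []node{
		{Name: "fail-24ptx.A", Failed: false},
		{Name: "fail-24ptx.BB", Failed: true},
		{Name: "fail-24ptx.C", Failed: true},
	}
	// No selector: all failed nodes are retried.
	fmt.Println(failedNodesToRetry(nodes, ""))
	// Selector alone: only the named failed node is retried.
	fmt.Println(failedNodesToRetry(nodes, "fail-24ptx.BB"))
}
```

The second call corresponds to running argo retry with --node-field-selector name=fail-24ptx.BB and without --restart-successful.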

Verification

--node-field-selector can be used on its own:

./dist/argo retry fail-24ptx --node-field-selector name=fail-24ptx.BB -v

Regression check: it still works in combination with --restart-successful:

./dist/argo retry fail-mz9c4 --restart-successful --node-field-selector name=fail-mz9c4.A

mio4kon avatar Jan 19 '24 15:01 mio4kon

feat: [...]

Please re-title this PR as a fix:, since per https://github.com/argoproj/argo-workflows/issues/12543#issuecomment-1900910489 this very much seems like a bug and not intended behavior

agilgur5 avatar Jan 22 '24 07:01 agilgur5

feat: [...]

Please re-title this PR as a fix:, since per #12543 (comment) this very much seems like a bug and not intended behavior

done

mio4kon avatar Jan 22 '24 09:01 mio4kon

@agilgur5 Hello, will this MR be merged into the trunk in the future?

mio4kon avatar Feb 04 '24 04:02 mio4kon

:shipit:

tooptoop4 avatar Feb 17 '24 04:02 tooptoop4

Can you add two e2e tests please? One that is a dag and one that is not.

Don't stress about invoking the server if there isn't already infrastructure for those kinds of tests (although I suspect you should be able to use REST).

Just add those two tests with comments linking them to this issue and PR.

@isubasinghe Added e2e tests: TestRetryWorkflowWithStepsWithSelectedFailNodes and TestRetryWorkflowWithDAGWithSelectedFailNodes. Please take a look at workflow/util/util_test.go.

mio4kon avatar Feb 17 '24 12:02 mio4kon

Hi, I reviewed the logic, and this modification may cause problems. The current logic on the main branch is to not retry an errored node that has successful child nodes. This modification will result in some errors.

JasonChen86899 avatar Jun 04 '24 05:06 JasonChen86899