
refactor: change the logic for deleting pods during retry. Fixes: #12538

Open: shuangkun opened this issue 4 months ago (4 comments).

Fixes: #12538. Refactors the logic for deleting pods during a retry to speed up retrying a workflow.

Motivation

Speed up retrying a workflow, so that a large archived workflow (more than 8000 pods) can be retried successfully within 1 minute.

Modifications

As Anton and Joibel suggested, I moved the pod-deletion logic into the controller and added a retry field to the workflow spec. Adding a spec field is only one option; other approaches are also possible, such as passing a label to trigger the retry. If you think a label is better, I will change it.

Verification

e2e and unit tests

References: https://github.com/argoproj/argo-workflows/pull/12624, https://github.com/argoproj/argo-workflows/pull/12419

shuangkun, Mar 04 '24 09:03

Thank you for your reply. I would like to confirm one question: do you think it is reasonable to pass the retry parameter through the spec? This increases the amount of code that has to change. But at the time, I thought it was not appropriate to pass it only through a label, so I put it in the spec, following the example of suspend.

shuangkun, Mar 25 '24 14:03


Here's some context from the Kubernetes docs on spec and status, annotations, and labels:

Both annotations and the spec make sense in their own way.

  • annotation: "Directives from the end-user to the implementations to modify behavior or engage non-standard features."
  • spec: "The spec is a complete description of the desired state, including configuration settings provided by the user."

I would vote for the spec, reasoning that when we retry, we update the desired state of the workflow, and the controller then works toward that desired state.
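A small sketch of what "the controller works toward the desired state" buys in practice, assuming (hypothetically) that the desired state is a set of nodes to reset and the observed state is the set of pods that currently exist. The controller computes the difference on every pass, so a pass interrupted halfway can simply be repeated. The function name and data shapes are illustrative only, not argo-workflows code.

```go
package main

import (
	"fmt"
	"sort"
)

// podsToDelete compares desired state (nodes marked for reset, mapped to
// their pod names) with observed state (pods that currently exist) and
// returns only the pods that still need deleting. Because it is a pure
// diff, running it again after the deletions succeed returns nothing.
func podsToDelete(resetNodes map[string]string, existingPods map[string]bool) []string {
	var out []string
	for _, pod := range resetNodes {
		if existingPods[pod] {
			out = append(out, pod)
		}
	}
	sort.Strings(out) // deterministic order for logging and tests
	return out
}

func main() {
	// Hypothetical desired state: node ID -> pod name for nodes to retry.
	reset := map[string]string{"n1": "pod-1", "n2": "pod-2", "n3": "pod-3"}
	// Observed state: pod-2 was already deleted on an earlier, interrupted pass;
	// pod-9 belongs to a node that is not being retried.
	pods := map[string]bool{"pod-1": true, "pod-3": true, "pod-9": true}

	first := podsToDelete(reset, pods)
	fmt.Println(first) // [pod-1 pod-3]
	for _, p := range first {
		delete(pods, p) // stand-in for the real Kubernetes delete call
	}
	fmt.Println(podsToDelete(reset, pods)) // []
}
```

The same diff works whichever field carries the request; the argument for the spec is that the request then lives in the object's declared desired state rather than in metadata.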

Hi @agilgur5, what are your thoughts on this?

tczhao, Mar 25 '24 14:03

Hi @agilgur5, could you give your vote so that I can better address these comments? Thanks.

shuangkun, Mar 29 '24 03:03

@agilgur5 Thank you for your reply. I have pushed a revised version. Could you take a look?

shuangkun, Apr 10 '24 12:04