workflow-basic-steps-plugin
workflow-basic-steps-plugin copied to clipboard
Added delay capability to retry
In some cases a user wants to add a delay between retries that are performed. While continue to support the original usage of retry, this adds three new attributes to RetryStep: useTimeDelay, timeDelay, and units. If useTimeDelay is true, then the timeDelay will be applied between retries.
@jglick @basil @dwnusbaum @abayer I figured it might be worth reaching out to a few of you with recent activity on the workflow-basic-steps-plugin. I have added the ability to introduce a delay in between retries and would like you to review this pull-request. Thank you
Hey @krotte1, unlike Jesse and Devin I am not (yet!) an official committer to this project, so I defer to them for any official feedback. My comments below are merely suggestions, not blocking issues.
The feature seems useful, and I've wanted it in my own pipelines several times. I haven't reviewed the implementation yet, but I took a look at the proposed API change to the retry
step and I have some ideas about that. Your proposed syntax is as follows:
retry(count: 3, timeDelay: 10, unit: 'SECONDS', useTimeDelay: true) {
[...]
}
Whenever I'm adding a new API, I try to study existing examples of similar APIs to learn from their mistakes and possibly gain insights that could benefit the design of my own API. In this case, I took a look at the API offered by Tenacity, which is (to me) the gold standard in retry APIs. I use it frequently in my Python-based build scripts.
Tenacity offers several options when it comes to waiting before retries. I've used several of them myself, depending on the use case. Some examples:
- Fixed period:
@retry(wait=wait_fixed(2))
- Randomness:
@retry(wait=wait_random(min=1, max=2))
- Exponential backoff:
@retry(wait=wait_exponential(multiplier=1, min=4, max=10))
- Fixed waits and jitter:
@retry(wait=wait_fixed(3) + wait_random(0, 2))
- Exponentially increasing jitter:
@retry(wait=wait_random_exponential(multiplier=1, max=60))
- Chain of backoffs:
@retry(wait=wait_chain(*[wait_fixed(3) for i in range(3)] + [wait_fixed(7) for i in range(2)] + [wait_fixed(9)]))
As you can see, Tenacity is the "Cadillac" of retry APIs. Am I suggesting that you implement all of this in your PR? I am emphatically not suggesting that.
What I am suggesting, though, is that you design for extension so that these additional features could be added later.
For example, it might be worth sketching out some possible syntax for the fancier forms of time-based retry (such as backoff and jitter) and making sure that those future improvements could be made without having to break compatibility with the syntax being proposed in this change. That way, we can move forward with what you're proposing here with more confidence that future extensions won't run into implementation issues.
What do you think?
Thanks @basil. I like where you are going with this and can see the usefulness of the implementation you suggested. Before I go through the effort of refactoring the code to make this type of implementation possible, I would like to know from @jglick @dwnusbaum @abayer would even allow this enhancement or the more extensible implementation you propose into the baseline.
I was really hoping to hear from one of the maintainers (@jglick @dwnusbaum @kohsuke @abayer) about the possibility of incorporating this time of feature in the baseline. I have started implementing what @basil talked about on a different branch and am almost finished.... I don't want to continue down this line if the simple case won't even be accepted.
I now have the extensible version implemented that was mentioned by @basil in a previous post. @jglick @dwnusbaum @kohsuke @abayer ... I was just wondering if you could tell me if either of these implementations would be accepted. The simple version in this pull request justs adds a fixed delay option to the retry step. The extensible version that I could merge into the pull-request allows for a FixedDelay, RandomDelay, IncrementalDelay, ExponentialDelay, and a RandomExponential delay. It can also be easily extended to add other delay algorithms.
I looked around and see that some pull requests have reviewers added to them, but I can't figure out how to do it myself. I beginning to think that only folks like @jglick @dwnusbaum @kohsuke @abayer that may have some type of elevated privileges can do it. All of the normal interactions with the Reviewers tab are not available to me so I can't really request someone to review this.
Indeed you need write permission to a repository to modify reviewers. There is a new “triage” permission on GitHub that we are proposing to roll out; see the Jenkins dev list for more.
@jglick - thank you for adding the reviewers to this pull request. I really appreciate it..
@basil - I went ahead and implemented the recommendations you made earlier in this pull request. Do you want me to go ahead and merge that into this pull request before you review it? I made the delay extensible and implemented many of the items you mentioned above: FixedDelay, RandomDelay, IncrementalDelay, ExponentialDelay, and a RandomExponential.
@krotte1 we don't currently have a great process for evaluating new features like this, but we are working on improving it. For now, can you create a Jira ticket, component workflow-basic-steps
, mention me in the description (@dnusbaum
), and we can discuss the idea/design there?
A new Jira ticket (https://issues.jenkins-ci.org/browse/JENKINS-59678) was created for this pull request to discuss the desired solution (simple fixed retry in the pull request or extensible solution that has already been implemented and can be added to this pull request).
@dwnusbaum - now that https://issues.jenkins-ci.org/browse/JENKINS-59678 has been out there for a week, what happens next? Which solution do we want to go with - simple fixed time delay between retries or an extensible method with different algorithms for retry delays?
@dwnusbaum - just wondering what the status is... I am really hoping to get one of these delay solutions incorporated into the baseline.
Awesome! Great work!
I just found this PR, as I actually had to create a few retry/sleep steps and came up with some internal shared library step that it works but I wish we could have this feature in place.
Any plans to move forward this PR?
Thanks
Is there anything holding up this PR from being merged?