go-resiliency icon indicating copy to clipboard operation
go-resiliency copied to clipboard

feat: add dynamic backoff

Open OrangeFlag opened this issue 11 months ago • 8 comments

Hi,
First of all, thank you for this excellent library! It has been incredibly useful in my projects.

I’m submitting this pull request to add a feature that enables dynamic backoff support. In some cases, the called service may return a Retry-After header and expects us to wait for the specified time before retrying. However, this header is not always present. Additionally, there are scenarios where I’d like to set a zero backoff for certain errors, allowing an immediate retry.

This pull request introduces the following modifications:

  • Added a new error constructor, ErrWithBackoff, to specify dynamic backoff durations.
  • Updated RunFn to handle ErrWithBackoff by dynamically setting the backoff duration based on the error field.
  • Added a test to verify the dynamic backoff logic.

I believe this feature will make the library more flexible and useful in various retry scenarios.
Thank you for considering this contribution!

OrangeFlag avatar Jan 17 '25 23:01 OrangeFlag

@OrangeFlag @eapache

Sorry to jump in but let me explain further. As a fan of go-resiliency's retry packages, I am curious to offer my opinion.

I think it's strange to assume that the use of Retry-After, etc. will be based on the existing mechanism for generating backoffs.

What I mean is that in cases where an error occurs and the Retry-After returns, it is unlikely that there will be a mix of cases where it returns and cases where it does not return, so I think the point is that Retry-After (like, dynamically generated values) should be clearly segregated as a compliant back-off. Also, the current backoffs are mismatched with the basic design of pre-generated values.

The current design you are proposing confuses the two and is quite implicit, making the existing backoff a design that “lies”.

The work functions that Run and RunCtx receive would be very difficult to modify in the current situation. So, why not have two functions, RunWithDynamicBackoff and RunCtxWithDynamicBackoff? In this case, we would rely on a “dynamic” backoff to be executed in the RunX function that would generate a backoff for each call.

I would like to solicit input from others on the name, but I think it is a reasonably significant change to have an inherently dynamic treatment enter into a static treatment.

I wouldn't be interrupting PR like this if I could discuss it first in an Issue, but I'm sorry if I'm stirring up the discussion.

tukeJonny avatar Feb 23 '25 02:02 tukeJonny

Thanks for your contribution! I've left some minor comments in the code.

It's been a while at this point since I've done much work in go, so I'm not as familiar with conventions around error-handling and typing anymore. Is it usual to define a new wrapping error class to handle cases like this? My first instinct would have been to define an interface for types that have a Backoff() time.Duration method instead, and let user errors implement that interface if they wish. But I don't know what actually ends up being easier, or what the current conventions are.

I also think that this current code is just screwing with errors and should ideally be in the form of defining a Backoff interface.

Errors should be used for handling as values, but using them as a substitute for messaging seems pretty forced.

The above my suggestions were made to avoid destructive changes. Also, if static and dynamic behaviors are mixed, it will be easy to introduce bugs and also difficult for users to understand, so why not prepare them in a way that they can be segregated? This is the reason why i suggested to create a separate form for the two.

tukeJonny avatar Feb 23 '25 02:02 tukeJonny

Hmm, to be honest it's the constructor (New) which takes the static list of backoff values. If you wanted to design a proper dynamic retry interface, it would involve a new constructor as well (either taking a function reference in the constructor, or leaving it up to the work function). But if you're baking more logic into the work function so it can return the backoff value, then does the classifier make sense either, or should that also just be a value the work function returns directly? Or should the classifier be responsible for calculating the dynamic backoff? Lots of questions / lots of options, maybe best designed as a fresh type.

@OrangeFlag I'm still open to some version of this feature, but I think it is a good point to consider the interface a little more carefully.

eapache avatar Feb 23 '25 20:02 eapache

@eapache

Thanks for the reply. It seems like a reasonable idea to me as well, that the backoff should be known at the constructor. The “processing” is passed by function reference (work) when processing, but the “how to define the backoff value” is something that the retry executor must know.

On the other hand, I recognize that the Classifier is there to determine success or failure and to determine the next action to take. In this PR proposal, we are talking about how to generate backoffs, assuming a situation where the client is not in a position to determine backoffs, so isn't this a different story? I believe that this is the case. Even if the work function were to return a backoff value, would it not change whether the next action taken is a retry or not, and would it not only take longer to wait?

I believe it is excessive for the Classifier to have the responsibility of generating backoffs. If we do that, users will always have to pay attention to the Classifier when they are concerned about handling backoffs or extending them, and we will have a situation where there are multiple concerns. The reverse is also true.

Against the scale, this PR seems a little too brave. How about we reopen the discussion with an Issue? In particular, what is needed in the design to separate constructors and how to incorporate them.

tukeJonny avatar Feb 23 '25 22:02 tukeJonny

Yes, if somebody is interested in working on it, then that larger discussion should be happening in a different issue or PR at this point - the ideas being floated here are much bigger than Ilya initially proposed.

eapache avatar Feb 23 '25 22:02 eapache

I'm going to leave this PR open for the time being - this is a reasonably small extension of the existing implementation that doesn't really get in the way of existing users. While an ideal dynamic retrier probably has a fairly different interface, I don't want to let the perfect be the enemy of the good; If somebody wants to tackle the bigger problem, that's great, but if they just want to address my existing more specific comments, I'm not opposed to merging this anyway.

eapache avatar Feb 23 '25 22:02 eapache

Hey everyone! I got here randomly while was checking if rate limiting module is there and read the entire discussion :)

Thx @eapache for this library, I'm currently using it.

My understanding on the topic, is that its not the responsibility of running function to decide on backoff value, in the same way as its not the responsibility of Retrier to calcSleep. So I would move all such logic into the interface like:

type Backoff interface {
    Duration(attempt int, err error) time.Duration
}

This approach would allow one to build their own backoff by chaining the existing one with custom error checking function. I could make a PR for that if you want guys.

wdyt?

yandzee avatar Mar 23 '25 07:03 yandzee

I think something like that interface could make sense, yes. As discussed though, switching from []time.Duration to something like that starts to impact the constructor and overall design of the type, which makes it not a very straight-forward addition, perhaps even warranting a whole new type/implementation. Happy to look at PRs if there is interest.

This PR is no longer the right place to be having this discussion though. Please file a ticket for any more discussions if needed.

eapache avatar Mar 23 '25 14:03 eapache