Initial delay and initial jitter
When many executions may start at around the same time, such as when a fleet of clients drops its connections to an external server and reconnects, it may be desirable to jitter (randomize) the initial connection attempts, not just the reconnection attempts after a failure. For that, we may want a policy that supports an initial delay and jitter.
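To make "jitter" concrete, here is a minimal sketch in plain Java of randomizing a base delay by a factor. It is not Failsafe code: the class name JitteredDelay, the method jitter, and the choice of a symmetric uniform offset are assumptions made only for illustration.

```java
import java.time.Duration;
import java.util.Random;

// Hypothetical helper, not part of Failsafe.
public class JitteredDelay {
    /**
     * Returns the base delay varied by a uniformly random offset in
     * [-jitterFactor, +jitterFactor), e.g. 1000 ms with factor 0.25
     * yields a delay between 750 ms and 1250 ms.
     */
    public static Duration jitter(Duration base, double jitterFactor, Random random) {
        if (jitterFactor < 0.0 || jitterFactor > 1.0) {
            throw new IllegalArgumentException("jitterFactor must be in [0, 1]");
        }
        // nextDouble() is in [0, 1), so offset is in [-jitterFactor, +jitterFactor)
        double offset = (random.nextDouble() * 2.0 - 1.0) * jitterFactor;
        long millis = Math.round(base.toMillis() * (1.0 + offset));
        return Duration.ofMillis(Math.max(0L, millis));
    }
}
```

If each client in the fleet waits for such a delay before its first connection attempt, the attempts are spread over a window instead of arriving together.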
One idea is a separate DelayPolicy that could be used in addition to, or even instead of, the delay capabilities in RetryPolicy.
I'm curious what others think of this...
A separately configurable initial delay and jitter seems like overkill. How about DelayablePolicy.withInitialDelay(boolean) that when true adds its first delay before the first attempt? (When false, behaves normally.)
@Tembrel When you say a separately configurable delay and jitter seem like overkill, do you mean as part of a separate policy, or even as part of RetryPolicy?
Yes, these don't seem to pull their weight, neither
- a separate policy for an initial delay, nor
- separate methods on RetryPolicy or a supertype to configure an initial delay distinct from subsequent delays.
It seems to me more natural and "lighter" to specify, in addition to how long to delay (i.e., fixed / backoff / jitter / function), whether you want the first delay to happen before or after the first attempt, with after being the default.
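A rough sketch of what that flag would mean, written as a plain Java retry loop rather than against Failsafe itself. Everything here is hypothetical: the class InitialDelayRetry, the method call, the initialDelay parameter, and the injected sleeper (passed in so the loop can be exercised without real waiting).

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical illustration of the proposal, not Failsafe API.
public class InitialDelayRetry {
    /**
     * Calls the task up to maxAttempts times. The delay is always applied
     * before each retry; when initialDelay is true it is also applied
     * before the first attempt.
     */
    public static <T> T call(Callable<T> task, int maxAttempts, Duration delay,
                             boolean initialDelay, Consumer<Duration> sleeper) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attempt > 1 || initialDelay) {
                sleeper.accept(delay);
            }
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed
    }
}
```

With initialDelay set to false this behaves like an ordinary retry loop; with true, a task that succeeds on its third attempt is preceded by three delays instead of two.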
I agree that makes the API surface area a bit smaller. My only counter thought is that it somewhat misrepresents what a retry policy is - that it's about retrying after a failure occurs.
I hit on a use case at work recently where we wanted to jitter an initial delay of something without necessarily wanting to do retries. This seems to underscore how Delay could be worthy of a separate policy.
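For comparison, the no-retries case can be sketched with just the JDK's scheduler: run a task once after a random delay. This is only an illustration of the use case, not a proposed DelayPolicy design; the class JitteredStart, its method names, and the choice of a uniform delay in [0, maxDelay) are assumptions.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Failsafe.
public class JitteredStart {
    /** Picks a uniformly random delay in [0, maxDelay) milliseconds. */
    public static long randomDelayMillis(Duration maxDelay) {
        long bound = maxDelay.toMillis();
        return bound <= 0 ? 0L : ThreadLocalRandom.current().nextLong(bound);
    }

    /** Schedules the task to run once after a random initial delay; no retries. */
    public static <T> ScheduledFuture<T> schedule(ScheduledExecutorService scheduler,
                                                  Callable<T> task, Duration maxDelay) {
        return scheduler.schedule(task, randomDelayMillis(maxDelay), TimeUnit.MILLISECONDS);
    }
}
```

The delay logic here has nothing to do with failure handling, which is the argument for treating it as its own concern.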
What got me thinking of this was a story today, where a real world thundering herd caused by smart thermostats could have benefited from some jitter: https://news.cornell.edu/stories/2022/07/smart-thermostats-inadvertently-strain-electric-power-grids
I see the applicability, but that use case is in the context of a more general periodic scheduling apparatus (transitions throughout the day, different patterns for different days of the week). Failsafe could extend into periodic scheduling, and I think there are open issues to add that kind of functionality, but unless you want to embrace that completely, it still seems nicer (simpler) to get this one effect as a special case of a retry policy.