[Core] Add jitter to retry policies
Motivation
In environments where multiple clients across multiple processes are sending requests, we want to avoid the thundering herd problem where all these clients are retrying simultaneously. A service could potentially be overwhelmed by synchronized waves of retry attempts.
Jitter can be used to add some randomness to the calculated backoff times to break the synchronization.
Modifications
- Added a configurable option retry_jitter_factor to RetryPolicy and AsyncRetryPolicy with a default value of 0.2 (20% jitter).
  - For example, with a calculated backoff time of 6.4s, the retry delay will be 6.4 ± (6.4 × 0.2) = 6.4 ± 1.28 seconds (a random value from 5.12s to 7.68s).
- The first retry is no longer immediate when using exponential backoff. Instead, the first retry now respects the configured backoff time and jitter factor.
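The jitter calculation described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name apply_jitter is hypothetical, and drawing the delay uniformly from the range is an assumption (the description only states the range's bounds).

```python
import random
from typing import Optional


def apply_jitter(
    backoff: float,
    jitter_factor: float = 0.2,
    rng: Optional[random.Random] = None,
) -> float:
    """Return `backoff` randomized within +/- (backoff * jitter_factor).

    Hypothetical helper for illustration only. A uniform distribution
    is assumed here; the PR description does not specify one.
    """
    if not 0.0 <= jitter_factor <= 1.0:
        raise ValueError("jitter_factor must be between 0 and 1")
    spread = backoff * jitter_factor
    low, high = backoff - spread, backoff + spread
    # Use the caller's generator if given (handy for seeded tests),
    # otherwise fall back to the module-level generator.
    if rng is not None:
        return rng.uniform(low, high)
    return random.uniform(low, high)


# A calculated backoff of 6.4s with the default factor of 0.2 yields a
# delay somewhere between roughly 5.12s and 7.68s.
delay = apply_jitter(6.4)
print(f"sleeping for {delay:.2f}s before the next retry")
```

Because each client draws its own random delay, clients that failed at the same moment are unlikely to retry at the same moment.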
Example retry delays:
With default values of 0.8 backoff factor and 0.2 jitter factor.
Before: 0s, 1.6s, 3.2s, 6.4s, 12.8s, ...
After: 0.8±0.16s, 1.6±0.32s, 3.2±0.64s, 6.4±1.28s, 12.8±2.56s, ...
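The two schedules above can be reproduced with a short sketch. The function names and the formula backoff_factor × 2^(n−1) are assumptions inferred from the listed delays, not taken from the PR's code.

```python
from typing import Tuple


def base_backoff(
    retry_number: int,
    backoff_factor: float = 0.8,
    immediate_first_retry: bool = False,
) -> float:
    """Nominal delay before the n-th retry (1-based), without jitter.

    Hypothetical helper: the doubling formula is inferred from the
    example delays in this description.
    """
    if immediate_first_retry and retry_number == 1:
        # Old behavior: the first retry fires immediately.
        return 0.0
    return backoff_factor * 2 ** (retry_number - 1)


def delay_range(
    retry_number: int,
    backoff_factor: float = 0.8,
    jitter_factor: float = 0.2,
) -> Tuple[float, float]:
    """Lowest and highest possible delay for the n-th retry with jitter."""
    nominal = base_backoff(retry_number, backoff_factor)
    spread = nominal * jitter_factor
    return nominal - spread, nominal + spread


for n in range(1, 6):
    before = base_backoff(n, immediate_first_retry=True)
    low, high = delay_range(n)
    print(f"retry {n}: before={before:.2f}s  after={low:.2f}s to {high:.2f}s")
```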
Additional notes
- Python is the only language to use the immediate first retry heuristic. Is this still needed?
- .NET uses a similar jitter algorithm and also has a default jitter factor of 0.2 (reference)
- Java uses a similar jitter algorithm but uses a default jitter factor of 0.05 (reference)
To-do
- Determine what the default jitter factor should be.
Hi @pvaneck. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.
Hi @pvaneck. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing /reopen if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the no-recent-activity label; otherwise, this is likely to be closed again with the next cleanup pass.