google-cloud-cpp
google-cloud-cpp copied to clipboard
Consider offering a full jitter backoff policy
Here is a write up on full jitter. Apparently the java clients do this.
With pure exponential backoff policy we might have backoff ranges like: req1: (.5, 1) req2: (1, 2) req3: (2, 4) ....
With full jitter, these backoff ranges would look like: req1: (0, 1) req2: (0, 2) req3: (0, 4) ....
(make up your own units)
For some services (storage comes to mind) a minimum back off is recommended.
@dbolduc is going to research some more and make a recommendation.
This is the code to run the simulation in the article: https://github.com/aws-samples/aws-arch-backoff-simulator/blob/master/src/backoff_simulator.py
I added our backoff algorithm:
34a35,39
> class ExpoBackoffCloudCpp(Backoff):
> def backoff(self, n):
> v = self.expo(n)
> return random.uniform(v/2, v)
>
And a MinJitter (i.e. full jitter, but it doesn't start at 0)
39a45,49
> class ExpoBackoffMinJitter(Backoff):
> def backoff(self, n):
> v = self.expo(n)
> return random.uniform(self.base, v)
>
Results:
To summarize (with these exact settings, in this exact model), min jitter is indistinguishable from full jitter. Our strategy is indistinguishable from the jitter strategies in terms of total calls, but takes more time to complete the work.
I will phone a friend before making a recommendation...
Seems like other client library languages do full jitter only. Given the supposed, slight performance benefit, I think we should implement min-jitter (which is like full jitter, but slightly more general).
I think we should modify the implementation of the existing ExponentialBackoffPolicy
, with minimal behavior changes for the current constructor.
I think our default backoff policy should use full jitter.
Background
Our API accepts three parameters:
- minimum delay
- maximum delay (Note that this is an overall maximum.)
- scaling factor
https://github.com/googleapis/google-cloud-cpp/blob/9737c4bfc27eea6c593dcbe2d59af41d266acc0c/google/cloud/internal/backoff_policy.h#L125-L128
Aside: I don't agree with the range it sets. I think we should multiply by scaling
, not 2
. Whatever. https://github.com/googleapis/google-cloud-cpp/blob/9737c4bfc27eea6c593dcbe2d59af41d266acc0c/google/cloud/internal/backoff_policy.h#L131
Min-jitter requires four parameters:
- minimum delay (0 in the case of full jitter)
- initial delay upper bound
- maximum delay
- scaling factor
Design / Work:
I would break up the work into two PRs:
1. Implement min-jitter
- add a new constructor + a
minimum_delay_
member.- I'd probably rename
s/current_range_delay_/current_range_upper_bound_/
, too.
- I'd probably rename
- map the parameters for the existing ctor to the members. (requires thinking!)
- update
clone()
andOnCompletion()
implementations. - add/update tests
- update documentation
2. Update library defaults
- tell the generator to use full-jitter: https://github.com/googleapis/google-cloud-cpp/blob/main/generator/internal/option_defaults_generator.cc#L156-L157
- run the generator
We want the RetryPolicyOption
to use a similar policy with a min of ms(0)
.
We want the value of the PollingPolicyOption
to be unchanged, so we should set it manually instead of using the value of the RetryPolicyOption
)
"Running the generator" means:
ci/cloudbuild/build.sh -t generate-libraries-pr
It may also be useful to generate only the "golden" files. (instead of 100 libraries). This just speeds up development cycles.
env GENERATE_GOLDEN_ONLY=1 ci/cloudbuild/build.sh -t generate-libraries-pr
Can we close this?
Yes