google-cloud-cpp icon indicating copy to clipboard operation
google-cloud-cpp copied to clipboard

Consider offering a full jitter backoff policy

Open dbolduc opened this issue 2 years ago • 3 comments

Here is a write up on full jitter. Apparently the java clients do this.

With pure exponential backoff policy we might have backoff ranges like: req1: (.5, 1) req2: (1, 2) req3: (2, 4) ....

With full jitter, these backoff ranges would look like: req1: (0, 1) req2: (0, 2) req3: (0, 4) ....

(make up your own units)

dbolduc avatar Apr 14 '22 20:04 dbolduc

For some services (storage comes to mind) a minimum back off is recommended.

coryan avatar Apr 15 '22 20:04 coryan

@dbolduc is going to research some more and make a recommendation.

coryan avatar Sep 08 '22 18:09 coryan

This is the code to run the simulation in the article: https://github.com/aws-samples/aws-arch-backoff-simulator/blob/master/src/backoff_simulator.py

I added our backoff algorithm:

34a35,39
> class ExpoBackoffCloudCpp(Backoff):
>     def backoff(self, n):
>         v = self.expo(n)
>         return random.uniform(v/2, v)
> 

And a MinJitter (i.e. full jitter, but it doesn't start at 0)

39a45,49
> class ExpoBackoffMinJitter(Backoff):
>     def backoff(self, n):
>         v = self.expo(n)
>         return random.uniform(self.base, v)
> 

Results: image

image

To summarize (with these exact settings, in this exact model), min jitter is indistinguishable from full jitter. Our strategy is indistinguishable from the jitter strategies in terms of total calls, but takes more time to complete the work.

I will phone a friend before making a recommendation...

dbolduc avatar Sep 12 '22 22:09 dbolduc

Seems like other client library languages do full jitter only. Given the supposed, slight performance benefit, I think we should implement min-jitter (which is like full jitter, but slightly more general).

I think we should modify the implementation of the existing ExponentialBackoffPolicy, with minimal behavior changes for the current constructor.

I think our default backoff policy should use full jitter.

Background

Our API accepts three parameters:

  • minimum delay
  • maximum delay (Note that this is an overall maximum.)
  • scaling factor

https://github.com/googleapis/google-cloud-cpp/blob/9737c4bfc27eea6c593dcbe2d59af41d266acc0c/google/cloud/internal/backoff_policy.h#L125-L128

Aside: I don't agree with the range it sets. I think we should multiply by scaling, not 2. Whatever. https://github.com/googleapis/google-cloud-cpp/blob/9737c4bfc27eea6c593dcbe2d59af41d266acc0c/google/cloud/internal/backoff_policy.h#L131

Min-jitter requires four parameters:

  • minimum delay (0 in the case of full jitter)
  • initial delay upper bound
  • maximum delay
  • scaling factor

Design / Work:

I would break up the work into two PRs:

1. Implement min-jitter

  • add a new constructor + a minimum_delay_ member.
    • I'd probably rename s/current_range_delay_/current_range_upper_bound_/, too.
  • map the parameters for the existing ctor to the members. (requires thinking!)
  • update clone() and OnCompletion() implementations.
  • add/update tests
  • update documentation

2. Update library defaults

  • tell the generator to use full-jitter: https://github.com/googleapis/google-cloud-cpp/blob/main/generator/internal/option_defaults_generator.cc#L156-L157
  • run the generator

We want the RetryPolicyOption to use a similar policy with a min of ms(0).

We want the value of the PollingPolicyOption to be unchanged, so we should set it manually instead of using the value of the RetryPolicyOption)

"Running the generator" means:

ci/cloudbuild/build.sh -t generate-libraries-pr

It may also be useful to generate only the "golden" files. (instead of 100 libraries). This just speeds up development cycles.

env GENERATE_GOLDEN_ONLY=1 ci/cloudbuild/build.sh -t generate-libraries-pr

dbolduc avatar Mar 22 '23 22:03 dbolduc

Can we close this?

coryan avatar Jun 09 '23 13:06 coryan

Yes

alevenberg avatar Jun 09 '23 13:06 alevenberg