parodus
parodus copied to clipboard
Use exponential backoff as seed vs. absolute time
At large scales, we see large spikes at the exponential back-off retry times. Ideally, we'd use the exponential back-off time calculated as a max seed to a random time generation.
Example:
n=5
2^n-1 = 2^5-1 = 31
-- here is the new part --
randomize from 0 to 31 inclusive. [0, 31]
-- end new part --
use the random value as the delay before retrying.
We talked about adding jitter to the backoff, which I think is a great idea. As I understand this specific proposal, on the first backoff, we'd delay a random interval from 0 .. 3 secs, on the second backoff, a random delay from 0..7 secs, on the third backoff, a random delay from 0..15 secs, etc, up to a max of 0..255 secs.
@schmidtw this talks about the max seed but not about the min seed. Is the min seed supposed to change to previous n every retry as the max seed increases. For example if n varies from 2 to 8 i.e. 3s to 255s Should the random time be [3, 255] all the time or should it be like [3, 7] [7, 15] [15, 31] [31, 63] [63, 255] If [3, 255] is used then the subsequent retries does not guarantee increased sleep it may be 240s, 30s, 120s etc.
I think it is reasonable to leve the min time the same, so:
[3,7]
[3,15]
[3,31]
[3,63]
[3,255]
The large number of devices should be a Gaussian curve roughly centered in the middle of the range. Each time the range increases we push the middle of the curve out a bit, but don't force blackout windows. This should be a good pattern at scale.