five-bells-ledger icon indicating copy to clipboard operation
five-bells-ledger copied to clipboard

Consider different forms of backoff

Open justmoon opened this issue 10 years ago • 4 comments

Currently, we use exponential backoff. But jittering backoff may be better.

It's unlikely to matter much for our use case, but it's also a pretty simple change.

justmoon avatar Dec 03 '15 23:12 justmoon

I don't understand what jitter backoff would buy us -- as I understand it, it's meant for cases involving resource contention.

sentientwaffle avatar Dec 04 '15 21:12 sentientwaffle

Couldn't there be contention around a PUT /notifications/:id endpoint? Let's say you're running an FX service and a large number of payments executes at once. (E.g. because they were all triggered by some common event.)

The sudden spike of /notifications crashes your backend database. However, the retries all happen at the same (exponentially increasing) intervals, so every time the notifications are retried, your database crashes again. End result is that a lot of notifications time out on each retry and it takes much longer than necessary to process the backlog.

Exponential backoff contention

Plus jitter isn't any more complex to implement than exponential backoff - it's just a different one-liner. So even if we consider the above scenario fairly remote, there doesn't seem to be much of a reason not to use jitter just in case. (For testing we can mock Math.random to get back determinism.)

justmoon avatar Dec 04 '15 21:12 justmoon

Ok, so reading that article again, the best performing algorithms were Full Jitter and Decorrelated Jitter. Either one would be fine I think. Full Jitter seems a tad more natural to me and easier to understand, so that's the one I propose that we use:

sleep = random_between(0, min(cap, base * 2 ** attempt))

In the context of our code:

let retries = notification.retry_count = (notification.retry_count || 0) + 1
let delay = Math.random() * Math.min(120, Math.pow(2, retries))
notification.retry_at = new Date(Date.now() + 1000 * delay)

It's literally just adding the Math.random() in the second line.

@sentientwaffle @naoitoi Agreed?

justmoon avatar Dec 31 '15 19:12 justmoon

Yes, I agree with this approach.

naoitoi avatar Jan 01 '16 06:01 naoitoi