
Proposal: Exponential retry delay mode

Open meredian opened this issue 4 years ago • 5 comments

The original Redlock mechanics play perfectly well in theory; however, I've found real-life applications suffer in failure modes.

The basic scenario is that we have a few concurrent requests trying to acquire locks; when a conflict occurs, it resolves within a few retry iterations and the other lock requests get a chance to move on. Works well 👍

Problem: nodes can die, including being killed by a resource monitor (e.g. k8s), so a lock is not released. The request is terminated, the client retries it, and we try to re-acquire the same lock, but it is still held and was never released. The client keeps looping until it exhausts all its retries. And that happens for every concurrent request that was in flight while the node was restarting.

Nodes get killed often; it's normal operation (though rare for any single node), and we don't want to expose this problem to the client. One solution would be to set `retryCount × retryDelay` >> lock time. But for a lock time of 10 s, that means 100 retries at a 100 ms delay; with 100 RPS and a node reboot, that's 1k extra RPS against Redis :)

What I'm thinking about is allowing an exponentially growing delay:

  • First, do N retries quickly (assuming real concurrent access).
  • After that, do M retries with increasing delay, assuming "the lock issuer is dead, let's wait until the lock expires".

Technically it could look something like this:

new Redlock([redis], {
    driftFactor: 0.01,
    retryCount:  30,
    retryDelay:  { // Backwards-compatible: polymorphic, a number or this object
        startDelay: 100,
        expAfterRetryCount: 10,
        expFactor: 2,
        maxDelay: 3000
    },
    retryJitter:  100,
});

Maybe it's not the best way to configure the delay (calculating the total maximum wait is not obvious), but I believe the principal question is go/no-go, and then the interface details.

meredian avatar Oct 27 '20 10:10 meredian

Another solution could be auto-extension from a background task (e.g. you lock for 1 s but keep extending as you go), so that the lock is released quickly when the locking process dies. But I would really like to have both mechanisms, since we still have high-concurrency spikes where many concurrent requests have to wait quite a long time in total.
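A rough sketch of what that auto-extension could look like on the client side, assuming a lock object with the library's `lock.extend(ttl)` and `lock.unlock()` methods; the `withAutoExtendedLock` helper itself is hypothetical, not part of node-redlock:

```javascript
// Acquire a short-TTL lock and keep extending it while `work` runs.
// If this process dies, the lock expires within one short TTL instead
// of the full duration of the work.
async function withAutoExtendedLock(redlock, resource, ttl, work) {
  let lock = await redlock.lock(resource, ttl);
  // Re-extend at half the TTL so we renew well before expiry.
  const timer = setInterval(async () => {
    try {
      lock = await lock.extend(ttl);
    } catch (err) {
      // If extension fails, stop trying; the lock will expire on its own.
      clearInterval(timer);
    }
  }, ttl / 2);
  try {
    return await work();
  } finally {
    clearInterval(timer);
    await lock.unlock().catch(() => {}); // best-effort release
  }
}
```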

meredian avatar Oct 27 '20 11:10 meredian

I would be glad to help implement both parts if you, as the author, see it as feasible 😄

meredian avatar Oct 27 '20 11:10 meredian

Hi there! My apologies for letting this sit for over a year! I'm finally cleaning some of this up. If you would be interested in adding a more sophisticated/flexible backoff here (or perhaps making retry logic a functional interface), I would be perfectly open to that.

mike-marcacci avatar Nov 26 '21 19:11 mike-marcacci

@mike-marcacci Thanks for responding. I've moved away from the context where this was used, but I actually would be glad to contribute once I have time 🙂

I really liked your suggestion about a functional interface: the ability to pass a custom function would let end users configure whatever retry strategy suits them.

And we could also ship a few predefined strategy functions with the library: exponential, flat-then-exponential, whatever else comes to mind. That way we avoid a pile of different object-like interfaces and provide more flexibility.
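To illustrate the idea, here is a sketch of what such a functional interface and a couple of predefined strategies could look like. All names here are hypothetical; nothing below exists in node-redlock today:

```javascript
// A retry strategy is just a function: attempt number -> delay in ms.
// Two predefined factories the library could ship:

const exponential = ({ start = 100, factor = 2, max = 3000 } = {}) =>
  (attempt) => Math.min(start * Math.pow(factor, attempt - 1), max);

const flatThenExponential = ({ flatRetries = 10, start = 100, factor = 2, max = 3000 } = {}) =>
  (attempt) =>
    attempt <= flatRetries
      ? start // fast flat retries first
      : Math.min(start * Math.pow(factor, attempt - flatRetries), max);

// The retryDelay option could then accept either a number
// (current behaviour) or such a function:
//
// new Redlock([redis], { retryCount: 30, retryDelay: flatThenExponential() });
```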

How does it sound to you?

meredian avatar Dec 02 '21 08:12 meredian

Hi @meredian, that sounds fantastic to me!

mike-marcacci avatar Dec 03 '21 17:12 mike-marcacci