redis-mutex icon indicating copy to clipboard operation
redis-mutex copied to clipboard

known pattern flow

Open royaltm opened this issue 11 years ago • 30 comments

The pattern from the official SETNX doc used in @kenn's implementation is flawed. The implementation itself however is really nice and I like it. Nevertheless my bad habit of testing other people's ideas (especially before implementing them and using on production) pushed me to play a little bit with redis-mutex gem.

Unfortunately the setnx/getset pattern may lead to the "phantom starvation" (what I call it) of the two (or more) concurrent processes (or threads) when both try to enter the critical section at the wrong time.

Here's a simple stress test:

require 'redis-mutex'

REDIS_OPTS = {}

lck = '__STRESSTEST__.lck'
Redis::Classy.db = Redis.new(REDIS_OPTS)
rmutex = Redis::Mutex.new(lck, expire: 5, block:10000, sleep:0.01)
10000.times do |i|
  rmutex.with_lock do
    print "\r %6d" % i
    sleep rand/100
  end
end

Now try to spawn two terminal windows (or create split window in tmux) and run the above script on both of them. When you start the first one, you'll see counting. But in a short moment (or even at the beginning) when you run it the second time the counting stops periodically. The scripts will eventually reach 9999 (if you are patient enough) but mostly sleeping, waiting for lock to expire.

I might suppose @kenn is already aware of that but since there is no BUGS/LIMITATIONS section in README file I wrote this issue for the record.

I also think it is ours (the coders') responsibility to warn others about known issues and limitations especially when you are the keeper of the quite popular rubygem name. Noblesse Oblige (and popularity should too) in the open source crowd. :)

Here are the reconstructed steps behind the vulnerability:

For clarity there are three actors in the example (A, B, C) though C could also be A.

now = 1 A has locked the resource (with expire_at=2) A tries to unlock, at the same time B tries to lock (with expire_at=3) A: get == 2 ? YES B: setnx(3) ? NO A: del B: get > now ? NO (get is 0) C now tries to lock with (expire_at=4) C: setnx(4) ? YES -> C is the owner now B: getset(3) <= now ? NO at the end B is not the owner but has managed to alter the lock

therefore C won't unlock: C: get == 4 ? NO (get is 3) so the lock remains staled and all the hungry processes must wait until it expires

royaltm avatar Feb 25 '13 03:02 royaltm