
Redis Semaphore reaching bad state

Open ben-axnick opened this issue 8 years ago • 4 comments

Hi there,

We've been having some trouble with Redis Semaphore lately: we're no longer able to acquire locks on hundreds of keys. Looking closer, it seems to be because only the VERSION and EXISTS subkeys are present; AVAILABLE and GRABBED are nowhere to be seen:

irb(main):055:0> redis.keys("SEMAPHORE:search_index_lock:6938264*")
=> ["SEMAPHORE:search_index_lock:6938264:VERSION", "SEMAPHORE:search_index_lock:6938264:EXISTS"]

Calling lock causes lpop / blpop to come back empty-handed and the whole thing fails:

irb(main):051:0> semaphore.lock(1) { puts "hello" }
=> false

This of course makes intuitive sense, since an existing semaphore should have a list of AVAILABLE or GRABBED tokens at all times.

Do you have any thoughts about how we might be getting to this state, or what can be done to resolve it? For now I'm thinking we'll roll with expiration, so that at least we get a reset after being stuck for a while.
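For reference, this is roughly the setup I have in mind, a minimal sketch assuming the gem's :redis and :expiration options behave the way I expect (the 600-second value is illustrative only):

require 'redis'
require 'redis-semaphore'

redis = Redis.new

# Give the semaphore an expiry so a stuck or half-created semaphore
# eventually resets itself instead of blocking forever.
semaphore = Redis::Semaphore.new(
  :search_index_lock,
  redis: redis,
  expiration: 600 # seconds, illustrative only
)

semaphore.lock(1) { puts "hello" }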

UPDATE 2016-09-19:

This is still occurring after moving to an all-new keyspace with an expiration set. The keys I'm seeing don't have an expiration on them (ttl returns -1). When testing directly, it all appears to work, so I'm potentially looking at some sort of race condition here that causes the AVAILABLE list never to be created, or never to be repopulated properly once the semaphore is unlocked.

Since the ttl is not set on either the VERSION or EXISTS keys, I suspect the error occurs very early in the semaphore creation process.
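For what it's worth, this is how I'm inspecting the keys, just plain redis-rb calls against the key names shown above:

# Print every subkey for one semaphore along with its TTL (-1 means no expiry).
redis.keys("SEMAPHORE:search_index_lock:6938264*").each do |key|
  puts "#{key} ttl=#{redis.ttl(key)}"
end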

One thing I'm noticing is that there's a small window between popping a token off AVAILABLE and adding it to GRABBED that could cause a semaphore to fail, but that doesn't seem to be the issue here, as the keys should have an expiration set at that point.

The other thing I'm noticing is that behaviour gets a little undefined when two entities try to create a semaphore at the same time. One will start creating it while the other immediately starts consuming it, on the assumption that it already exists and is fully set up. It's possible the problem is occurring there.
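The kind of thing that might reproduce the create race, purely a sketch (the :race_test name and timings are made up):

require 'redis'
require 'redis-semaphore'

# Two threads race to create and lock the same, previously unused semaphore.
threads = 2.times.map do
  Thread.new do
    s = Redis::Semaphore.new(:race_test, redis: Redis.new, expiration: 120)
    s.lock(1) { sleep 0.1 }
  end
end
threads.each(&:join)

# If the race hits, one thread may see only the VERSION and EXISTS keys and
# get false back from lock instead of running the block.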

UPDATE 2016-09-22:

Since a semaphore is overkill for my needs, I've opted for a somewhat simplistic mutex implementation instead. Problem solved, I guess.
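For anyone curious, the replacement is along these lines, a minimal sketch of a single-token lock built on SET NX with an expiry (names and timings are illustrative, not the exact code we shipped):

require 'redis'
require 'securerandom'

# Acquire the lock only if the key doesn't exist yet (NX); the expiry (EX)
# guarantees the lock can't stay stuck the way the semaphore did.
def with_mutex(redis, name, ttl: 120)
  token = SecureRandom.uuid
  return false unless redis.set(name, token, nx: true, ex: ttl)
  begin
    yield
    true
  ensure
    # Best-effort release: only delete the lock if we still own it.
    redis.del(name) if redis.get(name) == token
  end
end

with_mutex(Redis.new, "mutex:search_index") { puts "hello" }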

ben-axnick avatar Sep 16 '16 08:09 ben-axnick

We're seeing the same behaviour: every 2-3 days the Semaphore reaches a bad state. We're locking on average around 5,000-6,000 times a day. Eventually only the VERSION and EXISTS keys remain.

We tried setting expiration: 2.minutes, but as @bentheax mentions above, this seems to have little effect.

tbrammar avatar Nov 19 '16 05:11 tbrammar

We have probably seen the same issue. No AVAILABLE key existed in Redis under the name we use, and I was unable to get a lock. I solved it by manually pushing a token back onto the AVAILABLE list (rails console):

2.2.0 :100 > c = Redis.current # get redis client instance
2.2.0 :101 > c.get('SEMAPHORE:package_builder:AVAILABLE')
=> nil
2.2.0 :102 > c.rpush('SEMAPHORE:package_builder:AVAILABLE', 1)
=> 1

Then I was able to get a lock for our named semaphore:

2.2.0 :001 > s = Redis::Semaphore.new(:package_builder)
2.2.0 :002 > s.available_count
=> 1
2.2.0 :003 > s.lock(5) { puts "hello" }
hello
=> nil

Hope this helps someone! However, this is not a permanent fix.
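If it helps, the manual fix above could be wrapped in a small check along these lines, just a sketch of the stopgap, not anything from the gem itself (the token value pushed is arbitrary):

# Hypothetical stopgap: if a semaphore's AVAILABLE list has vanished,
# push a token back so lock() can succeed again.
def repair_semaphore(redis, name, resources: 1)
  available = "SEMAPHORE:#{name}:AVAILABLE"
  # Only repair when the list is completely empty; in production you'd also
  # want to make sure no tokens are legitimately held in GRABBED.
  return if redis.llen(available) > 0
  resources.times { |i| redis.rpush(available, i) }
end

repair_semaphore(Redis.current, "package_builder")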

thomasbalsloev avatar Aug 28 '17 14:08 thomasbalsloev

Is there any chance your Redis DB ever reached the maximum memory size, causing it to start to evict non-persisted keys?
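One quick way to check with redis-rb (the config keys and the evicted_keys counter exist on any reasonably recent Redis):

redis = Redis.new

# A "maxmemory" of "0" means unlimited; any policy other than noeviction can
# silently drop keys (including semaphore lists) under memory pressure.
puts redis.config(:get, "maxmemory").inspect
puts redis.config(:get, "maxmemory-policy").inspect
puts redis.info("stats")["evicted_keys"]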

dv avatar Sep 20 '17 15:09 dv

I can't rule out the possibility; it was so long ago that I no longer have access to that data.

ben-axnick avatar Sep 21 '17 00:09 ben-axnick