redis-semaphore
Redis Semaphore reaching bad state
Hi there,
We're having some trouble with Redis Semaphore of late: we're no longer able to acquire locks on hundreds of keys. Looking closer, it seems to be because there are only `VERSION` and `EXISTS` subkeys; `AVAILABLE` and `GRABBED` are nowhere to be seen:
```
irb(main):055:0> redis.keys("SEMAPHORE:search_index_lock:6938264*")
=> ["SEMAPHORE:search_index_lock:6938264:VERSION", "SEMAPHORE:search_index_lock:6938264:EXISTS"]
```
Calling `lock` will cause `lpop`/`blpop` to come back empty-handed, and the whole thing fails:
```
irb(main):051:0> semaphore.lock(1) { puts "hello" }
=> false
```
This of course makes intuitive sense, since an existing semaphore should have a list of `AVAILABLE` or `GRABBED` tokens at all times.
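For contrast, this is what a healthy semaphore looks like after first use (an illustrative session; the `demo_lock` name is made up):

```ruby
require 'redis'
require 'redis-semaphore'

redis = Redis.new
s = Redis::Semaphore.new(:demo_lock, redis: redis)
s.lock(1) { }  # first use creates and populates the keys

redis.keys("SEMAPHORE:demo_lock:*")
# Expect VERSION, EXISTS and an AVAILABLE list holding the free token;
# while a lock is held, the token sits under GRABBED instead.
```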
Do you have any thoughts about how we might be getting to this state, or what can be done to resolve it? For now I'm thinking we'll roll with expiration, so that at least we get a reset after being stuck for a while.
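For reference, the expiration plan would look something like this (the `:expiration` option is from the gem's README; the numbers here are ours and purely illustrative):

```ruby
require 'redis'
require 'redis-semaphore'

semaphore = Redis::Semaphore.new(:search_index_lock,
                                 redis: Redis.new,
                                 expiration: 120)  # seconds; every key gets a TTL,
                                                   # so a wedged semaphore resets itself

semaphore.lock(1) do
  # critical section
end
```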
UPDATE 2016-09-19:
This is still occurring after moving to an all-new keyspace with an `expiration` set. The keys I'm seeing don't have an expiration on them; `ttl` returns -1. When testing directly, it all appears to work, so I'm potentially looking at some sort of race condition here that is causing the available list never to be created, or not to be repopulated properly once the semaphore is unlocked.
Since the TTL is not set on either the `VERSION` or `EXISTS` key, I expect the error must be occurring very early in the semaphore creation process.
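For reference, this is roughly how I'm inspecting the keys (key names from the output above; `type` and `ttl` are standard redis-rb calls):

```ruby
base = "SEMAPHORE:search_index_lock:6938264"

%w[VERSION EXISTS AVAILABLE GRABBED].each do |sub|
  key = "#{base}:#{sub}"
  # type is "none" when the key is missing; ttl of -1 means no expiration,
  # -2 means the key doesn't exist at all.
  puts "#{key}: type=#{redis.type(key)} ttl=#{redis.ttl(key)}"
end
```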
One thing I'm noticing is that there's a small window between popping an available key and adding it to the "grabbed" keys that could cause a semaphore to fail, but that doesn't seem to be the issue here, as the keys should have an expiration set at this point.
The other thing I'm noticing is that behaviour gets a little bit undefined when two entities try to create a semaphore at the same time. One will start creating while the other will start immediately consuming the semaphore, assuming that it exists and is completely set up. It's possible problems are occurring there.
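To make that window concrete, here's the interleaving I have in mind (a schematic of the scenario only, not the gem's actual code; key names follow the pattern above):

```ruby
# Client A (creator)                      Client B (consumer)
# ------------------                      -------------------
# sees no EXISTS marker, begins setup
# SET SEMAPHORE:x:EXISTS 1
#                                         sees EXISTS, assumes setup finished
#                                         BLPOP SEMAPHORE:x:AVAILABLE
#                                         ...blocks: the list is still empty...
# RPUSH SEMAPHORE:x:AVAILABLE 0
# (if A dies before this RPUSH, B times
#  out and only VERSION/EXISTS remain)
```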
UPDATE 2016-09-22:
Since a semaphore is overkill for my needs, I've opted for a somewhat simplistic mutex implementation instead. Problem solved, I guess.
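For anyone curious, the core of such a mutex can be a single atomic SET with NX and an expiry. A minimal sketch, not my exact implementation; the key name and TTL are illustrative:

```ruby
require 'redis'
require 'securerandom'

def with_mutex(redis, key, ttl: 60)
  token = SecureRandom.uuid
  # Atomically take the lock only if the key is absent (NX) and make sure it
  # expires (EX) even if the holder crashes mid-section.
  return false unless redis.set(key, token, nx: true, ex: ttl)
  begin
    yield
    true
  ensure
    # Release only if we still hold it, so we never delete another holder's
    # lock. (A Lua script would make this check-and-delete atomic.)
    redis.del(key) if redis.get(key) == token
  end
end

with_mutex(Redis.new, "mutex:search_index") do
  # critical section
end
```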
We're seeing the same behaviour whereby every 2-3 days Semaphore reaches a bad state. We lock on average around 5-6,000 times a day. Eventually there are only `VERSION` and `EXISTS` keys left.
We tried setting `expiration: 2.minutes`, but as @bentheax mentions above, this seems to have little effect.
We have probably seen the same issue: no `AVAILABLE` key existed in Redis under the name we use, and I was unable to get a lock. I solved it by manually pushing a token onto the key (Rails console):
```
2.2.0 :100 > c = Redis.current # get redis client instance
2.2.0 :101 > c.get('SEMAPHORE:package_builder:AVAILABLE')
 => nil
2.2.0 :102 > c.rpush('SEMAPHORE:package_builder:AVAILABLE', 1)
 => 1
```
Then I was able to get a lock for our named semaphore:

```
2.2.0 :001 > s = Redis::Semaphore.new(:package_builder)
2.2.0 :002 > s.available_count
 => 1
2.2.0 :003 > s.lock(5) {puts "hello"}
hello
 => nil
```
Hope this helps someone! However, this is not a permanent fix.
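If you need to script the same fix, something along these lines should work (the key names and the single-token assumption mirror the session above; adjust for your semaphore):

```ruby
require 'redis'

redis = Redis.current
available_key = 'SEMAPHORE:package_builder:AVAILABLE'
grabbed_key   = 'SEMAPHORE:package_builder:GRABBED'

# Only repair if no token is available AND nothing is currently grabbed;
# otherwise we'd mint an extra token and over-admit.
if redis.llen(available_key).zero? && redis.type(grabbed_key) == 'none'
  redis.rpush(available_key, 1)
end
```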
Is there any chance your Redis DB ever reached the maximum memory size, causing it to start to evict non-persisted keys?
I can't rule out the possibility, it was so long ago that I no longer have access to that data.
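For anyone who wants to rule eviction out on a live instance, the relevant numbers are visible through the same redis-rb client used in the sessions above:

```ruby
redis = Redis.new

# Any "allkeys-*" maxmemory policy will evict keys regardless of TTL once
# maxmemory is hit, which would explain individual subkeys vanishing.
p redis.config(:get, 'maxmemory')
p redis.config(:get, 'maxmemory-policy')

# evicted_keys > 0 means Redis has actually dropped keys at some point.
p redis.info('memory')['used_memory_human']
p redis.info('stats')['evicted_keys']
```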