redis-semaphore

release_stale_locks! does not always release

Open ChristofferJoergensen opened this issue 5 years ago • 0 comments

I am using gem version 0.3.1.

Occasionally, a stale lock cannot be released. I describe my debugging process below and would appreciate suggestions on how to debug this further.

Assume that we want to create a redis lock with this name:

name = "test"

We define this variable in two different terminal windows (irb sessions). In the first, we run:

def lock_for_15_secs(name)
  job = Redis::Semaphore.new(name.to_sym, redis: NonBlockingRedis.new(), custom_blpop: true, :stale_client_timeout => 15)
  if job.lock(-1) == "0"
    puts "Locked and starting"
    sleep(15)
    puts "Now it's stale, try to release in another process"
    sleep(15)
    puts "Now trying to unlock"
    unlock = job.unlock
    puts unlock == false ? "Wuhuu, already unlocked" : "Hm, should have been unlocked by another process, but wasn't"
  end
end
lock_for_15_secs(name)

In the second, we run:

def release_and_lock(name)
  job     = Redis::Semaphore.new(name.to_sym, redis: NonBlockingRedis.new(), custom_blpop: true, :stale_client_timeout => 15)
  release = job.release_stale_locks!
  count   = job.available_count
  puts "Release reponse is #{release.inspect} and available count is #{count}"
  if job.lock(-1) == "0"
    puts "Wuhuu, we can lock it"
    job.unlock
  else
    puts "Hmm, we can't lock it"
  end
end
release_and_lock(name)

This usually plays out as expected. For the first 15 seconds, the second terminal can't release the lock; when I run it again after that, the lock is released. Below is the output from release_and_lock(name).

Before 15 seconds have passed:

irb(main):1:0> release_and_lock(name)
Release reponse is {"0"=>"1580292557.321834"} and available count is 0
Hmm, we can't lock it
=> nil

After 15 seconds have passed:

irb(main):2:0> release_and_lock(name)
Release reponse is {"0"=>"1580292557.321834"} and available count is 1
Wuhuu, we can lock it
=> 1
irb(main):3:0> release_and_lock(name)
Release reponse is {} and available count is 1
Wuhuu, we can lock it
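
To make the timing above concrete, here is a minimal sketch of the staleness check as I understand it: a lock counts as stale once its recorded timestamp plus the timeout lies in the past. The helper name `stale?` is hypothetical and not part of the gem; the timestamp is the one from the output above.

```ruby
# Hypothetical helper for illustration; not part of redis-semaphore.
# A lock counts as stale once its recorded timestamp plus the timeout is in the past.
def stale?(locked_at, now, timeout)
  locked_at.to_f + timeout < now.to_f
end

locked_at = "1580292557.321834" # timestamp string from the output above
timeout   = 15                  # :stale_client_timeout used in this issue

puts stale?(locked_at, locked_at.to_f + 10, timeout) # 10 s after locking: false
puts stale?(locked_at, locked_at.to_f + 16, timeout) # 16 s after locking: true
```

This matches the two runs: the first call sees the entry but leaves it alone, the second returns the token to the available list.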

But sometimes in production a stale lock isn't released, so I run release_and_lock(name) to diagnose. It returns:

irb(main):4:0> release_and_lock(name)
Release reponse is {} and available count is 0
Hmm, we can't lock it
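
One possible explanation (not confirmed) is that the token was popped from the available list but never recorded in the grabbed hash, for example because the process was killed or the connection failed between the two calls. The token is then in neither structure, so there is nothing to release. The sketch below is an in-memory model of that sequence; `FakeSemaphoreState` and `crash_after_pop` are made-up names for illustration, and no Redis is involved.

```ruby
# In-memory stand-in for the two structures involved (hypothetical model,
# not the gem's code): a list of free tokens and a hash of grabbed tokens.
class FakeSemaphoreState
  attr_reader :available, :grabbed

  def initialize
    @available = ["0"] # free tokens (the AVAILABLE list)
    @grabbed   = {}    # token => lock timestamp (the GRABBED hash)
  end

  # Two-step lock: pop the token, then record it as grabbed.
  # crash_after_pop simulates the process dying between the two steps.
  def lock(now:, crash_after_pop: false)
    token = @available.shift
    return false if token.nil?
    return nil if crash_after_pop # token never reaches GRABBED

    @grabbed[token] = now
    token
  end

  # Only tokens present in GRABBED can be handed back to AVAILABLE.
  # Returns the grabbed entries as they were before the release.
  def release_stale_locks!(now:, timeout:)
    snapshot = @grabbed.dup
    snapshot.each do |token, locked_at|
      next unless locked_at + timeout < now

      @grabbed.delete(token)
      @available.push(token)
    end
    snapshot
  end
end

state = FakeSemaphoreState.new
state.lock(now: 0.0, crash_after_pop: true)
released = state.release_stale_locks!(now: 100.0, timeout: 15)
puts "Release response is #{released.inspect} and available count is #{state.available.size}"
```

The model ends in the same state as the production output: an empty release response and an available count of 0. If this is what happens, inspecting the grabbed hash and the available list directly in Redis at that moment should show both empty.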

And at this point my only option is to flush redis:

non_blocking_redis = NonBlockingRedis.new()
non_blocking_redis.flushall
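
A less drastic alternative to flushall might be to delete only the keys that belong to this one semaphore. The sketch below assumes the keys follow the pattern `SEMAPHORE:<name>:<SUFFIX>` with the four suffixes shown; that layout is an assumption on my part and should be checked against the keys actually present in Redis before relying on it. `semaphore_keys` and `reset_semaphore` are hypothetical helpers, not gem methods.

```ruby
# ASSUMPTION: key layout "<namespace>:<name>:<SUFFIX>" with these suffixes.
# Verify against the real keys in your Redis before using this.
SEMAPHORE_KEY_SUFFIXES = %w[AVAILABLE GRABBED EXISTS VERSION].freeze

# Builds the list of key names assumed to belong to one semaphore.
def semaphore_keys(name, namespace: "SEMAPHORE")
  SEMAPHORE_KEY_SUFFIXES.map { |suffix| "#{namespace}:#{name}:#{suffix}" }
end

# Deletes only those keys; `redis` is any client that responds to del(*keys).
def reset_semaphore(redis, name)
  redis.del(*semaphore_keys(name))
end
```

Usage would look like `reset_semaphore(NonBlockingRedis.new, name)`, after which the next lock attempt should recreate the semaphore from scratch. Other data in the same Redis database (the Sidekiq queues, for instance) would be left untouched.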

P.S. My NonBlockingRedis class inherits from Redis:

class NonBlockingRedis < Redis

  def initialize(options = {})
    if options.empty?
      options = {
        url: Rails.application.secrets.redis_url,
        db:  Rails.application.secrets.redis_sidekiq_db,
        driver: :hiredis,
        network_timeout: 5
      }
    end

    super(options)
  end

  def blpop(key, timeout, custom_blpop)
    if custom_blpop
      if timeout == -1
        result = lpop(key)
        return result if result.nil?
        return [key, result]
      else
        super(key, timeout)
      end
    else
      super
    end
  end

  def lock(timeout = 0)
    exists_or_create!
    release_stale_locks! if check_staleness?
    token_pair = @redis.blpop(available_key, timeout, @custom_blpop)
    return false if token_pair.nil?
    current_token = token_pair[1]
    @tokens.push(current_token)
    @redis.hset(grabbed_key, current_token, current_time.to_f)
    if block_given?
      begin
        yield current_token
      ensure
        signal(current_token)
      end
    end
    current_token
  end
  alias_method :wait, :lock
end

require 'non_blocking_redis'

ChristofferJoergensen, Jan 29 '20 10:01