sidekiq-unique-jobs icon indicating copy to clipboard operation
sidekiq-unique-jobs copied to clipboard

PRIMED key grows indefinitely when job re-queues itself in `after_unlock` callback

Open ttstarck opened this issue 1 year ago • 1 comments

Describe the bug PRIMED key grows indefinitely for jobs that requeue themselves in after_unlock callback if the job is completed and re-queued again within 1 second of when the initial job was queued.

Expected behavior PRIMED key is removed or stays at a small length when job is locked or unlocked.

Current behavior PRIMED key grows indefinitely, with "1" as a value when job is re-queued within 1 second of initially being queued.

Worker class

class AfterUnlockWorker
  include Sidekiq::Job
  sidekiq_options lock: :until_executed

  def perform(args)
    @args = args
  end

  def after_unlock
    self.class.perform_async(@args)
  end
end

Additional context Sidekiq Version: 6.5.8 SidekiqUniqueJobs Version: 7.1.29

Rspec Example:

# frozen_string_literal: true

require "rspec"
require "sidekiq"
require "sidekiq-unique-jobs"
require "sidekiq_unique_jobs/testing"

Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add(SidekiqUniqueJobs::Middleware::Client)
  end
end

RSpec.configure do |config|
  config.before(:each) do
    Sidekiq::Worker.clear_all
    Sidekiq::Testing.server_middleware do |chain|
      chain.add(SidekiqUniqueJobs::Middleware::Server)
    end
  end
end

RSpec.describe "PRIMED key spec" do
  it "grows PRIMED key indefinitely" do
    Sidekiq::Testing.fake!
    AfterUnlockWorker.perform_async(1)
    lock_digest = AfterUnlockWorker.jobs.first["lock_digest"]
    primed_key_length = -> { Sidekiq.redis { |r| r.llen("#{lock_digest}:PRIMED") } }
    primed_key_values = -> { Sidekiq.redis { |r| r.lrange("#{lock_digest}:PRIMED", 0, -1) }.uniq }
    100.times do |i|
      AfterUnlockWorker.perform_one
      puts "Test #{i} - Length of PRIMED key #{primed_key_length.call}" if i % 10 == 0
    end
    puts "Test Finished - Length of PRIMED key #{primed_key_length.call}"
    puts "Test Finished - Unique PRIMED key values #{primed_key_values.call}"
    sleep(1)
    puts "Test Finished - Length of PRIMED key after 1 second sleep #{primed_key_length.call}"
  end
end

Output from running the above spec:

Test 0 - Length of PRIMED key 1
Test 10 - Length of PRIMED key 11
Test 20 - Length of PRIMED key 21
Test 30 - Length of PRIMED key 31
Test 40 - Length of PRIMED key 41
Test 50 - Length of PRIMED key 51
Test 60 - Length of PRIMED key 61
Test 70 - Length of PRIMED key 71
Test 80 - Length of PRIMED key 81
Test 90 - Length of PRIMED key 91
Test Finished - Length of PRIMED key 100
Test Finished - Unique PRIMED key values ["1"]
Test Finished - Length of PRIMED key after 1 second sleep 0

This looks to be because when unlock LUA script on a job, it pushes "1" to the QUEUED key: https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/lua/unlock.lua#L95-L96

I unfortunately don't have enough context around the locking/unlocking algorithm to know if this is actually necessary or not.

I will say that after doing some more reading, Redis suggests a much simpler algorithm for setting up Distributed locking for a single master redis instance (which I think most people are running).

I'd love to hear about the reasoning behind why the QUEUED and PRIMED keys were added.

ttstarck avatar May 30 '23 23:05 ttstarck

Yes, redis suggests a simpler algorithm, but I needed more, so I dreamed up my own algorithm.

People requested that they lock only during runtime, not during queuing, meaning sidekiq picks the job up, runs it through the server middleware, and the server constrains any duplicates from running simultaneously.

People also requested that the locks stay active as long as the job is in Sidekiq. If you want something simpler, I recommend https://github.com/rwojsznis/sidekiq-lock or sidekiq-pro if you have the money.

I don't think you should schedule a new job as part of the old one still running. Sidekiq needs to be done processing the current job when you push the next one. Preferably, you should do that outside of this "Redis transaction" or "unit of work."

The queued and primed keys are used to block as part of processing the job. The push of "1" is accompanied by an expiration to ensure that listeners on the queued key can keep listening.

There are things I can do to improve your situation, but you know you are doing an infinite recursion, right?

mhenrixon avatar Feb 07 '24 04:02 mhenrixon