sidekiq-unique-jobs icon indicating copy to clipboard operation
sidekiq-unique-jobs copied to clipboard

Jobs are being allowed to run even though there is a lock

Open daapower opened this issue 2 years ago • 8 comments

Describe the bug I'm not sure if I am doing something wrong. The unique jobs used to work but recently upgraded sidekiq and unique jobs.

Below is a job that is supposed to be locked until and while executing but as you can see there are 2 of them ["worker.1:3:9598347b346b", "14frb", {"queue"=>"default", "payload"=> {"retry"=>false, "queue"=>"default", "backtrace"=>true, "lock"=>"until_and_while_executing", "args"=>[], "class"=>"RunSponsoredAdsQueue", "jid"=>"8e11befa3b4600d7c4749941", "created_at"=>1661867359.8881626, "lock_timeout"=>nil, "lock_ttl"=>nil, "lock_prefix"=>"uniquejobs", "lock_args"=>[], "lock_digest"=>"uniquejobs:82157bcb53420755466db7f4590d54e6", "enqueued_at"=>1661867361.2786112}, "run_at"=>1661867361}], ["worker.1:3:9598347b346b", "1hwq7", {"queue"=>"default", "payload"=> {"retry"=>false, "queue"=>"default", "backtrace"=>true, "lock"=>"until_and_while_executing", "args"=>[], "class"=>"RunSponsoredAdsQueue", "jid"=>"ecafefaf4c6b71ce01daaee4", "created_at"=>1661867957.9982724, "lock_timeout"=>nil, "lock_ttl"=>nil, "lock_prefix"=>"uniquejobs", "lock_args"=>[], "lock_digest"=>"uniquejobs:82157bcb53420755466db7f4590d54e6", "enqueued_at"=>1661867959.0355272}, "run_at"=>1661867959}],

Expected behavior If 1 job is running and has a lock, there shouldn't be another one that is allowed to run

Current behavior Currently 7 of these jobs are running at the same time

Worker class

class RunSponsoredAdsQueue
  include Sidekiq::Worker
  sidekiq_options queue: 'default', retry: false, lock: :until_and_while_executing

  def perform()
    PpcJob.run_ppc_queue()
  end
end

**Additional context**
    sidekiq (6.5.6)
      connection_pool (>= 2.2.5)
      rack (~> 2.0)
      redis (>= 4.5.0, < 5)
    sidekiq-cron (1.7.0)
      fugit (~> 1)
      sidekiq (>= 4.2.1)
    sidekiq-unique-jobs (7.1.27)
      brpoplpush-redis_script (> 0.1.1, <= 2.0.0)
      concurrent-ruby (~> 1.0, >= 1.0.5)
      sidekiq (>= 5.0, < 8.0)
      thor (>= 0.20, < 3.0)

daapower avatar Aug 30 '22 16:08 daapower

It only looks like 7 are running, 6 of them would just be waiting for the first one to complete.

mhenrixon avatar Aug 30 '22 16:08 mhenrixon

I don't fully understand, are you saying 1 is running, the other 6 are waiting for that one to finish before running?

If I execute from the console, unique jobs usually prevents them from running it would just say "start" then immediately "done" so that only 1 job is running.

But for some reason, these jobs are popping up (screenshot below). My worry is that the jobs entail a lot of database read/write and are locking each other in the database preventing them from finishing

image

daapower avatar Aug 30 '22 16:08 daapower

You would probably be better off using until_executed. It looks like you might be hitting a weird edge case.

Is it a problem if a subsequent job is dropped when another is already on the queue waiting to be picked off by the sidekiq worker?

mhenrixon avatar Aug 31 '22 03:08 mhenrixon

I think I used to have it until_executed but it was still queueing the jobs so I changed it to until_and_while_executing.

I'm going to try installing another Redis, perhaps something is messed up with my jobs.

In the meantime I added this code to check if there is already a Sidekiq job with my process name, if there is I don't run the code in the worker

workers = Sidekiq::Workers.new workers.each do |process_id, thread_id, work| if work.dig("payload","class") ==

Thank you for the help

daapower avatar Aug 31 '22 04:08 daapower

Investigating, something is also strange on our part too.

Screenshot 2022-08-31 at 16 48 54
module Mailers
  # Worker activated by an AWS CloudWatch ping every 3 minutes. (as of 13/12/2018)
  class ScheduleInstantWorker < BaseMailerWorker
    sidekiq_options queue: 'sidekiq_instants', lock: :until_executed
sidekiq-unique-jobs (7.1.27)
     brpoplpush-redis_script (> 0.1.1, <= 2.0.0)
     concurrent-ruby (~> 1.0, >= 1.0.5)
     sidekiq (>= 5.0, < 8.0)
     thor (>= 0.20, < 3.0)
     

Audrius avatar Aug 31 '22 15:08 Audrius

Not sure if this is the same bug, but it is a until_and_while_executing lock bug.

I am able to reproduce this behavior like so

TestJob.perform_async
sleep(5)
SidekiqUniqueJobs::Orphans::Reaper.call
TestJob.perform_async

Sample app here

DCrow avatar Sep 08 '22 11:09 DCrow

@DCrow That actually does not reproduce anything. Your sidekiq process isn't running and since you have configured a global TTL of 1 hour the reaper would just consider the lock as active after 5 seconds.

mhenrixon avatar Sep 11 '22 08:09 mhenrixon

@mhenrixon Sorry if I wasn't clear. Prior to executing this script, sidekiq process should also be running. After executing sidekiq process should output to stdout something like this

2022-09-12T10:06:20.714Z pid=36479 tid=109n class=TestJob jid=14b076f9462dafc5897706d4 INFO: start
"Test start"
2022-09-12T10:06:25.722Z pid=36479 tid=1093 class=TestJob jid=b26285ad778e75147a298288 INFO: start
"Test start"

DCrow avatar Sep 12 '22 10:09 DCrow

@daapower do you have a rescue in your WorkerClass or PpcJob somewhere? If so, what is it doing?

Currently investigating similar behavior in our project.

jpriollaud avatar Aug 13 '23 20:08 jpriollaud