
Often getting Limit exceeded message in changelog


Used versions

  • Sidekiq: 6.4.1
  • Sidekiq Unique Jobs: 7.1.27
  • Rails: 7.0.2.2
  • Redis: 4.6.0

Describe the bug

We often get Limit exceeded in the changelog, even with jobs triggered at 200ms intervals, and the locks are not acquired in the order the jobs were enqueued asynchronously.

Jobs performed, in this order

1. product_1, sso_company_id_1, leads
2. product_1, sso_company_id_1, active
3. product_1, sso_company_id_1, active:upgrade
4. product_1, sso_company_id_2, leads
5. product_1, sso_company_id_2, active 
6. product_1, sso_company_id_2, active:upgrade

Expected behavior

  • Jobs are performed in the correct order for each product and company id
  • No new retry jobs are created when there's an error
  • All jobs are executed
# It should execute the jobs in this order

1. product_1, sso_company_id_1, leads
2. product_1, sso_company_id_1, active
3. product_1, sso_company_id_1, active:upgrade
4. product_1, sso_company_id_2, leads
5. product_1, sso_company_id_2, active 
6. product_1, sso_company_id_2, active:upgrade

Current behavior

  • Jobs are performed in the wrong order for each product and company id
  • Some jobs were executed in order and some were not
  • Some jobs are not executed at all
  • Somehow, even with the Sidekiq retry option set to 0, the job is re-triggered when there is an error, which shouldn't happen
# What was actually processed

1. product_1, sso_company_id_1, active:upgrade
2. product_1, sso_company_id_1, leads
3. product_1, sso_company_id_1, active
4. product_1, sso_company_id_2, leads
5. product_1, sso_company_id_2, active 
6. product_1, sso_company_id_2, active:upgrade

Worker class

class CreateHistoryJob
  prepend DatadogTracerService::JobPerform

  include Sidekiq::Worker

  sidekiq_options retry: 0,
                  backtrace: 20,
                  lock: :until_and_while_executing, # already tried with until_executing, while_executing & until_executed
                  lock_args_method: :lock_args,
                  lock_info: true,
                  lock_prefix: 'create_history',
                  lock_timeout: 10,
                  on_conflict: {
                    client: :log,
                    server: :raise
                  }

  def self.lock_args(args)
    [args[1], args[2]]  # product & sso_company id
  end

  def perform(topic_name,
              product,
              sso_company_id,
              status,
              timestamp_in_millisecond,
              triggered_by = nil,
              move_pi = false)
    # ... (job body elided)
  end
end
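
Note: with the lock_args above, uniqueness is keyed only on the product and sso_company_id, so any two of these enqueues share the same lock digest regardless of status. Rough illustration (company_id and now_ms are placeholders):

# Both calls map to lock_args ["klikpajak", company_id] -- status and timestamp
# are ignored -- so they compete for the same lock, and the second one is
# handled by the client conflict strategy (:log).
CreateHistoryJob.perform_async('topic', 'klikpajak', company_id, 'leads', now_ms)
CreateHistoryJob.perform_async('topic', 'klikpajak', company_id, 'active', now_ms)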

Sidekiq initializers

require 'sidekiq'
require 'sidekiq-unique-jobs'

class AppendCustomDataMiddleware
  def call(worker, job, queue)
    worker.retry_count = job['retry_count'] if worker.respond_to?(:retry_count=)
    yield
  end
end

Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch('REDIS_URL') { 'redis://localhost:6379/0' } }
  config.log_formatter = Sidekiq::Logger::Formatters::JSON.new

  config.client_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Client
  end

  config.server_middleware do |chain|
    chain.add AppendCustomDataMiddleware
    chain.add SidekiqUniqueJobs::Middleware::Server
  end

  SidekiqUniqueJobs::Server.configure(config)
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch('REDIS_URL') { 'redis://localhost:6379/0' } }

  config.client_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Client
  end
end

Sidekiq.strict_args!(false)

What I did to test

# Enqueue the three statuses for a fresh company id, 200 ms apart
1000.times do |o|
  start_time = Time.now
  company_id = SecureRandom.uuid
  puts "#{o} - #{company_id}"

  CreateHistoryJob.perform_async("kp_development_onboarding", "klikpajak", company_id, "leads", (Time.now.to_f * 1000), "klikpajak")

  sleep(0.2)

  CreateHistoryJob.perform_async("kp_development_onboarding", "klikpajak", company_id, "active", (Time.now.to_f * 1000), "klikpajak")

  sleep(0.2)

  CreateHistoryJob.perform_async("kp_development_onboarding", "klikpajak", company_id, "active:upgrade", (Time.now.to_f * 1000), "klikpajak")

  sleep(0.2)

  puts "finished in #{Time.now - start_time}"
end

khrisnagunanasurya avatar Nov 29 '22 06:11 khrisnagunanasurya

First, there is never any guarantee that the jobs will be executed in order. If that's what you need, you must enqueue the next job at the end of executing the previous one.

That is not something that I can ever guarantee. I'll read the rest of it later; that was just the first thing that struck me.
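
Roughly what I mean by chaining, as a minimal sketch built from your statuses (not a drop-in replacement for your worker):

class CreateHistoryJob
  include Sidekiq::Worker

  # Each status knows which status comes next; the follow-up job is only
  # enqueued after the current one has finished its work, which is the only
  # way to get a strict ordering out of Sidekiq.
  NEXT_STATUS = { 'leads' => 'active', 'active' => 'active:upgrade' }.freeze

  def perform(topic_name, product, sso_company_id, status, *rest)
    # ... do the actual work for this status ...

    if (next_status = NEXT_STATUS[status])
      self.class.perform_async(topic_name, product, sso_company_id, next_status, *rest)
    end
  end
end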

mhenrixon avatar Dec 03 '22 10:12 mhenrixon

@mhenrixon What is the reason for Limit exceeded in uniquejobs:changelog?

We are trying to upgrade from Sidekiq 5.1.3 / sidekiq-unique-jobs 5.0.10 to a recent version, but in production our queues were not behaving as expected. In Redis, we saw many "Limit exceeded" messages being added to uniquejobs:changelog, but we are not sure what that means.

tsauerwein avatar Sep 07 '23 14:09 tsauerwein

Having this issue quite often on "long"-running jobs (a couple of minutes). Also curious as to what Limit exceeded means exactly?

nathan-appere avatar Feb 09 '24 13:02 nathan-appere

I infer that Limit exceeded means that a job was not added to the queue because it still has a uniquejobs lock, but I might be wrong about that. I'm getting it every time I try to add a job via sidekiq-cron: the job ends up not getting enqueued because of Limit exceeded, even though there does not appear to be a corresponding lock (and no job is already running).
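
In case it helps anyone else, I have been inspecting the stored digests from a console. If I am reading the v7 API right, something like this lists what is currently locked (the digest in the delete example is a placeholder):

# List the current unique-jobs digests (v7 API, as far as I can tell):
digests = SidekiqUniqueJobs::Digests.new
pp digests.entries(pattern: '*', count: 1_000)

# A stale lock can then be removed by its digest:
# digests.delete_by_digest('uniquejobs:2c9e...')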

ragesoss avatar Feb 26 '24 21:02 ragesoss

@nathan-appere @ragesoss I should likely replace that message with a better one.

You can allow any number of simultaneous locks via lock_limit: (the default is a single lock per digest). This message is likely confusing for people who don't set lock_limit: on their jobs.

I'll see about making this less confusing.
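
For anyone finding this later: the limit in the message is the per-digest lock limit, which defaults to a single lock. A hypothetical worker that raises it would look something like this:

class ExampleJob
  include Sidekiq::Worker

  # Allow up to 5 simultaneous locks for the same digest; the 6th
  # attempt is what would write "Limit exceeded" to the changelog.
  sidekiq_options lock: :until_executed, lock_limit: 5
end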

mhenrixon avatar Feb 27 '24 13:02 mhenrixon

@mhenrixon Hi there, I just started using this gem, and I'm also confused by this message. I'm using the :replace strategy and I don't understand why I get Limit exceeded; the job seems to be retried, given that there's a previous job_id. Also, lock_limit is not documented in the README, I believe. Thanks
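
For reference, my setup is roughly this (the worker name is a placeholder):

class MyJob
  include Sidekiq::Worker

  # :replace is supposed to drop the previously enqueued job and push the
  # new one instead (as I understand the docs), yet I still see
  # "Limit exceeded" entries with a previous job_id in the changelog.
  sidekiq_options lock: :until_executed, on_conflict: :replace
end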

chuan29812 avatar Apr 10 '24 17:04 chuan29812