Delayed Job in multiple production instances duplicating job execution.

Open seemalakishore opened this issue 6 years ago • 4 comments

Hi,

We are using Delayed Job in our Rails application. For production, we recently added a secondary server to balance the load on the application. Because the 2 servers are replicas of the same application, they point to the same database. The problem we face is that the same job is picked up by the Delayed Job workers on both servers simultaneously and processed twice. As most of the jobs are user notification tasks, users receive duplicate notifications.

seemalakishore, Jan 24 '19 10:01

How are you starting DJ?

albus522, Jan 24 '19 14:01

Hey @albus522 Thanks for the gem ❤️

Just had the same problem. At eola we use delayed job to build and maintain our millions of time slots. When we needed to bump up the workers to work through a backlog (due to an unrelated bug), there was a large amount of duplication.

Most of the jobs take < 1s to run, and we had a max backlog of 1.8 million jobs to get through. We had a brief max of 30 workers at a time.

We run it via QUEUE=default,mailers,low bundle exec rake jobs:work on Heroku.

@sfcgeorge and I believe that it's because the locked_at column is a race condition waiting to happen.

We've since scaled back to one worker, and all is well now. Though, when we inevitably do need to scale, is there something we should be doing?

dansteele, Jul 01 '19 22:07

The locking system is a lot more complex than just the locked_at column. Usually, when people see multiple workers doing the same work at the same time, it turns out that the same job was enqueued more than once and each copy was run by a worker. When we have seen the same actual job object picked up by multiple workers, it has been because the workers had the same worker name.

With no other changes, the default name consists of the hostname and the worker process pid. I am not sure how Heroku handles the hostname inside dynos; it is possible that a dyno shares the hostname of the machine it runs on, and that machine can run multiple dynos. I do know that dyno startup can result in the process starting with the same pid, though that is not guaranteed. You might be able to add a bit more certainty by conditionally setting a worker name prefix. Heroku sets an ENV variable DYNO in the format name.number. If you add the following to your delayed_job initializer (or create an initializer if you don't have one), the dyno name and number will be added to the worker name, which is used in the locking mechanism.

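module Delayed
  # Prefixes each worker's name with the Heroku dyno name (format name.number).
  class SetHerokuNamePrefix < Plugin
    callbacks do |lifecycle|
      # Runs when the worker starts; DYNO is only set on Heroku.
      lifecycle.before(:execute) do |worker|
        worker.name_prefix = ENV["DYNO"] if ENV["DYNO"]
      end
    end
  end
end

# Register the plugin so the callback above is actually installed.
Delayed::Worker.plugins << Delayed::SetHerokuNamePrefix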
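
To illustrate why identical worker names matter, here is a minimal in-memory sketch. It is not delayed_job's code: `Job`, `worker_name`, and `reserve` are hypothetical names, and the reservation rule (a job counts as available when it is unlocked or already locked by a worker with the same name) is a simplifying assumption that ignores run_at, lock timeouts, and the database entirely.

```ruby
# Simplified model only; Job, worker_name and reserve are hypothetical helpers.
Job = Struct.new(:id, :locked_at, :locked_by)

# Mirrors the default naming described above: optional prefix, hostname, pid.
def worker_name(hostname, pid, prefix = nil)
  "#{prefix}host:#{hostname} pid:#{pid}"
end

# Assumption: a job is available when it is unlocked OR already locked by a
# worker with this exact name.
def reserve(job, name, now = Time.now)
  return false unless job.locked_by.nil? || job.locked_by == name

  job.locked_at = now
  job.locked_by = name
  true
end

# Two dynos that happen to report the same hostname and pid.
job = Job.new(1)
first  = reserve(job, worker_name("app-host", 4))
second = reserve(job, worker_name("app-host", 4)) # same name, not excluded
puts "without prefix: #{first} #{second}"

# With the DYNO value as a prefix the names differ, so the second reserve fails.
job = Job.new(2)
first  = reserve(job, worker_name("app-host", 4, "worker.1"))
second = reserve(job, worker_name("app-host", 4, "worker.2"))
puts "with prefix: #{first} #{second}"
```

In this model the lock only excludes workers whose name differs from the current holder, which is why a unique prefix per dyno (or per container) makes the names, and therefore the locks, distinguishable.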

albus522, Jul 02 '19 14:07

Hi @albus522 ,

I have a similar situation: we have 2 application instances running on AWS EC2, the same setup as described above. We trigger rake tasks with cron jobs on both application instances, and we get duplicated jobs in our database.

Do you have a more detailed explanation of how to tackle that on AWS?

We are using Docker to build containers, and jobs are run with the basic command:

bin/delayed_job start

tsaghir, Apr 11 '24 12:04