delayed_job
Delayed Job in multiple production instances duplicating job execution.
Hi,
We are using Delayed Job in our Rails application. For production, we recently added a secondary server to balance the load on the application. Because the two servers are replicas of the same application, they point to the same database. The problem is that the same job is picked up by the Delayed Job workers on both servers simultaneously and processed twice. Since most of the jobs are user notification tasks, users are receiving duplicate notifications.
How are you starting DJ?
Hey @albus522 Thanks for the gem ❤️
Just had the same problem. At eola we use Delayed Job to build and maintain our millions of time slots. When we needed to bump up the number of workers to work through a backlog (due to an unrelated bug), there was a large amount of duplication.
Most of the jobs take < 1s to run, and the backlog peaked at 1.8 million jobs. At most we briefly ran 30 workers at once.
We run it on Heroku via `QUEUE=default,mailers,low bundle exec rake jobs:work`.
@sfcgeorge and I believe it's because the `locked_at` column is a race condition waiting to happen.
We've since scaled back to one worker, and all is well now. When we inevitably need to scale again, though, is there something we should be doing differently?
The locking system is a lot more complex than simply the `locked_at` column. Usually, when people see multiple workers running jobs at the same time, it turns out to be multiple copies of the same job queued and then run by different workers. When we have seen the same actual job object picked up by multiple workers, it has been because those workers had the same worker name. With no other changes, the default name consists of the hostname and the worker process pid. I am not actually sure how Heroku handles the hostname inside dynos; it is possible that a dyno shares the hostname of the machine it runs on, and that machine can run multiple dynos. I do know that processes in different dynos can start up with the same pid, although this is not guaranteed. You might be able to add a bit more certainty by conditionally setting a worker name prefix. Heroku sets an ENV variable `DYNO` in the format `name.number`. If you add the following to your delayed_job initializer (or create one if you don't have one), the dyno name and number will be added to the worker name, which is used in the locking mechanism.
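```ruby
# In your delayed_job initializer
module Delayed
  # Prefix each worker's name with Heroku's DYNO value (e.g. "worker.1")
  # so workers on different dynos never end up with the same name.
  class SetHerokuNamePrefix < Plugin
    callbacks do |lifecycle|
      lifecycle.before(:execute) do |worker|
        worker.name_prefix = ENV["DYNO"] if ENV["DYNO"]
      end
    end
  end
end

# Plugins only take effect once they are registered with the worker.
Delayed::Worker.plugins << Delayed::SetHerokuNamePrefix
```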
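To make the worker-name explanation concrete, here is a small pure-Ruby model of why two workers with identical names can both run the same job. It assumes that the backend's ready-to-run condition also matches jobs already locked by a worker with the reserving worker's own name (which is how the ActiveRecord backend's query reads), and it ignores `run_at` and `failed_at`. `worker_name`, `JobRow` and `reservable?` are hypothetical helpers for illustration, not delayed_job API.

```ruby
# Simplified model, not delayed_job's actual source: the default worker name
# is an optional prefix followed by the hostname and the process pid.
def worker_name(prefix, hostname, pid)
  "#{prefix}host:#{hostname} pid:#{pid}"
end

# Only the two lock columns matter for this illustration.
JobRow = Struct.new(:locked_at, :locked_by)

# Simplified ready-to-run check (run_at and failed_at are ignored here):
# a job can be reserved if it is unlocked, if its lock is older than
# max_run_time, or if it is already locked by a worker with this same name.
def reservable?(job, name, now, max_run_time)
  job.locked_at.nil? ||
    job.locked_at < now - max_run_time ||
    job.locked_by == name
end

now = Time.now
max_run_time = 4 * 60 * 60 # seconds; 4 hours is delayed_job's default

# Two dynos that report the same hostname and start with the same pid.
a = worker_name(nil, "host-1", 4)
b = worker_name(nil, "host-1", 4)
job = JobRow.new(now, a)                    # worker A has just locked the job
puts reservable?(job, b, now, max_run_time) # prints "true": B takes it too

# With ENV["DYNO"] as the prefix, the names differ and B skips the job.
a = worker_name("worker.1", "host-1", 4)
b = worker_name("worker.2", "host-1", 4)
job = JobRow.new(now, a)
puts reservable?(job, b, now, max_run_time) # prints "false"
```

In this model the lock itself is not the weak point: a distinct name per process is what keeps a second worker from treating someone else's lock as its own.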
Hi @albus522,
I have a similar situation: we have two application instances running on AWS EC2, the same setup as described above. We trigger rake tasks with cron jobs on both application instances, and we get duplicated jobs in our database.
Do you have a more detailed explanation of how to tackle this on AWS?
We use Docker to build the containers, and the workers are started with the basic command `bin/delayed_job start`.
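If the cron entry that runs the rake task exists on both EC2 instances, each instance enqueues its own copy of the job. That would be the first case described above (the same job queued twice, with each copy then run by a worker), so changing worker names alone would not help. The sketch below is a hypothetical in-memory model, not delayed_job API: `FakeQueue` stands in for the jobs table, `enqueue_once` for a guard keyed on a hypothetical per-run key, and `send_notifications` is a made-up task name. In a real database such a guard has to be atomic (for example, a unique index on the key), because a read-then-insert check from two instances can itself race; running the cron entry on only one instance avoids the problem altogether.

```ruby
require "set"

# Hypothetical in-memory stand-in for the jobs table; not delayed_job API.
class FakeQueue
  attr_reader :rows

  def initialize
    @rows = []
    @keys = Set.new # plays the role of a unique index on a dedupe key
  end

  # Unguarded enqueue: every call adds a row, like each cron-triggered rake task.
  def enqueue(payload)
    @rows << payload
  end

  # Guarded enqueue: only the first call with a given key adds a row.
  # Set#add? returns nil when the key is already present.
  def enqueue_once(key, payload)
    return false unless @keys.add?(key)

    @rows << payload
    true
  end
end

# Both instances fire the same crontab entry for the same scheduled run.
plain = FakeQueue.new
2.times { plain.enqueue("send_notifications") }
puts plain.rows.size # prints 2: two separate jobs, so users are notified twice

guarded = FakeQueue.new
2.times { guarded.enqueue_once("send_notifications@09:00", "send_notifications") }
puts guarded.rows.size # prints 1
```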