
Delayed::Job Always Force Kills on Restart

Open tomrossi7 opened this issue 11 years ago • 14 comments

Restarting delayed jobs always has to forcefully kill the existing process:

RAILS_ENV=production bin/delayed_job restart --pid-dir=/srv/app/shared/pids/
delayed_job: trying to stop process with pid 2929...
delayed_job: process with pid 2929 won't stop, we forcefully kill it...
delayed_job: process with pid 2929 successfully stopped.

I am running rails (4.0.0) and delayed_job_active_record (4.0.0).

I'm not sure if this is a bug or something I am doing. Any ideas on what the problem can be?

Thanks, Tom

tomrossi7 avatar Oct 21 '13 00:10 tomrossi7

I have exactly the same problem.

richardriman avatar Nov 26 '13 18:11 richardriman

Same here, in my development environment. delayed_jobs table is empty. Rails 4, Ruby 2.0.

SebastianZaha avatar Feb 12 '14 13:02 SebastianZaha

Anyone have any solutions?

tomrossi7 avatar Jul 21 '14 16:07 tomrossi7

+1

joshuasiler avatar Apr 23 '15 21:04 joshuasiler

That is the daemons gem being overly aggressive about killing the process. DJ will wait for the current job to finish before it exits. The bad news is that means daemons is force killing an active job at some random point in its execution.

If the jobs table is empty, you are in the best-case scenario: the worker was in the middle of the sleep delay between checks for new jobs when the daemons gem force killed it. However, even then the process isn't able to run any at-exit cleanup, like properly closing open database connections.

We will need to see if we can tell the daemons gem to lay off and let us finish.

albus522 avatar Apr 24 '15 14:04 albus522

@albus522 That sounds great, but hasn't been my experience. Even with absolutely no jobs in the table, it still has to kill the process.

tomrossi7 avatar Apr 24 '15 15:04 tomrossi7

@tomrossi7 Did you read the second paragraph of my response?

albus522 avatar Apr 24 '15 15:04 albus522

@albus522 Sorry, I'm not trying to be a jerk, I just don't understand it. I'm not sure why the daemons gem needs to lay off? Are you saying it needs to give the process even more time to wake up so it can stop it?

tomrossi7 avatar Apr 24 '15 15:04 tomrossi7

Yes. As best I can tell, newer daemons builds give the process 20 seconds to exit before hard terminating it. If you have no jobs running and have the default DJ configuration, that is fine, as the sleep_delay is 5 seconds. So DJ will typically exit just fine within that 20 second window.

However if the user modifies the sleep_delay or a job is running, that window can be much longer than 20 seconds. The default max run time is 4 hours, and both the max run time and sleep_delay can be set to anything the user wants.

So, in the case of DJ, the decision to hard terminate the worker should never be made by the daemons gem as it doesn't know what it should do. The daemons gem has been a continual source of headaches for us, but unfortunately we haven't found anything better yet.
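To make the timing concrete, here is a rough sketch of the reasoning above. It is only a worst-case model, not code from delayed_job or daemons: the method names are made up for illustration, and it assumes the worker only notices a stop request once its current sleep or job has finished.

```ruby
# Worst case: the stop request arrives just as the worker starts sleeping
# or starts a job, so it has to wait out the whole sleep or the rest of the job.
# (Hypothetical helper names, not part of delayed_job or daemons.)
def worst_case_exit_seconds(sleep_delay:, job_seconds_remaining: 0)
  [sleep_delay, job_seconds_remaining].max
end

# True when the worker may still be busy after the force-kill window closes.
# 20 seconds is the daemons default discussed in this thread.
def force_kill_possible?(sleep_delay:, job_seconds_remaining: 0, force_kill_wait: 20)
  worst_case_exit_seconds(
    sleep_delay: sleep_delay,
    job_seconds_remaining: job_seconds_remaining
  ) > force_kill_wait
end

force_kill_possible?(sleep_delay: 5)   # => false (default DJ configuration, idle)
force_kill_possible?(sleep_delay: 60)  # => true  (sleep outlasts the 20 s window)
force_kill_possible?(sleep_delay: 5, job_seconds_remaining: 4 * 60 * 60) # => true
```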

albus522 avatar Apr 24 '15 15:04 albus522

Thank you for this explanation. Our sleep delay is 60 seconds, so it explains the problem completely.

Would love to see a configurable wait time on the stop script. In the meantime we'll just reduce the sleep delay.

joshuasiler avatar Apr 24 '15 15:04 joshuasiler

Ah! I lowered my sleep delay and now it can restart without killing the process!
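For anyone else landing here, a minimal sketch of that change, assuming the setting lives in a Rails initializer (the file name below is an assumption; use whatever file already holds your Delayed::Worker settings):

```ruby
# config/initializers/delayed_job_config.rb (assumed file name)
# Keep the idle sleep well under the 20-second force-kill window
# that the daemons gem applies by default.
Delayed::Worker.sleep_delay = 5 # seconds to sleep when no jobs are found
```

This only helps when the worker is idle. A job that is still running when the restart is requested can still outlast the window.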

tomrossi7 avatar Apr 24 '15 19:04 tomrossi7

https://github.com/collectiveidea/delayed_job/pull/916

apurvis avatar May 17 '16 08:05 apurvis

The daemons gem seems to set it to 20 seconds by default:
https://github.com/thuehlinger/daemons/blob/0ea14143c375f0bec117eb2a7ae2f78623b83867/lib/daemons/application.rb#L37
Though it seems it can be specified with the force_kill_waittime parameter from:
https://github.com/collectiveidea/delayed_job/blob/73bd1b50e719b336b70fcbb8dc4a37ec9b2f6f35/lib/delayed/command.rb#L123
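As a sketch of what that might look like, here is an options hash carrying force_kill_waittime for the daemons gem. The 120-second value and the variable name are placeholders, the pid directory is the one from the original report, and how the hash gets wired into delayed_job's own command is an assumption, not something confirmed in this thread.

```ruby
# Hypothetical options for the daemons gem.
daemon_options = {
  dir_mode: :normal,
  dir: '/srv/app/shared/pids/',
  force_kill_waittime: 120 # seconds to wait before force killing (default is 20)
}

# With the daemons gem loaded, the hash would be handed over roughly like this:
#   Daemons.run_proc('delayed_job', daemon_options) { Delayed::Worker.new.start }
```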

take-cheeze avatar Jul 12 '18 04:07 take-cheeze

I think this Issue can be closed since it's not an issue with the delayed_job library, but rather whatever process manager you have that is waiting for the delayed_job to exit cleanly after a SIGINT before forcefully killing it with a SIGTERM or SIGKILL.

joshuapinter avatar Jan 05 '21 15:01 joshuapinter