tasktiger

Enforce `hard_timeout` if child process hangs

Open • thomasst opened this issue 8 years ago • 6 comments

Right now we raise an exception in the child process when a "hard" timeout occurs. The task code can intercept this exception, or the task can be stuck in an uninterruptible state, so a task may run longer than specified or even forever.

Should we keep the current hard_timeout behavior, and simply send SIGTERM / SIGKILL to the child a few seconds after the timeout happens? Or should we have a soft_timeout setting that's separate from hard_timeout? I'd ideally like to make soft_timeout something that can be intercepted by the task code (e.g. "stop task if timeout exceeded", or have a code section that can't be interrupted by a soft timeout).
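To make the SIGTERM / SIGKILL option concrete, here is a minimal sketch of the parent side. It is not tasktiger's worker code: `run_with_hard_timeout`, `stubborn_task`, and the `grace` parameter are hypothetical names, and a fork-based child is assumed (Unix only).

```python
import multiprocessing
import signal
import time

# Assumption: a fork-based child process, so this sketch is Unix-only.
_ctx = multiprocessing.get_context("fork")


def run_with_hard_timeout(target, hard_timeout, grace=2.0):
    """Run target in a child process and report how the child ended."""
    child = _ctx.Process(target=target)
    child.start()
    child.join(hard_timeout)      # wait up to hard_timeout seconds
    if not child.is_alive():
        return "finished"
    child.terminate()             # SIGTERM: the child may still clean up
    child.join(grace)             # give it a few seconds to exit
    if not child.is_alive():
        return "terminated"
    child.kill()                  # SIGKILL: cannot be caught or ignored
    child.join()
    return "killed"


def stubborn_task():
    """Simulates a task that ignores SIGTERM and never returns."""
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    while True:
        time.sleep(0.1)
```

With this shape, `run_with_hard_timeout(stubborn_task, 0.5, grace=0.5)` only ends once SIGKILL is sent, which is the case the current exception-based timeout cannot handle.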

thomasst, Jan 09 '17 05:01

I think we should support a soft_timeout that the child can catch and handle appropriately.

jkemp101, Jan 09 '17 16:01

Thoughts on how exactly it should work?

thomasst, Jan 09 '17 18:01

Maybe something like this, using a Unix socket to send the stop message. Does the base Task class have a method that is called before the task runs and could set up a thread to listen on the client socket? Then just do something like `self.parent_request_stop = True` if it gets a message from the parent.

I like sockets over signals for this type of soft message. Just need to make sure they get cleaned up.
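A rough sketch of the socket idea, assuming the parent creates a socket pair before forking. `StopListener`, `child_main`, and `run_with_soft_timeout` are hypothetical names, not part of tasktiger; only the `parent_request_stop` flag comes from the suggestion above.

```python
import os
import socket
import threading
import time


class StopListener:
    """Hypothetical helper: sets parent_request_stop when the parent sends a byte."""

    def __init__(self, sock):
        self.parent_request_stop = False
        self._sock = sock
        threading.Thread(target=self._listen, daemon=True).start()

    def _listen(self):
        # recv() blocks until the parent sends data or closes its end.
        if self._sock.recv(1):
            self.parent_request_stop = True


def child_main(sock, max_runtime=10.0):
    """Stand-in task body: works in small steps and polls the stop flag."""
    listener = StopListener(sock)
    deadline = time.monotonic() + max_runtime
    while time.monotonic() < deadline:
        if listener.parent_request_stop:
            return 0              # stopped cooperatively
        time.sleep(0.05)          # one unit of "work"
    return 1                      # was never asked to stop


def run_with_soft_timeout(soft_timeout):
    """Fork a child, ask it to stop after soft_timeout, return its exit code."""
    parent_sock, child_sock = socket.socketpair()  # connected AF_UNIX pair
    pid = os.fork()
    if pid == 0:                  # child process
        parent_sock.close()
        exit_code = 1
        try:
            exit_code = child_main(child_sock)
        finally:
            os._exit(exit_code)
    child_sock.close()
    try:
        time.sleep(soft_timeout)
        parent_sock.sendall(b"S")  # the soft "please stop" message
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status)
    finally:
        parent_sock.close()       # cleanup happens here in the parent
```

Because `socketpair()` never creates a filesystem path, cleanup is limited to closing both ends. The task still has to poll the flag, so this only works for cooperative code.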

jkemp101, Jan 10 '17 20:01

I would like a soft_timeout setting so some tasks can save state and be resumed. I currently have long-running Celery tasks that require the ability to save state and resume, so this would be a requirement for me before switching to tasktiger. Edit: looks like I can catch the hard timeout exception so soft timeout is no longer a requirement.
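For illustration, catching the timeout exception to checkpoint progress could look roughly like this. It is a self-contained sketch that raises the exception from a SIGALRM handler; `TimeoutExceeded` and `process_items` are stand-in names, not tasktiger's actual exception class or API.

```python
import signal
import time


class TimeoutExceeded(Exception):
    """Stand-in for the exception the worker raises on a hard timeout."""


def _raise_timeout(signum, frame):
    raise TimeoutExceeded()


def process_items(state, timeout):
    """Process items until done or until the timeout fires; return the state."""
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        while state["next_item"] < state["total"]:
            time.sleep(0.01)          # stand-in for real work on one item
            state["next_item"] += 1   # checkpoint after each item
        state["done"] = True
    except TimeoutExceeded:
        state["done"] = False         # a real task would persist state here
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
    return state
```

Calling `process_items` again with the returned state resumes at `next_item`. This must run in the main thread, since Python only allows signal handlers to be installed there.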

An un-interruptible code section would also work, just not sure which is easier for newcomers to use correctly.

> I like sockets over signals for this type of soft message. Just need to make sure they get cleaned up.

Wouldn't sockets introduce unnecessary complexity, since we already have the child's PID and sending a signal would be very straightforward?
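A sketch of the signal-based variant for comparison, assuming SIGUSR1 is free to use as the soft-stop signal. `run_with_soft_signal`, `child_main`, and `stop_requested` are hypothetical names, and the handler is installed before forking so the child cannot receive the signal before it is ready.

```python
import os
import signal
import time

stop_requested = False


def _request_stop(signum, frame):
    # Only set a flag; the task decides when it is safe to stop.
    global stop_requested
    stop_requested = True


def child_main(max_runtime=10.0):
    """Stand-in task body: works in small steps and polls the stop flag."""
    deadline = time.monotonic() + max_runtime
    while time.monotonic() < deadline:
        if stop_requested:
            return 0              # stopped cooperatively
        time.sleep(0.05)          # one unit of "work"
    return 1                      # was never asked to stop


def run_with_soft_signal(soft_timeout):
    """Fork a child, send it SIGUSR1 after soft_timeout, return its exit code."""
    # Install the handler before forking; the child inherits it.
    previous = signal.signal(signal.SIGUSR1, _request_stop)
    pid = os.fork()
    if pid == 0:                  # child process
        exit_code = 1
        try:
            exit_code = child_main()
        finally:
            os._exit(exit_code)
    signal.signal(signal.SIGUSR1, previous)  # parent restores its own handler
    time.sleep(soft_timeout)
    os.kill(pid, signal.SIGUSR1)  # the soft "please stop" signal
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

There is nothing to clean up afterwards, which is the simplicity argument. The trade-off is that a signal carries no payload and claims a process-wide signal number that task code might also want to use.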

alanhamlett, Jan 19 '17 05:01

Any thoughts on how to force stop a running (active) task?

I have the task id, I can get all the active task instances but not sure how to force stop an active task.

I tried cancel (it looks like it only cancels scheduled tasks) and delete() (it seems to only delete failed tasks) on the task instance, but this throws a TaskNotFound exception.

Any help in this is greatly appreciated!

sauravmahuri2007, Dec 12 '19 14:12

@sauravmahuri2007 I don't think there's currently a way to do this. Please create a new issue with this feature request. I don't think this request is relevant to this issue, so I'm going to hide both of these comments.

AlecRosenbaum, Dec 16 '19 18:12