
Kill all executing tasks on worker if worker crashes

Open Kobzol opened this issue 3 years ago • 3 comments

If a worker ends abruptly, its spawned children should be killed. This could be implemented using Linux process groups.

Kobzol avatar May 26 '22 21:05 Kobzol

This is also connected to #66: each task should be spawned into its own process group, and when the task is canceled we should clean up all of its processes.
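
A minimal sketch of that idea in Python, not HyperQueue's actual implementation. The names `spawn_task` and `cancel_task` and the grace period are hypothetical. It assumes a POSIX system and uses `start_new_session=True`, which calls `setsid()` in the child, so the task becomes the leader of a new session and a new process group:

```python
import os
import signal
import subprocess


def spawn_task(argv):
    """Start a task as the leader of its own session and process group."""
    # After setsid() the child's PID is also its process group ID, and any
    # descendants it forks stay in that group unless they leave it on purpose.
    return subprocess.Popen(argv, start_new_session=True)


def cancel_task(proc, grace=2.0):
    """Best-effort cancel: SIGTERM the whole group, then SIGKILL what is left."""
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # the group is already gone
    try:
        proc.wait(timeout=grace)  # give the group leader time to exit cleanly
    except subprocess.TimeoutExpired:
        pass
    try:
        # Descendants may ignore SIGTERM, so always follow up with SIGKILL.
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # every process in the group has already exited
    return proc.wait()
```

This only reaches processes that stay in the task's group. A descendant that calls `setsid()` or `setpgid()` itself escapes the cleanup.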

spirali avatar May 27 '22 09:05 spirali

@Kobzol How do you envision solving this with process groups? I could try to implement that in my spare time.

There was a nifty hack using the DEATHSIG prctl (`PR_SET_PDEATHSIG`) in the child process, but that won't help if the processes are forked from an interim thread.
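
For reference, a rough sketch of the DEATHSIG approach in Python, assuming Linux and glibc. The helper names are hypothetical and the constant value is taken from `<linux/prctl.h>`. The child asks the kernel to send it `SIGKILL` when its parent goes away:

```python
import ctypes
import signal
import subprocess

PR_SET_PDEATHSIG = 1  # from <linux/prctl.h>

# Load libc in the parent so the child only has to make the call itself.
_libc = ctypes.CDLL(None, use_errno=True)


def _die_with_parent():
    # Runs in the child between fork() and exec().
    if _libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGKILL), 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")


def spawn_with_deathsig(argv):
    """Start a child that the kernel kills when its parent terminates."""
    return subprocess.Popen(argv, preexec_fn=_die_with_parent)
```

According to prctl(2), the "parent" here is the thread that created the child, so the signal fires as soon as that thread exits, which is the limitation mentioned above. The setting is also cleared in the child of a later `fork()`, so grandchildren are not covered unless they set it themselves.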

rostamn739 avatar Jun 23 '22 02:06 rostamn739

I have been playing with this for a bit and it seems to be quite tricky. We'll probably implement a best-effort mechanism that uses process groups and setsid to kill the group/session leader when the worker receives SIGTERM.

I didn't know about DEATHSIG, I'll take a look, thanks.
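
The best-effort mechanism could look roughly like this Python sketch. It is an illustration under stated assumptions, not the code that was eventually merged: the registry `_task_leaders` and the function names are hypothetical, and the worker is assumed to be a single process that installs the handler from its main thread.

```python
import os
import signal
import subprocess
import sys

_task_leaders = []  # PIDs of the session leaders started by this worker


def spawn_task(argv):
    """Start a task in its own session and remember its leader PID."""
    proc = subprocess.Popen(argv, start_new_session=True)
    _task_leaders.append(proc.pid)
    return proc


def _on_sigterm(signum, frame):
    # Kill every task group before the worker itself goes down.
    for pgid in _task_leaders:
        try:
            os.killpg(pgid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # that task has already finished
    sys.exit(128 + signum)


def install_sigterm_cleanup():
    """Register the cleanup handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, _on_sigterm)
```

This covers an orderly SIGTERM only. If the worker is killed with SIGKILL, the handler never runs, which is why the mechanism is best effort.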

Kobzol avatar Jun 23 '22 06:06 Kobzol

Fixed by https://github.com/It4innovations/hyperqueue/pull/514.

Kobzol avatar Oct 09 '22 09:10 Kobzol