cylc-flow

Kill background process jobs that timeout but ignore SIGTERM

Open · jfrost-mo opened this issue 1 year ago · 2 comments

Problem

Currently, a background job on a remote host that traps SIGTERM is not killed when its time limit expires. This is because the timeout command sends only SIGTERM by default, and a process that intercepts or ignores that signal simply keeps running.
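To make the failure mode concrete, here is a minimal POSIX-only sketch (not cylc-flow code; the function name survives_sigterm and the child script are illustrative assumptions). It starts a child that ignores SIGTERM, sends it SIGTERM the way plain timeout would, and reports whether the child is still alive afterwards.

```python
import signal
import subprocess
import sys

# Hypothetical child process: ignores SIGTERM, then announces it is ready.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def survives_sigterm(grace=1.0):
    """Return True if the SIGTERM-ignoring child is still running after SIGTERM."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    try:
        proc.stdout.readline()            # wait until the SIGTERM handler is installed
        proc.send_signal(signal.SIGTERM)  # all that plain `timeout` sends
        try:
            proc.wait(timeout=grace)
            return False                  # child exited: SIGTERM was honoured
        except subprocess.TimeoutExpired:
            return True                   # child still running: SIGTERM was ignored
    finally:
        proc.kill()                       # SIGKILL cannot be caught or ignored
        proc.wait()
        proc.stdout.close()


print(survives_sigterm())
```

The readiness line matters: without it, SIGTERM could arrive before the child installs its handler and the demo would wrongly show the signal working.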

Proposed Solution

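The timeout command provides a --kill-after= option for exactly this case: if the command is still running the given number of seconds after the initial signal was sent, timeout follows up with SIGKILL, which cannot be caught or ignored. In line with Slurm, whose default KillWait interval between SIGTERM and SIGKILL is 30 seconds, 30 seconds is probably a reasonable value.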
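The escalation that --kill-after adds can be emulated in a few lines of Python, which may help when reasoning about the behaviour. This is a sketch under the assumption of a POSIX host; run_with_kill_after is a hypothetical name, and unlike GNU timeout (which exits with 124 on timeout, or 128+9 when it had to send KILL) it returns the child's own status, where a negative value -N means "killed by signal N".

```python
import signal
import subprocess
import sys


def run_with_kill_after(cmd, time_limit, kill_after):
    """Emulate `timeout --kill-after=KILL_AFTER TIME_LIMIT CMD` (sketch only)."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=time_limit)   # finished within the time limit
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGTERM)       # polite request; may be trapped
    try:
        return proc.wait(timeout=kill_after)   # honoured SIGTERM within the grace period
    except subprocess.TimeoutExpired:
        proc.kill()                            # SIGKILL on POSIX: uncatchable
        return proc.wait()


# A child that ignores SIGTERM is only stopped by the follow-up SIGKILL.
IGNORES_TERM = [sys.executable, "-c",
                "import signal, time; "
                "signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)"]
print(run_with_kill_after(IGNORES_TERM, time_limit=1, kill_after=1))
```

On a POSIX host the final line should print -9; a child that does not trap SIGTERM would instead end with -15 before the grace period is used.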

I think this option just needs to be added to the command in remote.py.

https://github.com/cylc/cylc-flow/blob/9a0ef7b5a80fb9cf8a3e86d71139bf37fbfa387b/cylc/flow/remote.py#L321
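A hypothetical helper showing what the change to the command might look like is sketched below. It is not the code at the linked line of remote.py; the name build_timeout_prefix and its parameters are assumptions. The one real constraint it encodes is that timeout expects its options before the DURATION operand.

```python
import shlex


def build_timeout_prefix(time_limit, kill_after=30):
    """Hypothetical helper: build the `timeout` prefix for a command.

    time_limit: seconds before timeout sends SIGTERM.
    kill_after: further seconds before SIGKILL; None omits --kill-after
                (e.g. for hosts whose timeout lacks the option).
    """
    prefix = ["timeout"]
    if kill_after is not None:
        prefix.append(f"--kill-after={kill_after}")  # options precede DURATION
    prefix.append(str(time_limit))
    return prefix


cmd = build_timeout_prefix(600) + ["sleep", "900"]
print(shlex.join(cmd))  # timeout --kill-after=30 600 sleep 900
```

Making kill_after optional keeps the old behaviour available, which matters for the compatibility concern raised in the next comment.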

jfrost-mo, Oct 12 '23 14:10

This option is only available in recent versions of timeout (e.g. it works on RHEL7 but not on RHEL6), so I think it would have to be configurable.
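Besides a configuration setting, one possible approach (not proposed in the thread) is to probe whether the installed timeout understands the option. The sketch below assumes a POSIX host with a true command; timeout_supports_kill_after is a hypothetical name, and it only checks the local host, so for a remote platform the same probe would have to run over ssh.

```python
import subprocess


def timeout_supports_kill_after(timeout_exe="timeout"):
    """Hypothetical probe: does `timeout_exe` accept --kill-after?

    Runs `timeout --kill-after=1 1 true`, which exits 0 when the option is
    understood; a timeout without it rejects the unrecognised option.
    """
    try:
        result = subprocess.run(
            [timeout_exe, "--kill-after=1", "1", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:  # no such executable on PATH
        return False
    return result.returncode == 0


# Fall back to plain SIGTERM-only behaviour where the option is missing.
kill_after = 30 if timeout_supports_kill_after() else None
print(kill_after)
```

An explicit setting is simpler and avoids an extra command per host; a probe avoids asking users to know their timeout version.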

dpmatthews, Oct 18 '23 13:10

Fair enough. RHEL 6 is currently in its final extended-support phase, which ends on June 30, 2024, so any such check could be removed after that date, as people will have had 15 years to upgrade.

jfrost-mo, Oct 19 '23 08:10