cylc-flow
Kill background process jobs that timeout but ignore SIGTERM
Problem
Currently, a background job on a remote host that catches or ignores SIGTERM is not killed when its time limit expires. This is because the timeout command sends only SIGTERM by default.
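The failure mode can be reproduced without timeout itself. The following Python sketch (POSIX only) starts a child process that ignores SIGTERM. The child is a hypothetical stand-in for a job script, not real Cylc code. The sketch sends SIGTERM, which is all that plain timeout sends, then sends SIGKILL, which is what --kill-after would add:

```python
import signal
import subprocess
import sys
from typing import Tuple

# Hypothetical stand-in for a job script that ignores SIGTERM.
CHILD_SRC = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def demo_term_vs_kill() -> Tuple[bool, int]:
    """Send SIGTERM, then SIGKILL, to a child that ignores SIGTERM.

    Returns (child still alive after SIGTERM, final return code).
    """
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC], stdout=subprocess.PIPE, text=True
    ) as proc:
        proc.stdout.readline()  # wait until SIGTERM is being ignored
        proc.send_signal(signal.SIGTERM)  # what plain `timeout` sends
        try:
            proc.wait(timeout=1)
        except subprocess.TimeoutExpired:
            pass  # expected: the child ignored SIGTERM
        survived_term = proc.poll() is None
        proc.kill()  # SIGKILL on POSIX; it cannot be caught or ignored
        proc.wait()
    return survived_term, proc.returncode


print(demo_term_vs_kill())  # (True, -9) on Linux
```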
Proposed Solution
The timeout command provides a --kill-after=DURATION option to deal with this. It sends the uncatchable SIGKILL if the process is still running DURATION after the initial signal was sent. In line with Slurm, 30 seconds is probably a reasonable value.
I think this option just needs to be added to the command in remote.py.
https://github.com/cylc/cylc-flow/blob/9a0ef7b5a80fb9cf8a3e86d71139bf37fbfa387b/cylc/flow/remote.py#L321
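The code at the linked line of remote.py is not reproduced here. As a rough sketch of the change, the hypothetical helper below prefixes an argument list with timeout and inserts --kill-after before the duration. It is not Cylc's actual code. The 30-second default follows the Slurm-based suggestion above. Passing kill_after=None leaves the option out, for versions of timeout that lack it:

```python
from typing import List, Optional, Sequence


def wrap_with_timeout(
    command: Sequence[str], limit: int, kill_after: Optional[int] = 30
) -> List[str]:
    """Prefix `command` with GNU timeout (hypothetical helper, not Cylc code).

    timeout sends SIGTERM after `limit` seconds; with --kill-after it also
    sends SIGKILL if the command is still running `kill_after` seconds
    after that first signal.
    """
    wrapped = ["timeout"]
    if kill_after is not None:
        # Options must precede the DURATION argument.
        wrapped.append(f"--kill-after={kill_after}")
    wrapped.append(str(limit))
    return wrapped + list(command)


print(wrap_with_timeout(["sleep", "1000"], 600))
# ['timeout', '--kill-after=30', '600', 'sleep', '1000']
```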
This option is only available in recent versions of timeout (e.g. it works on RHEL 7 but not RHEL 6), so I think it would have to be configurable.
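One possible complement to a configuration setting is to probe for support directly. This would avoid relying on a coreutils version number, which the thread does not state. The sketch below runs a trivial command under --kill-after. GNU timeout exits with status 125 when it cannot run, including when it is given an unrecognised option, so a zero exit status indicates support. The helper name is hypothetical:

```python
import subprocess


def timeout_supports_kill_after(timeout_exe: str = "timeout") -> bool:
    """Return True if `timeout_exe` accepts --kill-after (hypothetical helper)."""
    try:
        result = subprocess.run(
            [timeout_exe, "--kill-after=1", "5", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return False  # timeout is not installed or not executable
    # 0: `true` ran under the option; 125: timeout itself failed.
    return result.returncode == 0
```

The jobs run on the remote host, so a probe like this would have to run there rather than on the scheduler host. A configuration setting would still be useful as an explicit override.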
Fair enough. RHEL 6 is currently in its final extended support phase, which ends on June 30, 2024. Any such check could be removed after that date, as people will have had 15 years to upgrade.