Timeout handlers do not execute when the corresponding `abort on X timeout = True`

Open MetRonnie opened this issue 1 year ago • 3 comments

[scheduler]
    [[events]]
        abort on workflow timeout = True
        workflow timeout = PT1S
        abort handlers = echo "ALPHA"
        workflow timeout handlers = echo "TANGO"

The `workflow timeout handlers` command does not run when `abort on workflow timeout = True` is set.

Originally posted by @MetRonnie in https://github.com/cylc/cylc-flow/issues/5959#issuecomment-1957288745

MetRonnie, Feb 23 '24 16:02

The reason for this is that when the workflow aborts, it terminates all processes in the subprocpool, INCLUDING the event handler.

Changing this behaviour will require careful thought, as it could trigger events we don't want. For example, preparing tasks may erroneously go into the submit-failed state.
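
To make the failure mode concrete, here is a minimal sketch of what happens when a pool terminates everything it is running, handlers included. `ToyProcPool` and `run_handler_then_abort` are hypothetical names for illustration only; this is not cylc's subprocpool code, and the slow handler is just a stand-in for `echo "TANGO"`.

```python
import subprocess
import sys


class ToyProcPool:
    """Hypothetical stand-in for a subprocess pool (not cylc's implementation)."""

    def __init__(self):
        self.running = []

    def submit(self, args):
        """Start a command and track it so it can be terminated later."""
        proc = subprocess.Popen(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
        self.running.append(proc)
        return proc

    def terminate(self):
        """Terminate every tracked process, making no exception for handlers."""
        for proc in self.running:
            if proc.poll() is None:
                proc.terminate()
        for proc in self.running:
            proc.wait()


def run_handler_then_abort():
    """Submit a handler, then abort before it has had time to finish."""
    pool = ToyProcPool()
    # Stand-in for a timeout handler that needs a moment before it echoes.
    handler = pool.submit(
        [sys.executable, "-c", "import time; time.sleep(5); print('TANGO')"]
    )
    # The abort path tears the pool down straight away.
    pool.terminate()
    output = handler.stdout.read()
    handler.stdout.close()
    return handler.returncode, output
```

In this toy version the handler is killed before it prints anything, which mirrors the symptom reported above.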

oliver-sanders, Feb 26 '24 10:02

Can we just leverage the `cylc stop --now` (but not `--now --now`) code for the abort shutdown?

  -n, --now             Shut down without waiting for active tasks to
                        complete. If this option is specified once, wait for
                        task event handler, job poll/kill to complete. If this
                        option is specified more than once, tell the workflow
                        to terminate immediately.

hjoliver, Feb 26 '24 23:02

Abort events take down the scheduler by raising a `SchedulerError` rather than requesting a shutdown. This is a much more abrupt stop, which also results in a non-zero exit code:

https://github.com/cylc/cylc-flow/blob/9d985f2306c7475073d3960ff3b998d23c1885df/cylc/flow/scheduler.py#L1662-L1663
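
As a rough illustration of the difference (a toy model, not the linked scheduler code), the sketch below queues a handler when the timeout fires and then either raises or carries on to an orderly stop. The `SchedulerError` class here is a local stand-in, and `run` and its queue are hypothetical.

```python
class SchedulerError(Exception):
    """Local stand-in for the exception raised on abort."""


def run(abort_on_timeout):
    """Return (exit_code, handlers_that_ran) for one simulated timeout event."""
    pending = []  # handlers queued but not yet executed
    ran = []      # handlers that actually executed
    try:
        # The timeout event fires and its handler is queued.
        pending.append("workflow timeout handler")
        if abort_on_timeout:
            # Abort path: control leaves immediately, the queue is never drained.
            raise SchedulerError("workflow timed out")
        # Orderly path: drain queued handlers before stopping.
        while pending:
            ran.append(pending.pop(0))
        return 0, ran
    except SchedulerError:
        # Abort gives a non-zero exit code.
        return 1, ran
```

With `run(True)` the handler stays queued and the exit code is non-zero; with `run(False)` the handler runs and the exit code is zero.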

In the abort case, we want to wait for abort/timeout handlers, but I guess we might not want to wait for log file retrieval, etc. (it could be a really critical shutdown).
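
One possible shape for that, sketched under assumptions: terminate non-critical processes immediately, but give handler processes a bounded grace period to finish. `abort_shutdown`, `demo` and the `grace` parameter are hypothetical and do not correspond to anything in cylc-flow.

```python
import subprocess
import sys
import time


def abort_shutdown(handlers, others, grace=10.0):
    """Kill non-critical processes now; wait up to `grace` seconds for handlers."""
    # Non-critical work (e.g. log retrieval) is stopped straight away.
    for proc in others:
        if proc.poll() is None:
            proc.terminate()
    # Handlers share one overall deadline so the shutdown stays bounded.
    deadline = time.monotonic() + grace
    for proc in handlers:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            proc.terminate()
    # Reap everything so no child is left behind.
    for proc in [*handlers, *others]:
        proc.wait()


def demo():
    """Run one quick handler and one long-running non-critical process."""
    handler = subprocess.Popen(
        [sys.executable, "-c", "print('TANGO')"],
        stdout=subprocess.PIPE,
        text=True,
    )
    other = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    abort_shutdown([handler], [other])
    output = handler.stdout.read()
    handler.stdout.close()
    return handler.returncode, output, other.returncode
```

The grace period is the design question here: it needs to be long enough for handlers to complete, but short enough that a critical shutdown is not held up.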

oliver-sanders, Jul 15 '24 14:07