sim mode: traceback if execution retries set and submission retries unset

Open wxtim opened this issue 1 year ago • 1 comments

Description

You get a traceback when simulating execution retries if you haven't set submission retries.

How to replicate

mkdir -p ~/cylc-src/bugs/5935
cd ~/cylc-src/bugs/5935

cat > flow.cylc <<__HERE__
[scheduling]
    initial cycle point = 1100
    [[graph]]
        R1 = foo

[runtime]
    [[foo]]
        execution retry delays = 'PT3S'
        [[[simulation]]]
            fail try 1 only = true
            fail cycle points = all
__HERE__

cylc vip --mode simulation --no-detach

Cause

The setting submission retry delays has no default. This means that in TaskJobManager.set_retry_timers the following code will fail unless the user has set submission retry delays:

        submit_delays = (
            rtconfig['submission retry delays']
            or itask.platform['submission retry delays']
        )
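
As a rough illustration only, the lookup could be made tolerant of an unset value along these lines. This is a sketch, not the fix that was merged: get_submit_delays is a hypothetical helper, plain dicts stand in for rtconfig and itask.platform, and it assumes the failure comes from the simulation-mode platform lacking the 'submission retry delays' key.

```python
# Hypothetical helper: not part of cylc-flow. Plain dicts stand in for
# the real rtconfig and itask.platform objects.
def get_submit_delays(rtconfig, platform):
    """Return submission retry delays, or an empty list if none are set."""
    return (
        rtconfig.get('submission retry delays')
        or platform.get('submission retry delays')
        or []
    )


# Task config with execution retries set but no submission retries,
# and (by assumption) a simulation platform that lacks the key entirely.
rtconfig = {'execution retry delays': [3.0], 'submission retry delays': None}
sim_platform = {'name': 'SIMULATION'}

print(get_submit_delays(rtconfig, sim_platform))
```

With the direct indexing used in the snippet above, the same inputs would raise KeyError on the platform lookup, since sim_platform has no such key.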

A possible second bug

I have found that after fixing this bug there is something wrong with the task_events_mgr, causing an endless loop of new job submissions:

INFO     cylc:task_proxy.py:482 [1348/foo waiting job:01 flows:1] => waiting(queued)
INFO     cylc:task_proxy.py:482 [1348/foo waiting(queued) job:01 flows:1] => waiting
WARNING  cylc:task_events_mgr.py:808 [1348/foo waiting job:02 flows:1] (polled-ignored)submitted at 2024-01-26T09:19:42Z
INFO     cylc:task_proxy.py:482 [1348/foo waiting job:02 flows:1] => waiting(queued)
INFO     cylc:task_proxy.py:482 [1348/foo waiting(queued) job:02 flows:1] => waiting
WARNING  cylc:task_events_mgr.py:808 [1348/foo waiting job:03 flows:1] (polled-ignored)submitted at 2024-01-26T09:19:43Z
INFO     cylc:task_proxy.py:482 [1348/foo waiting job:03 flows:1] => waiting(queued)
INFO     cylc:task_proxy.py:482 [1348/foo waiting(queued) job:03 flows:1] => waiting
....

Other questions

Why does this only affect simulation mode?

wxtim avatar Jan 23 '24 10:01 wxtim

I'm putting this off until https://github.com/cylc/cylc-flow/pull/5721 is merged, since it changes the way sim mode works almost entirely.

wxtim avatar Mar 04 '24 12:03 wxtim

Sounds like this can go back to 8.4 then...

hjoliver avatar Mar 14 '24 04:03 hjoliver

Closed by #5721?

MetRonnie avatar Mar 14 '24 13:03 MetRonnie

> Closed by #5721?

I think probably for master, but not for 8.2.x. Needs double checking on both anyway.

wxtim avatar Mar 14 '24 15:03 wxtim

Oh, looks like #5927 was meant to fix this on 8.2.x, but was closed?

MetRonnie avatar Mar 18 '24 17:03 MetRonnie

So I think that in the end my decision (with Oliver's approval) was as follows:

  • Master after https://github.com/cylc/cylc-flow/pull/5721 and 8.2.x have now diverged significantly.
  • Almost no-one is using simulation mode.

Therefore

  • [x] I double check that my changes fix this bug on master.
  • [x] We add a "wontfix" to 8.2.x.
  • [x] Close the issue.

wxtim avatar Mar 19 '24 09:03 wxtim