cylc-flow
sim mode: traceback if execution retries set and submission retries unset
Description
You can get a traceback when simulating execution retries if you haven't set submission retries.
How to replicate
mkdir -p ~/cylc-src/bugs/5935
cd ~/cylc-src/bugs/5935
cat > flow.cylc <<__HERE__
[scheduling]
initial cycle point = 1100
[[graph]]
R1 = foo
[runtime]
[[foo]]
execution retry delays = 'PT3S'
[[[simulation]]]
fail try 1 only = true
fail cycle points = all
__HERE__
cylc vip --mode simulation --no-detach
Cause
The setting "submission retry delays" has no default. This means that in TaskJobManager.set_retry_timers the following code will fail unless the user has set "submission retry delays":
submit_delays = (
    rtconfig['submission retry delays']
    or itask.platform['submission retry delays']
)
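To illustrate the failure, here is a minimal sketch using plain dicts in place of the real rtconfig and itask.platform objects. It assumes the failure is a missing-key lookup on the platform (the exact exception is not quoted above), and the names original_lookup, tolerant_lookup and sim_platform are hypothetical, not part of cylc-flow. The fallback to an empty list is one possible fix, not necessarily the one that was merged.

```python
def original_lookup(rtconfig, platform):
    """Mirror the lookup quoted above: plain indexing on both mappings."""
    return (
        rtconfig['submission retry delays']
        or platform['submission retry delays']
    )


def tolerant_lookup(rtconfig, platform):
    """Hypothetical fix: treat a missing or unset value as 'no retries'."""
    return (
        rtconfig.get('submission retry delays')
        or platform.get('submission retry delays')
        or []
    )


# Assumed shapes: the task sets execution retries only, and the
# simulated platform carries no 'submission retry delays' entry.
rtconfig = {
    'execution retry delays': [3.0],
    'submission retry delays': None,
}
sim_platform = {'name': 'SIMULATION'}

try:
    original_lookup(rtconfig, sim_platform)
    original_failed = False
except KeyError:
    # The unset task-level value falls through to the platform lookup,
    # which has no such key.
    original_failed = True

safe_delays = tolerant_lookup(rtconfig, sim_platform)
```

With these assumed inputs the original lookup raises, while the tolerant version returns an empty list of delays, so no submission retry timer would need to be created.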
A possible second bug
I have found that after fixing this bug there is something wrong with the task_events_mgr, causing an endless loop of new job submissions:
INFO cylc:task_proxy.py:482 [1348/foo waiting job:01 flows:1] => waiting(queued)
INFO cylc:task_proxy.py:482 [1348/foo waiting(queued) job:01 flows:1] => waiting
WARNING cylc:task_events_mgr.py:808 [1348/foo waiting job:02 flows:1] (polled-ignored)submitted at 2024-01-26T09:19:42Z
INFO cylc:task_proxy.py:482 [1348/foo waiting job:02 flows:1] => waiting(queued)
INFO cylc:task_proxy.py:482 [1348/foo waiting(queued) job:02 flows:1] => waiting
WARNING cylc:task_events_mgr.py:808 [1348/foo waiting job:03 flows:1] (polled-ignored)submitted at 2024-01-26T09:19:43Z
INFO cylc:task_proxy.py:482 [1348/foo waiting job:03 flows:1] => waiting(queued)
INFO cylc:task_proxy.py:482 [1348/foo waiting(queued) job:03 flows:1] => waiting
....
Other questions
Why does this only affect simulation mode?
I'm putting this off until https://github.com/cylc/cylc-flow/pull/5721 is merged, since it changes the way sim mode works almost entirely.
Sounds like this can go back to 8.4 then...
Closed by #5721?
I think probably for master, but not for 8.2.x. Needs double checking on both anyway.
Oh looks like #5927 was to fix this on 8.2.x, but was closed?
So I think that in the end my decision (with Oliver's approval) was as follows:
- Master after https://github.com/cylc/cylc-flow/pull/5721 and 8.2.x have now diverged significantly.
- Almost no-one is using simulation mode.
Therefore
- [x] I double check that my changes fix this bug on master.
- [x] We add a "wontfix" to 8.2.x.
- [x] Close the issue.