cylc-flow
cylc-flow copied to clipboard
Feature: Sim mode at runtime
Recheck simulation mode settings at runtime if a broadcast might have changed them. If it has, update these settings.
A waypoint on the road to skip mode.
~Based upon https://github.com/cylc/cylc-flow/pull/5712. Drafted until that is in.~
Check List
- [x] I have read
CONTRIBUTING.mdand added my name as a Code Contributor. - [x] Contains logically grouped changes (else tidy your branch by rebase).
- [x] Does not contain off-topic changes (use other PRs for other changes).
- [x] Applied any dependency changes to both
setup.cfg(andconda-environment.ymlif present). - [x] Tests are included (or explain why tests are not needed).
- [x]
CHANGES.mdentry included if this is a change that can affect users - [x] Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
- [x] If this is a bug fix, PR should be raised against the relevant
?.?.xbranch.
Kicking tests
tests/integration/tui/test_updater.py::test_filters - TypeError: 'NoneType' object is not subscriptable
Looks genuine?
tests/integration/tui/test_updater.py::test_filters - TypeError: 'NoneType' object is not subscriptableLooks genuine?
Yes, it was both genuine and potentially nasty. Hopefully ok now, although it makes me nervous about how well tested this code is: 100% coverage of code ≠ 100% coverage of things users will do
@oliver-sanders
However
foofails again
fail try 1 only != fail submit 1 only: It only plays with execution retry delays, not manual resubmits.
I'm not sure if that's OK, but that's the current behaviour. I'm considering raising a "Question" issue.
That's OK, but the docs could do with that detail
That's OK, but the docs could do with that detail
Agreed - see the commit immediately above this comment. 😄 (https://github.com/cylc/cylc-flow/pull/5721/commits/e9a087e77eca37b46573cbb6d4cdf7f25fd8ced0)
Are the mode_settings are being created when the task is created (which would be immediately after 1/a submits in this case) rather than when the task is submitted (which is after the broadcast)?
No.
It turns out that this problem is caused by sim_task_failed expecting a type of Union[PointBase, Literal["all"]] for sim_conf['fail cycle points'], but not running the validation method parse_fail_cycle_points after the broadcast.
I wonder if the check parse_fail_cycle_points should be a parsec validator, so that Cylc Broadcast throws out garbage values before broadcasting them. As it is you can broadcast rubbish which will then stop the workflow:
# With Oliver's example workflow
> cylc broadcast left.lemon -n a -p 2 -s '[simulation]fail cycle points = aiusrfhpawiue4h'
then in the log
2024-02-27T09:38:42Z INFO - Broadcast set:
+ [2/a] [simulation]fail cycle points=aiusrfhpawiue4h
...
# workflow keeps going for some time
...
2024-02-27T09:38:59Z ERROR - Workflow shutting down - PointParsingError: Incompatible value for <class 'cylc.flow.cycling.integer.IntegerPoint'>: aiusrfhpawiue4h:
invalid literal for int() with base 10: 'aiusrfhpawiue4h'
I tried setting
fail try 1 only=Trueafterfoohad already retried a few times and got this
Think that I've fixed this.
Broadcasting
[simulation]default run length=PT1Scauses the task to succeed 😕
Got it - side effect of earlier change.
- Broadcasting
execution retry delaysdoes not cause a change when running in simulation mode - Also, in the Edit Runtime form,
submission retry delaysshows as1for some reason even though it is not set inflow.cylc
- Broadcasting
execution retry delaysdoes not cause a change when running in simulation mode
I cannot replicate this on the command line. Given the context of your comment I tried it from the gui and still couldn't replicate.
In fact some of the tests I've written should fail if that were the case.
Can you come up with a concrete example?
Replicated the submission retry delays thing. Pretty clueless about that though.
- Broadcasting
execution retry delaysdoes not cause a change when running in simulation modeI cannot replicate this on the command line. Given the context of your comment I tried it from the gui and still couldn't replicate.
In fact some of the tests I've written should fail if that were the case.
Can you come up with a concrete example?
[scheduler]
allow implicit tasks = True
[scheduling]
initial cycle point = 2024
runahead limit = P0
[[graph]]
P1M = foo => bar
[runtime]
[[foo]]
execution retry delays = 100*PT10S
script = false
[[[simulation]]]
fail cycle points = all
default run length = PT10S
fail try 1 only = False
cylc broadcast sim-mode -n foo -s 'execution retry delays=20*PT1S'
But it still keeps saying
WARNING - [20240101T0000Z/foo waiting job:06 flows:1] retrying in PT10S
(The Edit Runtime form shows it correctly but it is not having an effect)
@MetRonnie - This is a good spot. I think that the broadcasts probably need to be applied in the TaskJobManager._simulation_submit_task_jobs method to ensure that we are broadcasting all task settings, not just those in [simulation]. I think you might also have found problems clearing broadcasts with things as they were, since I was changing the itask.tdef.rtconfig not a copy of it.
Conflict resolution looks good
Thank you @hjoliver
Genuine test failure presumably: tests/f/modes/05-sim-trigger.t
I'm looking at the broken test -it's just a log-grep issue after cylc set changes to log messages... I've pushed a fix to the PR branch.
