parentless sequential xtrigger spawning
supersedes #5732
Spawning parentless xtrigger tasks out to the run ahead limit is unnecessary in the majority of cases, and creates a lot of UI clutter.
This PR makes sequential spawning on xtrigger satisfaction optional for parentless tasks, with non-sequential as the default (no change in behaviour), except for clock triggers, where sequential is the default.
Justification:
- Clock-triggered tasks are naturally sequential in time, so it makes sense to only spawn the next parentless clock-triggered task when the current clock trigger is satisfied.
- Most workflows are sequential, so workflows/downstream workflows usually don't need to check xtriggers (e.g. `workflow_state`) out to the runahead limit.
- It makes little difference (split seconds) in "catch-up" mode, where all xtriggers are satisfied on first check.
- It puts less load on the scheduler and UI, with roughly 1/runahead-limit times fewer xtriggers to check and tasks to show.
- If the user wants non-sequential xtrigger checking, that xtrigger can be specified as such.
At NIWA we have a modular approach to running our workflows, both in operations (~50 workflows, most of them interconnected) and in research (commonly downstream of operational workflows), so `wall_clock` and `workflow_state` are by far the most common xtriggers. We do have some requirement for non-sequential `workflow_state` xtriggers (e.g. satellite checking, where data may or may not come in for a cycle); however, nearly all workflows and downstream workflows are sequential, and many of them require very large runahead limits.
Take the following simple example:
```
[scheduling]
    initial cycle point = 20230920T0530Z
    runahead limit = P5
    [[xtriggers]]
        clock_1 = wall_clock()
        up_1 = workflow_state(workflow=clockspawn, task=b, point=%(point)s):PT7S
    [[graph]]
        PT1M = """
            @clock_1 => a
            a => b
            @up_1 => c
        """
[runtime]
    [[root]]
        script = sleep 2
    [[a, b, c]]
```
Currently that produces:
With this default behavior changed:
```
[scheduling]
    initial cycle point = 20230920T0530Z
    runahead limit = P5
    spawn from xtriggers sequentially = True
    [[xtriggers]]
        clock_1 = wall_clock()
        up_1 = workflow_state(workflow=clockspawn, task=b, point=%(point)s):PT7S
    ...
```
it is reduced to:
Real workflows commonly have far more tasks (longer and/or parallel branches) and far larger runahead limits, so the gains in efficiency and reduced clutter are substantial.
And if, for some reason, users want some xtriggers to be checked/run out into the future (to the RH limit) in a non-sequential manner, they can override the default with an argument (e.g. `sequential=True/False`):
```
    spawn from xtriggers sequentially = True
    [[xtriggers]]
        clock_1 = wall_clock()
        up_1 = workflow_state(workflow=clockspawn, task=b, point=%(point)s, sequential=False):PT7S
    ...
```
(If a task has multiple xtriggers, it is treated as sequential if any of them are sequential.)
This `sequential` argument name is reserved and is not passed on to the xtrigger module/function.
It can also be defined in the xtrigger module/function itself (as well as in the workflow definition), but it must then have a Boolean default (e.g. by modifying `cylc.flow.xtriggers.workflow_state` to include this arg in the function definition).
`cylc.flow.xtriggers.wall_clock.wall_clock` has `sequential=True` set.
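For illustration, here is a minimal sketch (not from this PR) of a custom xtrigger function that sets the reserved `sequential` argument via a Boolean default in its signature; the function name, file path, and check logic are hypothetical:

```python
# my_xtriggers.py - hypothetical custom xtrigger module (illustration only).
# The reserved "sequential" argument must have a Boolean default; it controls
# spawning behaviour and is not forwarded to the function at call time.
from pathlib import Path
from typing import Dict, Tuple


def data_ready(
    point: str,
    basedir: str = "/data/incoming",
    sequential: bool = True,  # parentless tasks using this xtrigger spawn sequentially
) -> Tuple[bool, Dict[str, str]]:
    """Return (satisfied, results) - satisfied when this cycle's file exists."""
    path = Path(basedir) / f"{point}.nc"
    return path.exists(), {"file": str(path)}
```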
On xtrigger satisfaction, the corresponding parentless sequential-xtriggered task (PSXT) will spawn out until it reaches the RH limit, the next occurrence of a PSXT, or the next occurrence of a non-parentless task.
Note: Tui sometimes shows no tasks before a corresponding xtrigger is satisfied. This problem is not reflected in the data-store or WUI.
Check List
- [x] I have read `CONTRIBUTING.md` and added my name as a Code Contributor.
- [x] Contains logically grouped changes (else tidy your branch by rebase).
- [x] Does not contain off-topic changes (use other PRs for other changes).
- [x] Applied any dependency changes to both `setup.cfg` (and `conda-environment.yml` if present).
- [x] Tests are included (or explain why tests are not needed).
- [ ] `CHANGES.md` entry included if this is a change that can affect users.
- [ ] Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
Note, the Cylc 7 spawning behaviour can be achieved by adding submit dependencies between the parentless tasks, e.g.:

```
@up => a
a[-PT1M]:submitted => a
```

This forces the sequentiality of `a` without prohibiting multiple instances of `a` from running in parallel.
IMO that introduces clutter and false dependencies into the workflow definition. I would have to do that roughly thousands of times across all 50 workflows (a number of these workflows query all other workflows' tasks via `workflow_state`).
(By false dependency I mean: you want xtriggers not to all run out to the RH limit at once, and/or to reduce UI clutter, i.e. to change the spawning behavior; you don't want `a` to depend on the previous `a` in any way.)
Also, some tasks might be in queues or tight job scheduler queues (and require submission retries). The extra dependency would also be visible in the graph window, and I kind of like the SoD Cylc 8 minimalist behavior.
I have a change inbound, like what we messaged about in Element, where this will be solved without affecting the current behavior, using one line of workflow definition.
Restart/reload and remove are now handled.
Remove will cause the next occurrence to spawn, the arguments being:
- the non-sequential case would be spawned, so this is consistent;
- otherwise the workflow would shut down if it's the only task (and we have a stop button for that).
I assume the docs are auto-generated from the config definition, but I suppose I should build them with this branch to find out.
I think I've made the suggested changes; however, I'm not sure why the fast tests are failing.
TUI integration test failures are unrelated to your changes - see #5849
However there are a few functional test failures here
Looks like it was the ValueError vs KeyError thing. Functional tests passing now 👍
Yeah, I had in my head that it was a list for some reason 🤦♂️
Oh, I've just spotted earlier in the conversation that @oliver-sanders was in the process of writing integration tests - any progress?
I got as far as the code I commented here: https://github.com/cylc/cylc-flow/pull/5738#pullrequestreview-1747695603
But it looks like there were some issues documented in TODOs.
There are a lot of potential interactions in the task pool [e.g. 1]. So for testing, it's less about checking that the sequential spawning is working and more to do with testing the interaction of sequential spawning with other functionalities to make sure we don't have any runtime bugs lurking.
[1] With delayed spawning there's a risk that the task pool could become prematurely empty (e.g. as a result of client commands) leading the workflow to shut down as complete.
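As a rough illustration of that kind of interaction test, here is a sketch assuming the `flow`/`scheduler`/`start` fixtures from cylc-flow's integration test framework and the config key name proposed in this PR; exact pool counts and key names may differ from the final implementation:

```python
# Sketch only: checks that, with sequential spawning and an unsatisfied
# xtrigger, only the first occurrence of the parentless task is in the pool.
async def test_sequential_xtrigger_spawning(flow, scheduler, start):
    id_ = flow({
        'scheduler': {'allow implicit tasks': 'True'},
        'scheduling': {
            'initial cycle point': '2018',
            'runahead limit': 'P5',
            'spawn from xtriggers sequentially': 'True',  # name as proposed in this PR
            'xtriggers': {'clock_1': 'wall_clock(offset=P100Y)'},  # far future => unsatisfied
            'graph': {'P1D': '@clock_1 => foo'},
        },
    })
    schd = scheduler(id_)
    async with start(schd):
        # Only the first occurrence of foo should be waiting; without
        # sequential spawning it would be spawned out to the runahead limit.
        assert len(schd.pool.get_tasks()) == 1
```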
I'll add more tests. (Note: any test with clock triggers will now test this behavior implicitly, in whatever ways it does.)
I've found some holes in the spawning of parentless tasks with unsatisfied sequential xtriggers (via Oliver's integration test, and replicated manually). Should be an easy fix.
Added the integration tests (thanks Oliver), and have gotten the existing scenarios/tests working after adding more spawning conditions and requirements. Will add more scenarios next.
hmm... all the TUI tests are failing...
Kicking off a test run on master to make sure this isn't the result of an external factor: https://github.com/cylc/cylc-flow/actions/runs/7887199508
Fixed on master now. Kicking tests
Even though both clock triggers are sequential by default, I get all of the tasks spawning out to the runahead limit ~1s after each other
I guess the expected behavior would be to only spawn when all sequential clock triggers are satisfied? Will add that behavior in.
Yeah, not sure about this. Maybe xtriggers shouldn't hold each other up; xtriggers could be doing who knows what.
(And BTW, the ~1s spawn out to the RH limit is because clock_1 is satisfied straight away, so the next occurrence is spawned, as expected.)
The point of this PR is to introduce a reasonable (and optional) restriction on the unbounded spawning and xtrigger checking out to the RH limit.
If an xtrigger is satisfied, then the next occurrence (which may or may not be the next task occurrence) can be checked. For example (this could be any xtrigger):
```
[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 2018
    [[xtriggers]]
        clock_1 = wall_clock()
        clock_2 = wall_clock(P10Y)
    [[graph]]
        P2D = @clock_1 & @clock_2 => foo
        ^+P1D/P2D = @clock_1 => foo
```
At present it works:
If we restrict spawning until both are ready, then the next foos that could otherwise run won't spawn.
Of course, having only clock_2 on the first:
```
[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 2018
    [[xtriggers]]
        clock_1 = wall_clock()
        clock_2 = wall_clock(P10Y)
    [[graph]]
        P2D = @clock_2 => foo
        ^+P1D/P2D = @clock_1 => foo
```
will stop foo from spawning into cycles where it could otherwise run. But this is in the hands of the workflow designer.
However, I can understand that there is an inconsistency here:
The presence of an unsatisfied sequential xtrigger will prevent the task from spawning, and hence future checking of any non-sequential xtriggers (regardless of their satisfaction). Yet the presence of an unsatisfied sequential xtrigger does not prevent spawning of a task whose other sequential xtriggers are satisfied.
So in light of that, I think I've come full circle: maybe the spawning should be held up in the presence of any unsatisfied sequential xtrigger. It might be a bit strong, but at least this behavior will be by design.
Ok fixed it now:
```
[scheduler]
    allow implicit tasks = True
[scheduling]
    initial cycle point = 2018
    [[xtriggers]]
        clock_1 = wall_clock()
        clock_2 = wall_clock(P10Y)
    [[graph]]
        P1D = @clock_1 & @clock_2 => foo
```
(TUI tests failing... seems unrelated; they passed locally.)
Tui has been broken by an urwid release, don't upgrade beyond 1.6.1 and you'll be fine. There may be a fix soon.
I've sorted out the conflict and added a few tests in https://github.com/dwsutherland/cylc-flow/pull/19
Note, will need a changelog entry (autogenerated these days, see the CONTRIBUTING page) and possibly a small entry in the "External Triggers" section of cylc-doc?
Can build off https://github.com/cylc/cylc-doc/pull/686
Saw there were a lot of conflicts with that PR, I assume you had rebased on master, so I rebased this branch on master. Now all conflicts in your branch are only in one file (the integration test).
I hadn't rebased, I had merged master into this branch for dwsutherland#19 (before you rebased this branch on top of master). There was a merge mistake when you pushed to dwsutherland#19 after this rebase so I've now force-pushed a fix onto this branch. If that makes sense lol
Ah I see you've force pushed again - the merge mistake is here: https://github.com/cylc/cylc-flow/compare/598e0afe6c2e89cbcc6353f48e2363fe2dde34f2..8e7e3f17fc8832aa6ebb9a9dec800ec85d624762#diff-616dbf97f9b89762d7da3666f7ab3abf0cc3f2e6443b5367af9d25439a4321d5R334-R354
The fix is this:
```diff
diff --git a/cylc/flow/xtrigger_mgr.py b/cylc/flow/xtrigger_mgr.py
index ea4b546da..762d8aa5b 100644
--- a/cylc/flow/xtrigger_mgr.py
+++ b/cylc/flow/xtrigger_mgr.py
@@ -17,5 +17,5 @@
from contextlib import suppress
from enum import Enum
-from inspect import getfullargspec, signature
+from inspect import signature
import json
import re
@@ -332,25 +332,4 @@ class XtriggerManager:
label, f"'{fname}' not callable in xtrigger module '{fname}'",
)
- x_argspec = getfullargspec(func)
- if 'sequential' in x_argspec.args:
- if (
- x_argspec.defaults is None
- or not isinstance(
- x_argspec.defaults[x_argspec.args.index('sequential')],
- bool
- )
- ):
- raise XtriggerConfigError(
- label,
- fname,
- (
- f"xtrigger module '{fname}' contains reserved argument"
- " name 'sequential' that has no boolean default"
- ),
- )
- elif 'sequential' not in fctx.func_kwargs:
- fctx.func_kwargs['sequential'] = x_argspec.defaults[
- x_argspec.args.index('sequential')
- ]
sig = signature(func)
diff --git a/tests/integration/test_config.py b/tests/integration/test_config.py
index 72ec69eda..0a186aa41 100644
--- a/tests/integration/test_config.py
+++ b/tests/integration/test_config.py
@@ -358,5 +358,5 @@ def test_xtrig_validation_wall_clock(
})
with pytest.raises(WorkflowConfigError, match=(
- r'\[@myxt\] wall_clock\(offset=PT7MH, sequential=True\) validation failed: '
+ r'\[@myxt\] wall_clock\(offset=PT7MH\) validation failed: '
r'Invalid offset: PT7MH'
)):
```
Haha, apologies; yes, I rebased and did some tests. Will squash as a fix to your commit.
@dwsutherland this is now my main focus for review - are you working on conflict resolution, post cylc set merge? Ping me if you need any help understanding the changes from that PR.
Yes, working on this now. Thanks, will do.
Testing ... looks good so far, nice change.