Command task matching requirements
Supersedes: #5763, #5695
See also: #5752, #5416, #5677
TODO:
- [ ] hold
- [ ] set
- [ ] trigger
- [ ] release
- [ ] kill
- [ ] poll
- [ ] remove
- show
- [ ] show command
- [ ] Update the troubleshooting page to remove the note about `cylc show` not working outside the n=0 window.
target task population
Different commands need to select tasks from different populations:
| command | population |
|---|---|
| hold, set, trigger | pool, future, past |
| release | held-tasks list only #5752 |
| kill, poll | pool only |
| remove | pool, past |
| show | pool only (but maybe future, past too #5677) |
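As a rough illustration, the table above can be written down as a lookup that a command handler consults before matching. This is a minimal sketch only: `Population`, `COMMAND_POPULATIONS` and `may_target` are hypothetical names, not part of cylc-flow, and `show` is given its current pool-only behaviour.

```python
from enum import Flag, auto


class Population(Flag):
    """Where a command may look for tasks (illustrative names only)."""
    POOL = auto()       # task proxies in the n=0 active window
    FUTURE = auto()     # abstract tasks ahead of the pool (n>0)
    PAST = auto()       # abstract tasks behind the pool (n<0)
    HELD_LIST = auto()  # the scheduler's list of held tasks


ANYWHERE = Population.POOL | Population.FUTURE | Population.PAST

# Transcribed from the table above.
COMMAND_POPULATIONS = {
    "hold": ANYWHERE,
    "set": ANYWHERE,
    "trigger": ANYWHERE,
    "release": Population.HELD_LIST,
    "kill": Population.POOL,
    "poll": Population.POOL,
    "remove": Population.POOL | Population.PAST,
    "show": Population.POOL,
}


def may_target(command: str, population: Population) -> bool:
    """Return True if the command is allowed to select from the population."""
    return bool(COMMAND_POPULATIONS[command] & population)
```

For example, `may_target("trigger", Population.FUTURE)` is true, while `may_target("kill", Population.PAST)` is false.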
task name wildcards
We need to support matching task names by glob (or regex) and family name in all cases (not just in the pool).
For future and past tasks this means:
- find all matching task def names (or member tasks of matching family names)
- check which are valid for the given cycle point(s)
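The two steps above could be sketched as follows. This is an assumption-laden illustration, not the cylc-flow implementation: `match_names_at`, the `families` mapping and the `is_valid` callback are hypothetical, every task is assumed to appear in at least one family, and glob matching uses Python's `fnmatch` module.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List


def match_names_at(
    pattern: str,
    cycle: int,
    families: Dict[str, List[str]],
    is_valid: Callable[[str, int], bool],
) -> List[str]:
    """Expand a task-name glob or family name, keeping tasks valid at cycle."""
    task_names = {name for members in families.values() for name in members}
    # Step 1: task definitions whose own name matches the pattern...
    matched = {name for name in task_names if fnmatchcase(name, pattern)}
    # ...plus the members of any family whose name matches.
    for family, members in families.items():
        if fnmatchcase(family, pattern):
            matched.update(members)
    # Step 2: drop tasks that do not exist at the requested cycle point.
    return sorted(name for name in matched if is_valid(name, cycle))


# Hypothetical configuration: post_archive only exists in even cycles.
families = {
    "MODEL": ["model_a", "model_b"],
    "POST": ["post_plot", "post_archive"],
}


def is_valid(name: str, cycle: int) -> bool:
    return name != "post_archive" or cycle % 2 == 0


print(match_names_at("POST", 1, families, is_valid))
print(match_names_at("post_*", 2, families, is_valid))
```

Neither call touches the task pool, so the same expansion works for past and future cycle points.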
cycle point wildcards
We need to support matching cycle points by glob in the pool, e.g. to target all incomplete tasks.
We should not support cycle point wildcards outside of the pool (well, * is certainly bad; [5-8] (e.g.) may be OK in principle but still dangerous and not that much of a plus to users anyway).
task qualifiers
Use of some task qualifiers (i.e. :output) necessarily restricts matching to the pool. E.g. cylc set --out=succeeded "*:failed" should set all n=0 incomplete failed tasks to succeeded.
We can probably decide that use of any qualifier means pool only (is there any need to distinguish between e.g. completed-succeeded and completed-failed past tasks?)
Terminology:
- pool: task proxies in the `n=0` active window
- future, past: abstract tasks anywhere else in the graph, i.e. `n>0` and `n<0`
Update to the above, regarding cycle point wildcards:
- it would not be OK to allow `cylc trigger */*` in an infinite workflow - that would cause a meltdown
- but it would be OK to allow `cylc hold */*` - that just means "hold any task that gets spawned" which is perfectly safe
So that's another difference in task-matching requirements for different commands.
task name wildcards
We need to support matching task names by glob (or regex) and family name in all cases (not just in the pool). For future and past tasks this means:
- find all matching task def names (or member tasks of matching family names)
- check which are valid for the given cycle point(s)
Note we already have the machinery to determine if given tasks (by name) are valid at a particular cycle point.
There is still potentially an issue as to whether the task would actually end up running there automatically due to optional branching (or even manual interventions) upstream of it.
However, I don't think that really matters, because:
- if you manually set or trigger such a task, you are saying do it now regardless of that
- if you (say) hold such a task, that essentially says "hold it IF it gets spawned in the future". If it doesn't get spawned we're just left with a small housekeeping problem in the hold list.
Finally, for task names, this "problem" (if it is one) applies equally to individual future tasks - which we already allow. For a family name or glob, we just end up with more tasks at the target point.
Agreed this is needed ..
And if it is to work using a broadcast style/approach, without reading context (which may address this), we should think about using a pop-up window in the UI to show all current broadcasts/actions.
This means we/users can just click an x on whatever items/future-actions we want to stop/undo (i.e. those that don't or haven't decay(ed) naturally with task completion).
Recommend tackling this in combination with the use cases from https://github.com/cylc/cylc-flow/issues/5416
The trigger part of this is now a very high priority needed to fully exploit group-trigger, tagged against the 8.6.0 milestone for that reason.
Task-like IDs can be used with the operations this issue is concerned with, e.g. trigger, hold and kill. Note, task-like as opposed to workflow-like.
Syntax
Syntax:
cycle[:selector][/namespace[:selector][/job[:selector]]]
Examples:
2000/foo/01:failed
2000/foo:failed
2000:failed
2000/foo/01
2000/foo
2000
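To make the grammar concrete, here is a minimal parsing sketch. `RelativeID` and `parse_relative_id` are hypothetical names, and the sketch assumes that cycle points, namespaces and job numbers never contain `/` or `:` themselves.

```python
from typing import NamedTuple, Optional


class RelativeID(NamedTuple):
    cycle: str
    cycle_sel: Optional[str] = None
    namespace: Optional[str] = None
    namespace_sel: Optional[str] = None
    job: Optional[str] = None
    job_sel: Optional[str] = None


def parse_relative_id(text: str) -> RelativeID:
    """Split cycle[:sel][/namespace[:sel][/job[:sel]]] into its parts."""
    parts = text.split("/")
    if len(parts) > 3:
        raise ValueError(f"too many '/' separators: {text!r}")
    fields = []
    for part in parts:
        name, _, selector = part.partition(":")
        # Empty strings (e.g. after the trailing slash in "2000/") become None.
        fields += [name or None, selector or None]
    if fields[0] is None:
        raise ValueError(f"missing cycle: {text!r}")
    # Components that were not given keep their None defaults.
    return RelativeID(*fields)


print(parse_relative_id("2000/foo/01:failed"))
print(parse_relative_id("2000:failed"))
```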
Glob support:
* # Matches everything.
? # Matches any single character.
[seq] # Matches any character in "seq".
[!seq] # Matches any character not in "seq".
Examples:
2000/* # 2000/a, 2000/b, 2000/c, ...
2000/task_prefix_* # 2000/task_prefix_1, 2000/task_prefix_2, 2000/task_prefix_3, ...
2000/task_??_start # 2000/task_01_start, 2000/task_02_start, 2000/task_03_start, ...
2000/task_0[01]_start # 2000/task_00_start, 2000/task_01_start
2000/task_[!1]* # 2000/task_02_start, 2000/task_03_start, 2000/task_04_start, ...
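Python's standard `fnmatch` module implements the same four wildcards, so it gives a quick way to check what a namespace glob would select. The task names below are made up for illustration, and whether the scheduler itself uses `fnmatch` is not something this sketch asserts.

```python
from fnmatch import fnmatchcase

# Hypothetical task names, for illustration only.
TASKS = [
    "task_00_start",
    "task_01_start",
    "task_02_start",
    "task_10_start",
    "task_prefix_1",
]


def glob_tasks(pattern: str) -> list:
    """Return the task names matched by a case-sensitive glob."""
    return [name for name in TASKS if fnmatchcase(name, pattern)]


print(glob_tasks("task_prefix_*"))     # names starting with "task_prefix_"
print(glob_tasks("task_??_start"))     # exactly two characters in the middle
print(glob_tasks("task_0[01]_start"))  # second digit must be 0 or 1
print(glob_tasks("task_[!1]*"))        # first character after "task_" is not 1
```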
Globs
At present, these IDs are matched against the task pool (i.e. the workflow's n=0 window).
Families are supported: they are effectively expanded into a list of their member tasks, which is then matched against the task pool.
As such, we currently treat families as equivalent to globs (basically an implicit glob).
# select all members of FAM (in all cycles) that are also n=0
*/FAM
# select all tasks in the cycle 2000
2000 # this is shorthand for 2000/root
2000/root
2000/* # also equivalent
Existing Glob Use Cases (n=0 task matching)
- Trigger all [submit-]failed tasks:
*:[submit-]failed
*/*:[submit-]failed
- Orphan active jobs (e.g. platform dead, need to tell Cylc they've failed so we can resubmit to another platform):
# if all tasks in workflow are on the same platform
*:submitted
*:running
# otherwise can select by family, e.g. HPC
FAM:submitted
FAM:running
- Remove a subgraph before it runs:
# note: only the n=0 tasks need to be removed
2000/
2000/root
2000/*
- Trigger n=0 tasks / satisfy their prerequisites:
2020/FAM # note this may be equivalent to the family's "group start tasks" in many cases
task_prefix_*
- Other?
Desired Use Cases (Configuration Matching)
For some use cases, we would like to be able to perform pattern matching beyond the n=0 window, i.e. FAM would expand to all tasks within the family, even if they are not currently present in the n=0 window.
- Group trigger
2000/ # run a whole cycle
2000/FAM # run a whole family
- Hold a group
2000/
2000/FAM
- Other?
Limitations Of Configuration Matching
- Select by state:
  It only makes sense to filter by state against n=0 tasks. E.g. with `cylc trigger '*:failed'` we don't want to trigger all failed tasks going back to the start of the workflow, only the ones that are presently in the n=0 window, i.e. final-incomplete tasks.

      *:failed
      2000/*:failed
      2000/FAM:failed

- Glob by cycle:
  The workflow might be infinite, so we cannot list every possible cycle point the task could exist in.

      */FAM
Task Selection In The GUI / Tui
If you run a command against a cycle/family (i.e. click in GUI, press enter in Tui), the command will be run against its ID.
- Cycle commands are issued as `<cycle>/*`.
- Family commands are issued as `<family>`.
Note we cannot do <family>/* because the * would be in the place of the job (i.e. we are globbing all jobs belonging to all tasks belonging to the family).
Using the workflow a => b => c as an example. If the user sees this in the GUI/Tui...
- 1
  - FAM
    - a (running)
    - b (waiting)
...and they run a command against either 1 or FAM, it will currently only target the task 1/a (1/b is n=1).
Problems / Questions / Conclusions
- There are valid use cases for n=0 matching.
- There are valid use cases for configuration matching.
- There are some globs we cannot expand against the config.
- The use of top-level switches or syntax changes is undesirable.
I've had a riffle through the commands and possible options:
- Hold and release: In the place of a user I'd expect to be able to hold/release any/all tasks wherever they are.
- Set: Feels like it might be sane to restrict globs to N=0 - future tasks can be skipped and broadcasts contain globs.
- Trigger: glob triggering N>0 would be madness because of the infinite cycling. However, most of the globs for cycle point I can think of amount to `cylc play --startcp <lowest cycle point matching glob> --stopcp <highest cycle point matching glob>`.
- Kill, Poll: I don't see these as working on inactive tasks?
- remove: I'd only expect remove to work on N=0 and past tasks
Whatever we ultimately choose, it needs to be well documented, if only so we can look up what we agreed.
Just jumping in here to make sure N=0 actually means N=0 through N. Most if not all of our tasks have retries and not being able to match the retries due to N>0 would be crippling...
@retro486 - n=0 in this context means the "active tasks" that you always see in the Cylc 8 GUI, whereas n=1 shows past and future tasks one graph-edge out from those (the default view in the GUI); n=2 two graph edges out, and so on.
This matters for task matching because (e.g.) in a cycling workflow n>0 in the future direction is potentially infinite, and in the past direction (which we sometimes call n<0) includes all past tasks back to the start of the run.
Cylc 7 had the same problem in principle, but in practice its "task pool" was sufficiently bloated (compared to n=0 in the more efficient Cylc 8) that we mostly got away with only allowing viewing and matching within the task pool.
I think you must be assuming we mean something else (submit numbers?). Retries don't change any of this. A task that fails and retries several times will stay in n=0 throughout.
@wxtim -
I've had a riffle through the commands and possible options: ... Set: Feels like it might be sane to restrict globs to N=0 - future tasks can be skipped and broadcasts contain globs. Trigger: glob triggering N>0 would be madness because of the infinite cycling. However, most of the globs for cycle point I can think of amount to
We should not support * as a future cycle glob for obvious reasons. (Or at least, not without some serious safety fencing - probably not worth the effort).
We might decide not to support any future cycle glob (e.g. [2000,2001]) because the benefit-to-danger ratio seems marginal at best.
But we DO need to support future and past name globs (by wildcard and family) - e.g. I want to trigger a future family or group without having to list every task ID. And the same goes for set as trigger.
(Otherwise agreed with your statements).
@oliver-sanders - generally agreed on all points I think; just a couple of extras to note:
Desired Use Cases (Configuration Matching)
In 1 and 2 you only mention family name globbing, and the trigger command...
3 other?
- I presume you also mean to include name pattern globbing, and the set command (prereqs, outputs)
- https://github.com/cylc/cylc-flow/issues/5677 - to show what the prerequisites and outputs are of a future task (we have a good number of user requests for this - see the issue)
NOTE my table of commands per "target task population" in the Issue description above still covers everything needed (although it is good to have specific examples and syntax).
Limitations Of Configuration Matching
- Glob by cycle: The workflow might be infinite, we cannot list every possible cycle point the task could exist in.
For completeness, I'm sure we could dream up use cases that would not be dangerous (a small workflow with a final cycle point; or globbing a small range of points). But happy to simply ban cycle globbing beyond n=0.
- Optional branching
Future globbing can match tasks that would not end up running if the workflow was left to its own devices. E.g. it could pick up the "wrong" side, or even both sides, of an optional branch.
(I'm just mentioning this because it was raised as a problem somewhere back in the chain of issues leading to this one).
We will need to document this for users, but it's definitely not a reason to disallow globbing - it is (or should be) obvious that future task matching can only select tasks that can in principle run at that point in the graph. Plus the "problem" applies just as well without globbing - i.e., for targeting future tasks by explicit task IDs, which we can already do.
An idea, to mitigate any potential for users to accidentally match too many tasks:
Both pool (n=0) and configuration matching can be done synchronously in the scheduler so a list of matched tasks could be fed back to the user, to check before going ahead with the command.
- A CLI/GUI switch is undesirable.
- Task selectors should only be searched for in the n=0 window, this is fairly intuitive.
- Cycle globs could be restricted to "active" cycles only (removes the infinite glob problem), also reasonably intuitive.
- This would allow us to support namespace globs at the config level which is intuitive.
So that is three possible approaches to matching, depending on the ID 🤦, but the result should be fairly intuitive and answer the expected use cases. More thought is required before making this a solid proposal.
To clarify what tasks a glob will match, we could implement an interface, a command or argument for the CLI and potentially a live preview for the GUI - https://github.com/cylc/cylc-flow/issues/4357
@hjoliver
I think you must be assuming we mean something else (submit numbers?). Retries don't change any of this. A task that fails and retries several times will stay in `n=0` throughout.
Yes, I misunderstood the n=0 context here. Thanks for clarifying!
Worth being aware of: https://github.com/cylc/cylc-flow/issues/6816
In the meeting we discussed a possible solution that is outlined above.
We could do with some examples for discussion.
Here's an arbitrary workflow
[scheduling]
    initial cycle point = 1
    cycling mode = integer
    runahead limit = P1
    [[graph]]
        P1 = """
            a1 => b => a2
            b[-P1] => b
        """
        P2 = """
            x1 => b => x2
        """
[runtime]
    [[A]]
    [[a1, a2]]
        inherit = A
    [[X]]
    [[x1, x2]]
        inherit = X
    [[a2]]  # 1/a2:failed
        script = if [[ $CYLC_TASK_CYCLE_POINT -eq 1 ]]; then exit 1; fi
    [[b]]  # 2/b:failed
        script = if [[ $CYLC_TASK_CYCLE_POINT -eq 2 ]]; then exit 1; fi
That stalls with this n=0 window:
- 1
  - A
    - a2:failed
- 2
  - b:failed
- 3
  - A
    - a1:waiting (runahead)
  - X
    - x1:waiting (runahead)
(For completeness, the full n=3 window out to cycle 3)
- 1
  - n=1 b:succeeded
  - A
    - n=2 a1:succeeded
    - n=0 a2:failed
  - X
    - n=2 x1:succeeded
    - n=2 x2:succeeded
- 2
  - n=0 b:failed
  - A
    - n=1 a1:succeeded
    - n=1 a2:waiting
- 3
  - n=1 b:waiting
  - A
    - n=0 a1:waiting (runahead)
    - n=2 a2:waiting
  - X
    - n=0 x1:waiting (runahead)
    - n=2 x2:waiting
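The n values in this listing can be reproduced with a breadth-first search outwards from the n=0 tasks, treating each graph edge as one step in either direction. The sketch below hard-codes the example graph for cycles 1 to 3; `build_edges` and `window_distances` are illustrative helpers, not scheduler code.

```python
from collections import deque


def build_edges(max_cycle: int) -> list:
    """Graph edges of the example workflow for cycles 1..max_cycle."""
    edges = []
    for c in range(1, max_cycle + 1):
        # P1 recurrence: a1 => b => a2, plus b[-P1] => b after the first cycle.
        edges += [(f"{c}/a1", f"{c}/b"), (f"{c}/b", f"{c}/a2")]
        if c > 1:
            edges.append((f"{c - 1}/b", f"{c}/b"))
        # P2 recurrence (cycles 1, 3, 5, ...): x1 => b => x2.
        if c % 2 == 1:
            edges += [(f"{c}/x1", f"{c}/b"), (f"{c}/b", f"{c}/x2")]
    return edges


def window_distances(edges: list, active: list) -> dict:
    """Map each task to its graph distance from the nearest active task."""
    neighbours = {}
    for upstream, downstream in edges:
        neighbours.setdefault(upstream, set()).add(downstream)
        neighbours.setdefault(downstream, set()).add(upstream)
    distance = {task: 0 for task in active}
    queue = deque(active)
    while queue:
        task = queue.popleft()
        for other in neighbours.get(task, ()):
            if other not in distance:
                distance[other] = distance[task] + 1
                queue.append(other)
    return distance


n = window_distances(build_edges(3), ["1/a2", "2/b", "3/a1", "3/x1"])
for task_id in sorted(n):
    print(f"n={n[task_id]} {task_id}")
```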
And here are task matching examples following this solution:
| Pattern (as relative IDs) | Current behaviour | Proposed behaviour | Notes |
|---|---|---|---|

[Table rows missing from this copy; the only surviving cells are two "(no matches)" entries.]
Have been looking through the current task matching code in relation to this issue & group-trigger. With the changes under discussion:
- The whole cycle matching code (#6816) should be removed: if an ID is specified without a namespace, just fill in `root` and apply normal family matching rules.
- The active / inactive task matching code should be combined (just do tokens in => tokens out); this allows for a single interface to fulfil all possible requirements. Downstream code can load itask objects, taskdefs, etc, as needed. This will make the ID matching code much simpler.
- We should convert IDs to tokens early (we already do this in command validation); this will allow validation to be stripped from the code (currently validation is performed in multiple places).
- Subclass `Tokens` to `TaskTokens` so we can change the cycle/task components from `Optional[str]` to `str` because we are already validating all this up-front.
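A rough sketch of the subclassing idea, for discussion only. The `Tokens` class here is a simplified stand-in written for this example and is not the real cylc-flow class; defaulting the task to `root` mirrors the first bullet above.

```python
from typing import Optional


class Tokens(dict):
    """Stand-in for a general ID token mapping; any component may be absent."""

    @property
    def cycle(self) -> Optional[str]:
        return self.get("cycle")

    @property
    def task(self) -> Optional[str]:
        return self.get("task")


class TaskTokens(Tokens):
    """Tokens validated up-front, so cycle and task are always strings."""

    def __init__(self, cycle: str, task: str = "root", **other: str):
        # Validation happens once, here, rather than at every point of use.
        if not cycle or not task:
            raise ValueError("a task ID needs both a cycle and a task")
        super().__init__(cycle=cycle, task=task, **other)

    @property
    def cycle(self) -> str:
        return self["cycle"]

    @property
    def task(self) -> str:
        return self["task"]


print(TaskTokens("2000"))         # namespace omitted, so "root" is filled in
print(TaskTokens("2000", "foo"))
```

With the narrower return types, downstream code can use `tokens.cycle` and `tokens.task` without repeating `None` checks.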
Assigned myself to do this pending agreement on the approach.
Sounds sensible, generally.
The active / inactive task matching code should be combined (just do tokens in => tokens out), ...
Do you mean just match against taskdefs initially, and later on either grab a proxy from the pool or spawn it from the taskdef, at the last minute?
I've started to knock together a POC implementation of the approach we outlined in the meeting. It's simple to do, and I think it might be easier to review the approach when we have something tactile we can play with. Will dump the branch here for evaluation when it's ready.
I've got something ready for evaluation:
https://github.com/oliver-sanders/cylc-flow/pull/new/5827
This should implement the proposed solution:
- Task selectors (e.g. `:failed`) should only be searched for in the n=0 window.
- Cycle globs could be restricted to "active" cycles only (e.g. `*` will expand to all cycles which contain one or more active tasks).
- Namespace globs should expand as per the config (e.g. `root` should return all tasks).
The solution is hacked in, but should work fine for all task matching commands.
(note I have turned on inactive matching for all commands so matching will work the same for all commands on this branch)
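For readers who want to experiment with the three rules without checking out the branch, here is a toy model of them applied to the example workflow from earlier in this thread. It is one reading of the rules, not code from the branch: the pool, families and `valid_at` rule are hard-coded, and `match` is a hypothetical helper that only handles `cycle[:selector][/namespace[:selector]]`.

```python
from fnmatch import fnmatchcase

# n=0 pool and config of the example workflow (hard-coded stand-ins).
POOL = {"1/a2": "failed", "2/b": "failed", "3/a1": "waiting", "3/x1": "waiting"}
FAMILIES = {
    "root": ["a1", "a2", "b", "x1", "x2"],
    "A": ["a1", "a2"],
    "X": ["x1", "x2"],
}


def valid_at(name: str, cycle: int) -> bool:
    """x1 and x2 sit on the P2 recurrence, so they only exist in odd cycles."""
    return cycle % 2 == 1 if name in FAMILIES["X"] else True


def match(pattern: str) -> list:
    cycle_part, _, ns_part = pattern.partition("/")
    cycle_pat, _, cycle_sel = cycle_part.partition(":")
    # A missing namespace is filled in as "root".
    ns_pat, _, ns_sel = (ns_part or "root").partition(":")
    selector = ns_sel or cycle_sel

    # Cycle globs expand against cycles holding at least one n=0 task.
    if any(ch in cycle_pat for ch in "*?["):
        active = sorted({int(task_id.split("/")[0]) for task_id in POOL})
        cycles = [c for c in active if fnmatchcase(str(c), cycle_pat)]
    else:
        cycles = [int(cycle_pat)]

    # Namespace patterns expand against the config (families and task names).
    names = set()
    for family, members in FAMILIES.items():
        if fnmatchcase(family, ns_pat):
            names.update(members)
    names.update(n for n in FAMILIES["root"] if fnmatchcase(n, ns_pat))

    ids = [f"{c}/{n}" for c in cycles for n in sorted(names) if valid_at(n, c)]

    # A selector restricts the result to n=0 tasks in that state.
    if selector:
        ids = [task_id for task_id in ids if POOL.get(task_id) == selector]
    return ids


for example in ("*:failed", "2/A", "*/X", "3/a*:waiting", "4/X"):
    print(example, "->", match(example))
```

In this model `2/A` selects tasks outside the n=0 window, while `*/X` stays finite because `*` only expands to cycles 1 to 3.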
I have now expanded the POC into a PR: https://github.com/cylc/cylc-flow/pull/6920/
Please use this to review the task matching approach (not the earlier POC posted above).
Initially, I was a bit sceptical about the "* should expand to all active cycles" bit, but after testing, I have come around to it. I think this should satisfy all use cases and feels nicely intuitive. I also can't think of any better solution! (well done whoever it was who came up with this idea in the meeting!)
The idea of adding a command / interface that shows you what tasks a pattern would match (https://github.com/cylc/cylc-flow/issues/4357) is a good one (especially as a learning tool). E.g., we could display the list of tasks a command would match live whilst the user types in the GUI! I have designed the new interface to be compatible with this goal; however, I haven't added the functionality in the PR as it requires work on Cylc UI Server (https://github.com/cylc/cylc-uiserver/issues/720).