Command task matching requirements

Open hjoliver opened this issue 2 years ago • 8 comments

Supersedes: #5763, #5695

See also: #5752, #5416, #5677

TODO:

  • [ ] hold
  • [ ] set
  • [ ] trigger
  • [ ] release
  • [ ] kill
  • [ ] poll
  • [ ] remove
  • show
    • [ ] show command
    • [ ] Update the troubleshooting page to remove the note about cylc show not working outside the n=0 window.

target task population

Different commands need to select tasks from different populations:

| command | population |
| --- | --- |
| hold, set, trigger | pool, future, past |
| release | held-tasks list only #5752 |
| kill, poll | pool only |
| remove | pool, past |
| show | pool only (but maybe future, past too #5677) |
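
To make the table easier to discuss, here is a minimal sketch of how the command-to-population mapping could be written down. The `Population` flag, the `COMMAND_POPULATIONS` dict and `may_target` are hypothetical names made up for illustration; none of this is existing cylc-flow code.

```python
from enum import Flag, auto


class Population(Flag):
    """Hypothetical labels for the populations named in the table."""
    POOL = auto()       # task proxies in the n=0 active window
    FUTURE = auto()     # abstract tasks ahead of the pool
    PAST = auto()       # abstract tasks behind the pool
    HELD_LIST = auto()  # the held-tasks list


ANYWHERE = Population.POOL | Population.FUTURE | Population.PAST

# Assumed encoding of the table above (illustrative only).
COMMAND_POPULATIONS = {
    "hold": ANYWHERE,
    "set": ANYWHERE,
    "trigger": ANYWHERE,
    "release": Population.HELD_LIST,
    "kill": Population.POOL,
    "poll": Population.POOL,
    "remove": Population.POOL | Population.PAST,
    "show": Population.POOL,  # maybe future and past too, see #5677
}


def may_target(command: str, population: Population) -> bool:
    """Return True if the command selects tasks from this population."""
    return bool(COMMAND_POPULATIONS[command] & population)
```

For example, `may_target("hold", Population.FUTURE)` is true, whereas `may_target("kill", Population.PAST)` is false.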

task name wildcards

We need to support matching task names by glob (or regex) and family name in all cases (not just in the pool).

For future and past tasks this means:

  • find all matching task def names (or member tasks of matching family names)
  • check which are valid for the given cycle point(s)
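
The two steps above could look roughly like this. It is a sketch only: `FAMILIES` and `VALID_AT` are hypothetical stand-ins for the workflow config and for whatever machinery checks task validity at a cycle point, and Python's `fnmatchcase` is used for the glob.

```python
from fnmatch import fnmatchcase

# Hypothetical family -> member tasks mapping.
FAMILIES = {"FAM": ["foo", "bar"]}

# Hypothetical validity checks: task name -> "is it valid at this point?"
VALID_AT = {
    "foo": lambda point: True,            # runs at every cycle point
    "bar": lambda point: point % 2 == 1,  # runs at odd cycle points only
    "baz": lambda point: True,
}


def match_names(pattern: str, point: int) -> list:
    """Match task defs (or family members) by name, then filter by point."""
    names = set()
    # Step 1a: member tasks of matching family names.
    for family, members in FAMILIES.items():
        if fnmatchcase(family, pattern):
            names.update(members)
    # Step 1b: matching task definition names.
    names.update(name for name in VALID_AT if fnmatchcase(name, pattern))
    # Step 2: keep only the tasks that are valid at the given cycle point.
    return sorted(name for name in names if VALID_AT[name](point))
```

With this toy config, `match_names("FAM", 2)` drops `bar` because it is not valid at point 2.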

cycle point wildcards

We need to support matching cycle points by glob in the pool, e.g. to target all incomplete tasks.

We should not support cycle point wildcards outside of the pool (well, * is certainly bad; [5-8] (e.g.) may be OK in principle but still dangerous and not that much of a plus to users anyway).

task qualifiers

Use of some task qualifiers (i.e. :output) necessarily restricts matching to the pool. E.g. cylc set --out=succeeded "*:failed" should set all n=0 incomplete failed tasks to succeeded.

We can probably decide that use of any qualifier means pool only (is there any need to distinguish between e.g. completed-succeeded and completed-failed past tasks?)


Terminology:

  • pool: task proxies in the n=0 active window
  • future, past: abstract tasks anywhere else in the graph, i.e. n>0 and n<0

hjoliver avatar Nov 18 '23 21:11 hjoliver

Update to the above, regarding cycle point wildcards:

  • it would not be OK to allow cylc trigger */* in an infinite workflow - that would cause a meltdown
  • but it would be OK to allow cylc hold */* - that just means "hold any task that gets spawned" which is perfectly safe

So that's another difference in task-matching requirements for different commands.

hjoliver avatar Jun 18 '24 03:06 hjoliver

task name wildcards

We need to support matching task names by glob (or regex) and family name in all cases (not just in the pool). For future and past tasks this means:

  • find all matching task def names (or member tasks of matching family names)
  • check which are valid for the given cycle point(s)

Note we already have the machinery to determine if given tasks (by name) are valid at a particular cycle point.

There is still potentially an issue as to whether the task would actually end up running there automatically due to optional branching (or even manual interventions) upstream of it.

However, I don't think that really matters, because:

  • if you manually set or trigger such a task, you are saying do it now regardless of that
  • if you (say) hold such a task, that essentially says "hold it IF it gets spawned in the future". If it doesn't get spawned we're just left with a small housekeeping problem in the hold list.

Finally, for task names, this "problem" (if it is one) applies equally to individual future tasks - which we already allow. For a family name or glob, we just end up with more tasks at the target point.

hjoliver avatar Jul 01 '24 23:07 hjoliver

Agreed this is needed. And if it is to work using a broadcast-style approach, without reading context (which may address this), we should think about using a pop-up window in the UI to show all current broadcasts/actions. This means users can just click an x on whatever items/future actions they want to stop or undo (i.e. those that have not decayed naturally with task completion).

dwsutherland avatar Oct 13 '24 08:10 dwsutherland

Recommend tackling this in combination with the use cases from https://github.com/cylc/cylc-flow/issues/5416

oliver-sanders avatar Jun 11 '25 12:06 oliver-sanders

The trigger part of this is now a very high priority, because it is needed to fully exploit group-trigger. It is tagged against the 8.6.0 milestone for that reason.

oliver-sanders avatar Jun 19 '25 14:06 oliver-sanders

Task-like IDs can be used with the operations this issue is concerned with, e.g. trigger, hold and kill. Note, task-like as opposed to workflow-like.

Syntax

Syntax:

cycle[:selector][/namespace[:selector][/job[:selector]]]

Examples:

2000/foo/01:failed
2000/foo:failed
2000:failed
2000/foo/01
2000/foo
2000
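
As an illustration of how the syntax above breaks down, here is a rough parsing sketch. `Part` and `parse_relative_id` are hypothetical names, not the cylc-flow ID parser, and the sketch assumes that cycle points and names contain no `/` or `:` characters and does no further validation.

```python
from typing import List, NamedTuple, Optional


class Part(NamedTuple):
    """One component of a task-like ID and its optional selector."""
    value: str
    selector: Optional[str]


def parse_relative_id(id_: str) -> List[Part]:
    """Split cycle[:selector][/namespace[:selector][/job[:selector]]]."""
    chunks = id_.split("/")
    if len(chunks) > 3:
        raise ValueError(f"too many components: {id_}")
    parts = []
    for chunk in chunks:
        # Everything after the first ":" is treated as the selector.
        value, _, selector = chunk.partition(":")
        parts.append(Part(value, selector or None))
    return parts
```

So `2000/foo/01:failed` gives three parts (cycle, namespace, job), with the selector attached to the job only.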

Glob support:

*       # Matches everything.
?       # Matches any single character.
[seq]   # Matches any character in "seq".
[!seq]  # Matches any character not in "seq".

Examples:

2000/*                  # 2000/a, 2000/b, 2000/c, ...
2000/task_prefix_*      # 2000/task_prefix_1, 2000/task_prefix_2, 2000/task_prefix_3, ...
2000/task_??_start      # 2000/task_01_start, 2000/task_02_start, 2000/task_03_start, ...
2000/task_0[12]_start   # 2000/task_01_start, 2000/task_02_start
2000/task_[!1]*         # 2000/task_02_start, 2000/task_03_start, 2000/task_04_start, ...
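
These four wildcards are the same ones that Python's standard `fnmatch` module implements, so the glob examples can be tried out with a few lines of Python. This is only a sketch for experimenting: the `TASKS` list is made up, and nothing here shows how Cylc itself does the matching.

```python
from fnmatch import fnmatchcase

# Made-up task names for experimenting with the patterns.
TASKS = [
    "task_01_start",
    "task_02_start",
    "task_03_start",
    "task_10_start",
    "task_prefix_1",
]


def glob(pattern: str) -> list:
    """Return the task names matching a shell-style pattern."""
    return [task for task in TASKS if fnmatchcase(task, pattern)]
```

Note that `[!1]` only constrains a single character, so `task_[!1]*` still matches `task_01_start` and `task_prefix_1`; it only excludes names whose first character after `task_` is `1`.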

Globs

At present, these IDs are matched against the task pool (i.e. the workflow's n=0 window).

Families are supported: they are effectively expanded into a list of their member tasks, which is then matched against the task pool.

As such we currently consider families as equivalent to globs (basically an implicit glob).

# select all members of FAM (in all cycles) that are also n=0
*/FAM

# select all tasks in the cycle 2000
2000  # this is shorthand for 2000/root
2000/root
2000/*  # also equivalent
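
A toy model of this current behaviour (family as an implicit glob, matched against the n=0 pool only) might look like the following. `FAMILY_MEMBERS`, `POOL` and `match_pool` are hypothetical names and data for illustration, not the real implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical family membership (root contains every task).
FAMILY_MEMBERS = {"root": ["a", "b", "c"], "FAM": ["a", "b"]}

# Hypothetical n=0 pool as (cycle, task name) pairs.
POOL = [("2000", "a"), ("2000", "c"), ("2001", "b")]


def match_pool(pattern: str) -> list:
    """Match a cycle[/namespace] pattern against the pool only."""
    cycle_pat, _, name_pat = pattern.partition("/")
    name_pat = name_pat or "root"  # "2000" is shorthand for "2000/root"
    members = FAMILY_MEMBERS.get(name_pat)

    def name_ok(name: str) -> bool:
        # A family name acts as an implicit glob over its members.
        if members is not None:
            return name in members
        return fnmatchcase(name, name_pat)

    return [
        (cycle, name)
        for cycle, name in POOL
        if fnmatchcase(cycle, cycle_pat) and name_ok(name)
    ]
```

Here `2000`, `2000/root` and `2000/*` all return the same two pool tasks, and `*/FAM` returns only the family members that happen to be in the pool.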

Existing Glob Use Cases (n=0 task matching)

  1. Trigger all [submit-]failed tasks:
*:[submit-]failed
*/*:[submit-]failed
  2. Orphan active jobs (e.g. platform dead, need to tell Cylc they've failed so we can resubmit to another platform):
# if all tasks in workflow are on the same platform
*:submitted
*:running

# otherwise can select by family, e.g. HPC
FAM:submitted
FAM:running
  3. Remove a subgraph before it runs:
# note: only the n=0 tasks need to be removed
2000/
2000/root
2000/*
  4. Trigger n=0 tasks / satisfy their prerequisites:
2020/FAM  # note this may be equivalent to the family's "group start tasks" in many cases
task_prefix_*
  5. Other?

Desired Use Cases (Configuration Matching)

For some use cases, we would like to be able to perform pattern matching beyond the n=0 window. I.e., FAM would expand to all tasks within the family, even if they are not currently present in the n=0 window.

  1. Group trigger
2000/     # run a whole cycle
2000/FAM  # run a whole family
  2. Hold a group
2000/
2000/FAM
  3. Other?

Limitations Of Configuration Matching

  1. Select by state:

    It only makes sense to filter by state against n=0 tasks. E.g., with cylc trigger '*:failed' we don't want to trigger all failed tasks going back to the start of the workflow, only the ones that are presently in the n=0 window, i.e. final-incomplete tasks.

    *:failed
    2000/*:failed
    2000/FAM:failed
    
  2. Glob by cycle:

    The workflow might be infinite, we cannot list every possible cycle point the task could exist in.

    */FAM
    

Task Selection In The GUI / Tui

If you run a command against a cycle/family (i.e. click in GUI, press enter in Tui), the command will be run against its ID.

  • Cycle commands are issued as <cycle>/*.
  • Family commands are issued as <family>.

Note we cannot do <family>/* because the * would be in the place of the job (i.e. we are globbing all jobs belonging to all tasks belonging to the family).

Take the workflow a => b => c as an example. If the user sees this in the GUI/Tui...

  • 1
    • FAM
      • a (running)
      • b (waiting)

...and they run a command against either 1 or FAM, it will currently only target the task 1/a (1/b is n=1).

Problems / Questions / Conclusions

  1. There are valid use cases for n=0 matching.
  2. There are valid use cases for configuration matching.
  3. There are some globs we cannot expand against the config.
  4. The use of top-level switches or syntax changes is undesirable.

oliver-sanders avatar Jun 20 '25 15:06 oliver-sanders

I've had a riffle through the commands and possible options:

  • Hold and release: In the place of a user I'd expect to be able to hold/release any/all tasks wherever they are.
  • Set: Feels like it might be sane to restrict globs to N=0 - future tasks can be skipped and broadcasts contain globs.
  • Trigger: glob triggering N>0 would be madness because of the infinite cycling. However, most of the globs for cycle point I can think of amount to cylc play --startcp <lowest cycle point matching glob> --stopcp <highest cycle point matching glob>.
  • Kill, Poll: I don't see these as working on inactive tasks?
  • remove: I'd only expect remove to work on N=0 and past tasks

Whatever we ultimately choose, it needs to be well documented, if only so we can look up what we agreed.

wxtim avatar Jun 23 '25 10:06 wxtim

Just jumping in here to make sure N=0 actually means N=0 through N. Most if not all of our tasks have retries and not being able to match the retries due to N>0 would be crippling...

retro486 avatar Jun 23 '25 14:06 retro486

Just jumping in here to make sure N=0 actually means N=0 through N. Most if not all of our tasks have retries and not being able to match the retries due to N>0 would be crippling...

@retro486 - n=0 in this context means the "active tasks" that you always see in the Cylc 8 GUI, whereas n=1 shows past and future tasks one graph-edge out from those (the default view in the GUI); n=2 two graph edges out, and so on.

This matters for task matching because (e.g.) in a cycling workflow n>0 in the future direction is potentially infinite, and in the past direction (which we sometimes call n<0) includes all past tasks back to the start of the run.
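
To make the n-window idea concrete, here is a small sketch that computes graph distance from the active tasks with a breadth-first search over an undirected view of the graph. `n_window` is a hypothetical helper written for this comment, not how the scheduler computes its windows.

```python
from collections import deque


def n_window(edges, active, n_max):
    """Map each task within n_max graph edges of an active task to its n."""
    # Treat edges as undirected: the window extends into past and future.
    neighbours = {}
    for upstream, downstream in edges:
        neighbours.setdefault(upstream, set()).add(downstream)
        neighbours.setdefault(downstream, set()).add(upstream)

    distance = {task: 0 for task in active}  # active tasks are n=0
    queue = deque(active)
    while queue:
        task = queue.popleft()
        if distance[task] == n_max:
            continue  # do not walk beyond the requested window
        for other in neighbours.get(task, ()):
            if other not in distance:
                distance[other] = distance[task] + 1
                queue.append(other)
    return distance
```

For the graph a => b => c with only 1/a active, `n_window([("1/a", "1/b"), ("1/b", "1/c")], ["1/a"], 1)` contains 1/a at n=0 and 1/b at n=1, but not 1/c. In an infinite cycling workflow there is no upper bound on how far the future direction can extend, which is the matching problem described above.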

Cylc 7 had the same problem in principle, but in practice its "task pool" was sufficiently bloated (compared to n=0 in the more efficient Cylc 8) that we mostly got away with only allowing viewing and matching within the task pool.

I think you must be assuming we mean something else (submit numbers?). Retries don't change any of this. A task that fails and retries several times will stay in n=0 throughout.

hjoliver avatar Jun 24 '25 05:06 hjoliver

@wxtim -

I've had a riffle through the commands and possible options: ... Set: Feels like it might be sane to restrict globs to N=0 - future tasks can be skipped and broadcasts contain globs. Trigger: glob triggering N>0 would be madness because of the infinite cycling. However, most of the globs for cycle point I can think of amount to

We should not support * as a future cycle glob for obvious reasons. (Or at least, not without some serious safety fencing - probably not worth the effort).

We might decide not to support any future cycle glob (e.g. [2000,2001]) because the benefit-to-danger ratio seems marginal at best.

But we DO need to support future and past name globs (by wildcard and family) - e.g. I want to trigger a future family or group without having to list every task ID. And the same goes for set as trigger.

(Otherwise agreed with your statements).

hjoliver avatar Jun 24 '25 05:06 hjoliver

@oliver-sanders - generally agreed on all points I think; just a couple of extras to note:

Desired Use Cases (Configuration Matching)

In 1 and 2 you only mention family name globbing, and the trigger command...

3 other?

  • I presume you also mean to include name pattern globbing, and the set command (prereqs, outputs)
  • https://github.com/cylc/cylc-flow/issues/5677 - to show what the prerequisites and outputs are of a future task (we have a good number of user requests for this - see the issue)

NOTE my table of commands per "target task population" in the Issue description above still covers everything needed (although it is good to have specific examples and syntax).

Limitations Of Configuration Matching

  1. Glob by cycle: The workflow might be infinite, we cannot list every possible cycle point the task could exist in.

For completeness, I'm sure we could dream up use cases that would not be dangerous (a small workflow with a final cycle point; or globbing a small range of points). But happy to simply ban cycle globbing beyond n=0.

  1. Optional branching

Future globbing can match tasks that would not end up running if the workflow was left to its own devices. E.g. it could pick up the "wrong" side, or even both sides, of an optional branch.

(I'm just mentioning this because it was raised as a problem somewhere back in the chain of issues leading to this one).

We will need to document this for users, but it's definitely not a reason to disallow globbing - it is (or should be) obvious that future task matching can only select tasks that can in principle run at that point in the graph. Plus the "problem" applies just as well without globbing - i.e., for targeting future tasks by explicit task IDs, which we can already do.

hjoliver avatar Jun 24 '25 05:06 hjoliver

An idea, to mitigate any potential for users to accidentally match too many tasks:

Both pool (n=0) and configuration matching can be done synchronously in the scheduler so a list of matched tasks could be fed back to the user, to check before going ahead with the command.
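
A minimal sketch of that check-before-running flow, assuming the matcher, the confirmation prompt and the command itself are passed in as plain callables. `run_with_preview` and its arguments are hypothetical names, not an existing interface.

```python
def run_with_preview(pattern, match, confirm, action):
    """Match first, show the result to the user, then run if confirmed.

    match:   callable returning the list of task IDs a pattern selects
    confirm: callable shown the matched IDs, returning True to go ahead
    action:  callable applied to each confirmed task ID
    """
    matched = match(pattern)
    if not matched or not confirm(matched):
        return []  # nothing matched, or the user backed out
    for task_id in matched:
        action(task_id)
    return matched
```

On the CLI, `confirm` could simply print the list and ask for a yes/no answer; in the GUI it could be a dialog.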

hjoliver avatar Jun 24 '25 05:06 hjoliver

[meeting 2025-06-14]:

  • A CLI/GUI switch is undesirable.
  • Task selectors should only be searched for in the n=0 window, this is fairly intuitive.
  • Cycle globs could be restricted to "active" cycles only (removes the infinite glob problem), also reasonably intuitive.
  • This would allow us to support namespace globs at the config level which is intuitive.

So that's three possible approaches to matching, depending on the ID 🤦, but the result should be fairly intuitive and answer the expected use cases. More thought is required before making this a solid proposal.

To clarify what tasks a glob will match, we could implement an interface, a command or argument for the CLI and potentially a live preview for the GUI - https://github.com/cylc/cylc-flow/issues/4357

oliver-sanders avatar Jun 24 '25 12:06 oliver-sanders

@hjoliver

I think you must be assuming we mean something else (submit numbers?). Retries don't change any of this. A task that fails and retries several times will stay in n=0 throughout.

Yes, I misunderstood the n=0 context here. Thanks for clarifying!

retro486 avatar Jun 24 '25 15:06 retro486

Worth being aware of: https://github.com/cylc/cylc-flow/issues/6816

oliver-sanders avatar Jun 25 '25 08:06 oliver-sanders

In the meeting we discussed a possible solution that is outlined above.

We could do with some examples for discussion.

Here's an arbitrary workflow:
[scheduling]
  initial cycle point = 1
  cycling mode = integer
  runahead limit = P1
  [[graph]]
    P1 = """
      a1 => b => a2
      b[-P1] => b
    """
    P2 = """
      x1 => b => x2
    """


[runtime]
  [[A]]
  [[a1, a2]]
    inherit = A

  [[X]]
  [[x1, x2]]
    inherit = X

  [[a2]]  # 1/a2:failed
    script = if [[ $CYLC_TASK_CYCLE_POINT -eq 1 ]]; then exit 1; fi
  [[b]]  # 2/b:failed
    script = if [[ $CYLC_TASK_CYCLE_POINT -eq 2 ]]; then exit 1; fi

That stalls with this n=0 window:

  • 1
    • A
      • a2:failed
  • 2
    • b:failed
  • 3
    • A
      • a1:waiting (runahead)
    • X
      • x1:waiting (runahead)
(For completeness, the full n=3 window out to cycle 3)
  • 1
    • n=1 b:succeeded
    • A
      • n=2 a1:succeeded
      • n=0 a2:failed
    • X
      • n=2 x1:succeeded
      • n=2 x2:succeeded
  • 2
    • n=0 b:failed
    • A
      • n=1 a1:succeeded
      • n=1 a2:waiting
  • 3
    • n=1 b:waiting
    • A
      • n=0 a1:waiting (runahead)
      • n=2 a2:waiting
    • X
      • n=0 x1:waiting (runahead)
      • n=2 x2:waiting

And here are task matching examples following this solution:

| Pattern (as relative IDs) | Current behaviour | Proposed behaviour | Notes |
| --- | --- | --- | --- |
| *, */*, [123], [123]/* | 1/a2, 2/b, 3/a1, 3/x1 | 1/a1, 1/a2, 1/b, 1/x1, 1/x2, 2/a1, 2/a2, 2/b, 3/a1, 3/a2, 3/b, 3/x1, 3/x2 | cycle is shorthand for cycle/root. //[123] is shorthand for //1 //2 //3. Active cycles are 1, 2 and 3 so * is equiv. to [123]. |
| */A, */A*, */a* | 1/a2, 3/a1 | 1/a1, 1/a2, 2/a1, 2/a2, 3/a1, 3/a2 | Active cycles are 1, 2 and 3 so * is equiv. to [123]. |
| */a1 | 3/a1 | 1/a1, 2/a1, 3/a1 | Active cycles are 1, 2 and 3 so * is equiv. to [123]. |
| */X, */x*, [123]/X | 3/x1 | 1/x1, 1/x2, 3/x1, 3/x2 | x1 and x2 are not on sequence for cycle 2. Active cycles are 1, 2 and 3 so * is equiv. to [123]. |
| 1/a1 | 1/a1 | 1/a1 | |
| 1/A, 1/a* | 1/a2 | 1/a1, 1/a2 | |
| *:failed, */root:failed, [123]:failed, [123]/root:failed | 1/a2, 2/b | 1/a2, 2/b | When task selectors are used, the pattern applies to n=0 only. * is still equiv. to [123]. But only n=0 tasks are matched. |
| 1/*:failed, 1/A:failed, 1/a*:failed | 1/a2 | 1/a2 | When task selectors are used, the pattern applies to n=0 only. |
| *:succeeded, */*:succeeded | (no matches) | (no matches) | When task selectors are used, the pattern applies to n=0 only. :succeeded tasks can only be n=0 if incomplete. |
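
For anyone who wants to play with the proposed rules, here is a toy model of them applied to the example workflow. Everything in it is a hand-written assumption for illustration: `FAMILIES`, `on_sequence` and `POOL` encode the example config and the stalled n=0 window by hand, and `match` is a hypothetical function, not the cylc-flow implementation.

```python
from fnmatch import fnmatchcase

# Hand-encoded from the example workflow (illustrative only).
FAMILIES = {
    "root": ["a1", "a2", "b", "x1", "x2"],
    "A": ["a1", "a2"],
    "X": ["x1", "x2"],
}

# The stalled n=0 window: (cycle, task) -> state.
POOL = {
    (1, "a2"): "failed",
    (2, "b"): "failed",
    (3, "a1"): "waiting",
    (3, "x1"): "waiting",
}


def on_sequence(name: str, cycle: int) -> bool:
    """x1/x2 are on the P2 sequence (1, 3, 5, ...), the rest run on P1."""
    return cycle >= 1 and (name not in FAMILIES["X"] or cycle % 2 == 1)


def expand_names(pattern: str) -> set:
    """Expand a namespace glob against the config (tasks and families)."""
    names = {n for n in FAMILIES["root"] if fnmatchcase(n, pattern)}
    for family, members in FAMILIES.items():
        if fnmatchcase(family, pattern):
            names.update(members)
    return names


def match(id_: str) -> list:
    cycle_part, _, ns_part = id_.partition("/")
    cycle_pat, _, cycle_sel = cycle_part.partition(":")
    # A bare cycle is shorthand for cycle/root.
    ns_pat, _, ns_sel = (ns_part or "root").partition(":")
    names = expand_names(ns_pat)
    selector = cycle_sel or ns_sel
    if selector:
        # Rule 1: task selectors are only searched for in the n=0 window.
        return sorted(
            f"{cycle}/{name}"
            for (cycle, name), state in POOL.items()
            if fnmatchcase(str(cycle), cycle_pat)
            and name in names
            and state == selector
        )
    if any(char in cycle_pat for char in "*?["):
        # Rule 2: cycle globs expand to active cycles only.
        cycles = sorted({c for c, _ in POOL if fnmatchcase(str(c), cycle_pat)})
    else:
        cycles = [int(cycle_pat)]
    # Rule 3: namespace globs expand as per the config.
    return sorted(
        f"{cycle}/{name}"
        for cycle in cycles
        for name in names
        if on_sequence(name, cycle)
    )
```

With this model, `match("*/X")` gives 1/x1, 1/x2, 3/x1 and 3/x2 (nothing in cycle 2), while `match("*:failed")` gives only the two n=0 failed tasks.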

oliver-sanders avatar Jun 25 '25 13:06 oliver-sanders

Have been looking through the current task matching code in relation to this issue & group-trigger. With the changes under discussion:

  • The whole cycle matching code (#6816) should be removed. If an ID is specified without a namespace, just fill in root and apply normal family matching rules.
  • The active / inactive task matching code should be combined (just do tokens in => tokens out). This allows for a single interface to fulfil all possible requirements. Downstream code can load itask objects, taskdefs, etc, as needed. This will make the ID matching code much simpler.
  • We should convert IDs to tokens early (we already do this in command validation). This will allow validation to be stripped from the rest of the code (currently validation is performed in multiple places).
  • Subclass Tokens to TaskTokens so we can change the cycle/task components from Optional[str] to str because we are already validating all this up-front.
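
As a rough picture of the "validate once, then rely on the types" idea in the last two points, here is a standalone sketch. `LooseTokens`, `TaskTokens` as written here and `to_task_tokens` are hypothetical dataclasses for illustration only; they do not subclass or reproduce the real Tokens class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LooseTokens:
    """Stand-in for un-validated tokens: any component may be missing."""
    cycle: Optional[str] = None
    task: Optional[str] = None


@dataclass(frozen=True)
class TaskTokens:
    """Validated tokens: cycle and task are always plain strings."""
    cycle: str
    task: str


def to_task_tokens(tokens: LooseTokens) -> TaskTokens:
    """Validate up-front so downstream code need not re-check for None."""
    if not tokens.cycle:
        raise ValueError("a task-like ID needs a cycle")
    # An ID without a namespace gets root filled in.
    return TaskTokens(cycle=tokens.cycle, task=tokens.task or "root")
```

Downstream matching code can then take `TaskTokens` and drop its own `None` checks.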

Assigned myself to do this pending agreement on the approach.

oliver-sanders avatar Jul 22 '25 11:07 oliver-sanders

Sounds sensible, generally.

The active / inactive task matching code should be combined (just do tokens in => tokens out), ...

Do you mean just match against taskdefs initially, and later on either grab a proxy from the pool or spawn it from the taskdef, at the last minute?

hjoliver avatar Jul 23 '25 05:07 hjoliver

I've started to knock together a POC implementation of the approach we outlined in the meeting. It's simple to do and I think it might be easier to review the approach when we have something tactile we can play with. Will dump the branch here for evaluation when it's ready.

oliver-sanders avatar Jul 25 '25 15:07 oliver-sanders

I've got something ready for evaluation:

https://github.com/oliver-sanders/cylc-flow/pull/new/5827

This should implement the proposed solution:

  • Task selectors (e.g. :failed) should only be searched for in the n=0 window.
  • Cycle globs could be restricted to "active" cycles only (e.g. * will expand to all cycles which contain one or more active tasks).
  • Namespace globs should expand as per the config (e.g. root should return all tasks).

The solution is hacked in, but should work fine for all task matching commands.

(note I have turned on inactive matching for all commands so matching will work the same for all commands on this branch)

oliver-sanders avatar Jul 29 '25 11:07 oliver-sanders

I have now expanded the POC into a PR: https://github.com/cylc/cylc-flow/pull/6920/

Please use this to review the task matching approach (not the earlier POC posted above).

Initially, I was a bit sceptical about the "* should expand to all active cycles" bit, but after testing, I have come around to it. I think this should satisfy all use cases and feels nicely intuitive. I also can't think of any better solution! (well done whoever it was who came up with this idea in the meeting!)

The idea of adding a command / interface that shows you what tasks a pattern would match (https://github.com/cylc/cylc-flow/issues/4357) is a good one (especially as a learning tool). E.g., we could display the list of tasks a command would match live whilst the user types in the GUI! I have designed the new interface to be compatible with this goal. However, I haven't added the functionality in the PR as it requires work on the Cylc UI Server (https://github.com/cylc/cylc-uiserver/issues/720).

oliver-sanders avatar Aug 14 '25 10:08 oliver-sanders