Isolate ICP and FCP graphs using the runahead limit

Open hjoliver opened this issue 2 years ago • 7 comments

A bit of late night fun (sad!).

Implement "start-up and shutdown graphs" (effectively isolated from the main graph so that perpetual dependence on R1 & $ tasks can be avoided) ... by manipulating the runahead limit.

These changes close #4912

Imagine a task prep is supposed to prepare the run directory for all tasks; and clean is supposed to tidy up at the end after all tasks have finished:

[scheduling]
    cycling mode = integer
    final cycle point = 5
    isolate initial cycle point = True
    isolate final cycle point = True
    [[graph]]
        R1 = "prep => foo"
        P1 = "foo"
        R1/$ = "foo => clean"
[runtime]
    [[prep, clean, foo]]
        script = sleep 10

With all tasks having the same run length, on master (without the new config items) 1/prep runs at the same time as 2/foo, 3/foo, 4/foo, 5/foo, then 1/foo runs at the same time as 5/clean. That is clearly not the intention, so we'd need additional ugly dependencies to make it work.

Or ... on this PR branch, 1/prep => 1/foo runs to completion, then 2/foo, 3/foo, 4/foo run to completion, then 5/foo => 5/clean runs. :boom:
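
To make the two orderings concrete, here is a toy model in Python. It is not the scheduler code from this branch: the `waves` function, its `isolate` flag, and the default limit of four cycles are all assumptions made for illustration, and every task is taken to run for exactly one time step.

```python
def waves(deps, icp, fcp, isolate=False, limit=4):
    """Return the batches of tasks that run together, one list per time step.

    deps maps (cycle, name) -> set of prerequisite (cycle, name) tasks.
    limit is an assumed runahead limit, counted in cycles beyond the
    oldest cycle that still has unfinished tasks.
    """
    done, result = set(), []
    while len(done) < len(deps):
        pending = [task for task in deps if task not in done]
        oldest = min(cycle for cycle, _ in pending)
        newest = min(oldest + limit, fcp)
        if isolate:
            if oldest == icp:
                newest = icp                    # hold everything after the ICP
            elif oldest < fcp:
                newest = min(newest, fcp - 1)   # hold the FCP until the rest is done
        batch = sorted(
            task for task in pending
            if task[0] <= newest and deps[task] <= done
        )
        if not batch:
            raise RuntimeError("workflow stalled")
        result.append([f"{cycle}/{name}" for cycle, name in batch])
        done.update(batch)
    return result


# The example graph: R1 = "prep => foo", P1 = "foo", R1/$ = "foo => clean"
ICP, FCP = 1, 5
deps = {(cycle, "foo"): set() for cycle in range(ICP, FCP + 1)}
deps[(ICP, "prep")] = set()
deps[(ICP, "foo")] = {(ICP, "prep")}
deps[(FCP, "clean")] = {(FCP, "foo")}

for flag in (False, True):
    print(f"isolate={flag}:", waves(deps, ICP, FCP, isolate=flag))
```

Under those assumptions the model reproduces both orderings described above: two mixed-up batches without isolation, and five clean batches with it.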

Requirements check-list

  • [x] I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • [x] Contains logically grouped changes (else tidy your branch by rebase).
  • [x] Does not contain off-topic changes (use other PRs for other changes).
  • [x] Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • [x] Appropriate tests are included (unit and/or functional).
  • [x] Appropriate change log entry included.
  • [x] (master branch) I have opened a documentation Issue at https://github.com/cylc/cylc-doc/issues/511

hjoliver avatar Aug 02 '22 11:08 hjoliver

Assigning you as first reviewer @oliver-sanders, since it was your idea to get "start-up graphs" cheaply and safely by manipulating the runahead limit (and a fine idea it was).

hjoliver avatar Aug 03 '22 06:08 hjoliver

    [[graph]]
        R1 = "prep => foo"
        P1 = "foo"
        R1/$ = "foo => clean"

Not sure I like this. One of the key features of Cylc is the support for parallel cycles. In this example, if foo is a very long task (or series of tasks) and you want to run lots of cycles in parallel, it's going to be very frustrating if you have to spend a long time waiting for the first cycle to complete before the workflow can start running cycles in parallel as intended.

One way around this is to make the start and end tasks run on separate cycles:

    [[graph]]
        R1 = "prep"
        P1!(^,$) = "foo" # or R/+P1/P1!$
        R1/$ = "clean"

(syntax provided by @oliver-sanders !) However, the syntax is horrible and you also have to adjust the start and stop points.
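
A rough illustration of why the start and stop points need adjusting, written in Python rather than via Cylc's own recurrence parser. The `section_points` helper is hypothetical and encodes my reading of the three sections above for integer cycling: one point at the start, one at the end, and everything in between for foo.

```python
def section_points(icp, fcp):
    """Integer cycle points covered by each graph section in the example above."""
    every = list(range(icp, fcp + 1))
    return {
        "R1": [icp],                                              # once, at the initial point
        "P1!(^,$)": [p for p in every if p not in (icp, fcp)],   # every point except both ends
        "R1/$": [fcp],                                            # once, at the final point
    }


print(section_points(1, 5))  # foo only gets cycles 2, 3 and 4
print(section_points(0, 6))  # widen the range by one at each end to get foo at 1..5
```

So with the original points of 1 and 5, foo loses two cycles to prep and clean, which is the adjustment mentioned above.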

It would be much nicer if we could do something like:

    [[graph]]
        start = "prep"
        P1 = "foo"
        finish = "clean"

I guess that's a lot more work?

dpmatthews avatar Aug 04 '22 18:08 dpmatthews

@dpmatthews -

It would be much nicer if we could do something like:

Haha, that was my original suggestion, albeit explicitly restricted to start and finish graphs - you might have chimed in back on #4903!!

I guess that's a lot more work?

Well, I can think about it, if you and @oliver-sanders agree - he didn't like the separate graphs approach, which is how we ended up here. But maybe the restriction to "start" and "finish" graphs would make that acceptable - what do you think Oliver?

I think the runahead limit implementation is OK. It's opt-in, and we can explain the potential downside and that prep[^] => foo is an alternative. It's perhaps also worth noting that in a workflow with many cycles, less-than-optimal scheduling in the first cycle won't matter much.

However, the syntax is horrible and you also have to adjust the start and stop points.

Cool that our syntax supports that though. But inter-cycle triggers would make that even worse (the offset would break pre-initial ignore).

hjoliver avatar Aug 05 '22 03:08 hjoliver

Long story short, I also think this is better:

    [[graph]]
        start = "prep"
        P1 = "foo"
        finish = "clean"

If you look at the description of #4903 that's what I was aiming at, but maybe I shot myself in the foot there by trying to generalize it!

hjoliver avatar Aug 05 '22 03:08 hjoliver

you might have chimed in back on https://github.com/cylc/cylc-flow/issues/4903!!

Sorry!

But inter-cycle triggers would make that even worse (the offset would break pre-initial ignore).

Yuck - hadn't thought of that.

dpmatthews avatar Aug 05 '22 06:08 dpmatthews

@oliver-sanders agree - he didn't like the separate graphs approach

I'm ok with separate recurrences for the ICP/FCP. I think a fully abstracted multi-graph solution would require a lot more thought than we can reasonably provide at this point in time.

Long term it might also be the case that workflow modules / composition (e.g. the advanced handling of sub-workflows) may provide a simpler solution to this sort of problem.

Long story short, I also think this is better:

    [[graph]]
        start = "prep"
        P1 = "foo"
        finish = "clean"

It's definitely better, the issue is trying to work out how to implement it! The "start" and "finish" cycles need to be assigned a "cycle point" but what should that be? We could do something like subtract PT1S from the ICP at configuration time, but that's a bit icky (and won't necessarily work with cycle point format). Or we could implement start and finish as special cycle points, but that requires patching all cycling interfaces to handle these special keywords.
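
One possible reading of the cycle point format concern, sketched with Python's standard `datetime` as a stand-in for Cylc's own date-time handling. The example ICP, the two format strings, and the `start_point` and `round_trips` helpers are all assumptions made for illustration; none of them come from cylc-flow.

```python
from datetime import datetime, timedelta


def start_point(icp):
    """A pseudo 'start' cycle point placed one second (PT1S) before the ICP."""
    return icp - timedelta(seconds=1)


def round_trips(point, fmt):
    """True if formatting then re-parsing the point gives the same point back."""
    return datetime.strptime(point.strftime(fmt), fmt) == point


icp = datetime(2022, 8, 1)           # hypothetical initial cycle point
start = start_point(icp)

full = "%Y%m%dT%H%M%SZ"              # second resolution
hourly = "%Y%m%dT%HZ"                # stand-in for a coarser cycle point format

print(start.strftime(full))          # 20220731T235959Z
print(start.strftime(hourly))        # 20220731T23Z
print(round_trips(start, full))      # True
print(round_trips(start, hourly))    # False
```

If the configured format drops the seconds, the label for the "start" point no longer identifies a point one second before the ICP, which may be the sort of breakage meant here.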

oliver-sanders avatar Aug 05 '22 08:08 oliver-sanders

Or we could implement start and finish as special cycle points, but that requires patching all cycling interfaces to handle these special keywords.

I was thinking along those lines, but I haven't thought through it in depth yet.

hjoliver avatar Aug 05 '22 21:08 hjoliver

Superseded by #5090

hjoliver avatar Sep 14 '22 07:09 hjoliver