cylc-flow

example: extending workflow

Open oliver-sanders opened this issue 1 year ago • 7 comments

Document the preferred approach to "extending" a workflow.

Closes: https://github.com/cylc/cylc-doc/issues/525
Partially addresses: https://github.com/cylc/cylc-flow/issues/5875

Context:

A common pattern in some areas is to run a workflow until it stops, then modify the final cycle point to extend the workflow and restart it.

This isn't an easy working pattern to port to Cylc 8 because of the difficulty of re-populating the start tasks. We have a plan to make this easier (see https://github.com/cylc/cylc-flow/issues/5416); however, this working pattern is needlessly complex to begin with. It would be much simpler if the workflow didn't "finish" in the first place.

To do this, swap the final cycle point for a stop after cycle point. The workflow will then not "finish", so the task pool will not need re-populating for the workflow to continue. This is much nicer for Cylc (no discontinuity), for the user (no extra commands) and for provenance (a direct continuation of the previous run).
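For illustration, the swap described above might look like this in a workflow's flow.cylc. This is a minimal sketch, assuming a Cylc 8 version that supports [scheduling]stop after cycle point; the integer cycling, the cycle values and the task names a and b are hypothetical and are not taken from this PR's example.

```ini
[scheduling]
    # Hypothetical integer-cycling workflow; tasks "a" and "b" are illustrative.
    cycling mode = integer
    initial cycle point = 1
    # Old approach: the workflow "finishes" here, so continuing it later
    # requires moving this point and re-populating the task pool.
    # final cycle point = 3
    # New approach: the scheduler shuts down once all tasks have passed
    # cycle 3, but the graph continues beyond it, so the run can be extended.
    stop after cycle point = 3
    [[graph]]
        P1 = "a[-P1] => a => b"
```

Because there is no final cycle point, the graph beyond cycle 3 still belongs to the workflow; extending the run means moving the stop point later rather than re-populating the pool. To our understanding, the stop point can also be overridden at start-up with `cylc play --stop-cycle-point=POINT`; check the Cylc documentation for the version in use.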

Blockers:

  • [x] https://github.com/cylc/cylc-flow/issues/5939 This is a blocker to real-world usage (but not to this example), where the final cycle point is typically specified as an offset from the initial cycle point.
  • [x] https://github.com/cylc/cylc-flow/issues/5946 ~This causes the "complex" example to stall.~ This was my misinterpretation of the pre-initial condition; it can be worked around.
  • [ ] https://github.com/cylc/cylc-flow/issues/5945 ~This causes the final cycle point sequences to be silently skipped in the "complex" example.~ This is semi-intended and can be worked around but will be an inevitable bugbear for anyone trying to do this sort of thing.
  • [ ] https://github.com/cylc/cylc-flow/issues/5952 This causes an invalid stall on restart.

Check List

  • [x] I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • [x] Contains logically grouped changes (else tidy your branch by rebase).
  • [x] Does not contain off-topic changes (use other PRs for other changes).
  • [x] Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • [x] Tests are included (or explain why tests are not needed).
  • [x] CHANGES.md - docs only
  • [x] Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • [x] If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

oliver-sanders, Jan 30 '24 12:01

Agreed on documenting use of a stop point rather than final cycle point, to allow extending a workflow run more easily.

But a stop point requires trickier graph configuration if any graph structure is tied to the stop point, and users may not know in advance that they will want to extend their run. So I guess we'll still have to document how to do it the hard way as well, i.e., by extending the final cycle point.

As revealed by the discussion on #5952, it will be crucial to document that changing a final cycle point, or a stop point if the graph is tied to it, can implicitly change the structure of the graph, with potential consequences for the restart.

The simple example in that issue can be handled automatically I think, but I'm not sure that's universal.

hjoliver, Feb 15 '24 06:02

> So I guess we'll still have to document how to do it the hard way as well

We'd rather not; the FCP approach is conceptually awkward and we are keen to move users away from it.

We've got a lot of users who rely on this pattern, which was very simple at Cylc 7 but is very difficult at Cylc 8; we need to provide a simple solution for this use case.

oliver-sanders, Feb 15 '24 12:02

> We'd rather not; the FCP approach is conceptually awkward and we are keen to move users away from it.

That's all very well, but it's going to happen anyway, as in "Help, my workflow finished, how do I move the FCP and continue it?"

> this pattern which was very simple at Cylc 7 but is very difficult at Cylc 8,

Hmm, moving the FCP was always a bit ill-defined and dangerous, even if it did work most of the time.

It's not so difficult now: restart with --pause and trigger the first cycle after the original FCP.

hjoliver, Feb 16 '24 03:02

> It's not so difficult now: restart with --pause and trigger the first cycle after the original FCP.

It really is; I find it hard! The triggering is often non-trivial and requires inspection of the workflow configuration. We cannot presently provide generic advice for doing this, or a single command like the one we used to have.

oliver-sanders, Feb 16 '24 10:02

> > We'd rather not; the FCP approach is conceptually awkward and we are keen to move users away from it.

> That's all very well, but it's going to happen anyway, as in "Help, my workflow finished, how do I move the FCP and continue it?"

Case in point: https://cylc.discourse.group/t/extending-and-restarting-a-a-workflow/911

hjoliver, Mar 03 '24 23:03

True, but unhelpful: we would like to avoid using this pattern for workflows that are intended to be extended. Stop after cycle point is a very clean solution, which cuts out the discontinuity between runs completely.

Will come back to this one and the associated issues when I get the chance.

oliver-sanders, Mar 04 '24 16:03

I don't think we're disagreeing on the fundamental point here.

I'm just saying that, in addition, we still need to document how to restart after extending the FCP if you do get yourself into that particular fix.

Recommending a better way isn't much help if users don't anticipate the need for extending a workflow, or they don't see the advice, before starting a run.

hjoliver, Mar 11 '24 03:03