Isolated graphs (startup, shutdown, ...)
A clean issue for a problem with a long and convoluted history (e.g. see #4903, #4912, #5036, and most "recently" #5090).
background
Cylc famously has no barrier between cycles. This is usually a very good thing, but there are certain situations where a hard barrier would be convenient because it stands in for ALL dependence on tasks prior to the barrier.
I'll illustrate for a simple workflow with:
- startup tasks that everything else depends on (e.g. to deploy stuff), and
- shutdown tasks that should not run until everything else has finished
how we currently achieve it (1/2)
⬆️ The blue bits represent perpetual dependence on the initial tasks, to ensure that nothing runs before they finish. This is reasonably intuitive and reflects real dependence - not a workaround - but it causes problems:
- it makes an unbelievable mess of graph visualizations for real workflows
- it makes retriggering the startup graph difficult or confusing (will it "flow on" again?)
- it can cause performance issues in the cycling computations, far from the ICP (is this still true?)
The red bits represent dependence on special dummy tasks that exist purely to ensure that final-cycle tasks wait on everything else - i.e. it's a workaround.
- it is not possible to do the blue thing for the final cycle - that would require `foo => bar[$]` (however, the shutdown task requirement is much less common than startup)
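To make the blue/red distinction concrete, here is a minimal sketch (task names are hypothetical):

[scheduling]
    [[graph]]
        # startup
        R1 = "deploy"
        # main cycling graph
        P1D = """
            deploy[^] => foo => bar         # blue: every cycle depends on the ICP task
            bar => cycle_done               # red: a dummy task collates each cycle...
            cycle_done[-P1D] => cycle_done  # ...and chains forward...
        """
        # shutdown
        R1/$ = "cycle_done => housekeep"    # ...so the final-cycle task waits on everything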
how we currently achieve it (2/2)
⬆️ Pragmatically, using the workaround (red) at both ends is probably better than the proper solution (blue), and the skip task run mode makes it more attractive than it used to be (see the sketch after this list), but:
- it can be unpleasant for some workflows (e.g. with lots of parentless tasks at the top of a cycle)
- evidence at ESNZ shows users don't naturally think of it (all of our workflows have the blue bits)
- (and it is still a workaround that should not be necessary)
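For reference, a minimal sketch of making the dummy barrier cheap with skip mode (assuming Cylc 8.4+, and continuing the hypothetical `cycle_done` task from the sketch above):

[runtime]
    [[cycle_done]]
        run mode = skip  # completes instantly, no job is submitted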
a better solution
⬆️ For our simple example, it would be better to have 3 separate graphs that each run to completion before the next one starts, thus absolving us of the need to handle nasty perpetual dependencies.
High-level considerations
The separate graphs should naturally run in sequence (the whole point of this is to put a barrier between bits of graph in certain situations where that is actually helpful) but we need to be able to re-trigger tasks from earlier graphs, e.g. to redeploy code or data mid-run.
Also, generally we should (see https://github.com/cylc/cylc-flow/pull/5090#issuecomment-2911493900):
- support more than 3 separate graphs, and
- support cycling within each graph (and probably different cycling types) - e.g. for model spinup
We need to consider future directions and be compatible with that vision, so far as is possible (see https://github.com/cylc/cylc-flow/pull/5090#issuecomment-3375917068 and https://github.com/cylc/cylc-flow/pull/5090#issuecomment-3376341262).
Do we need to support parallel running of the different graphs, after manual retriggering? - probably NOT:
- if I retrigger tasks from earlier graphs that probably requires pausing or suspending the current graph
- e.g. if I want to redeploy code that many later tasks depend on, I probably should not continue to run those later tasks during the redeployment process
- if not, then it seems a mixed task pool is not necessary
- this removes the gnarly problem of comparing different kinds of cycle point within the task pool
- instead, temporarily swap out the task pool and restore it after the retriggered other-graph tasks have completed?
- have to wait for live tasks to complete first though?
implementation ideas and considerations
Isolate the first and last cycles by dynamically manipulating the runahead limit?
- ❌ it works but it restricts multi-cycling early in the main graph, because the isolated cycle is also the first main cycle
- (REJECTED)
Special startup and shutdown cycle points? - #5090
- ✅ works with existing UI, e.g. `cylc trigger //startup/foo`
- ❌ does not support cycling in the startup and shutdown graphs
- 🥹 mixed task pool works fine but requires hacking cycle computations (runahead limit and more) to handle the special points
- (works but probably too restrictive)
UI - how to identify tasks from different graphs (given that special cycle point values are insufficient if we want cycling)?
- the main graph is special, others automatically get a unique task-name prefix?
  - e.g. `1/startup_foo`, `2/spinup_bar` - works with existing UIs
  - clearly visible in GUI
- anything else would require new special options like `--graph=spinup` and have a visibility problem (how do I know which graph this task in the GUI belongs to?)
Motivation for this feature
Comments on the points raised in the OP:
but it causes problems:
It makes an unbelievable mess of graph visualizations for real workflows
I'm not sure if this is causing issues at our end - haven't heard anything from users at least. Perhaps you have more ICP dependencies in your workflows than we typically see in ours? Or just more resistance to dummy tasks?
I wonder if there could be other solutions to this problem, say something like the old [visualisation] section for configuring the graph?
it makes retriggering the startup graph difficult or confusing (will it "flow on" again?)
I'm not sure about this one, if "flow on" behaviour is confusing for R1 tasks, then it is confusing for all tasks. Now that the --flow option isn't needed for re-running graphs, I'm not sure how relevant this is (users are just confused by --flow in general IMO).
It can cause performance issues in the cycling computations, far from the ICP (is this still true?)
I don't think this is an issue, but haven't tested.
Here are the possible motivations I can think of:
- Can make a mess of graphs where large numbers of ICP tasks are referenced in subsequent recurrences and dummy tasks are not used to collate these dependencies (accepting your point from above).
- Dummy tasks are an awkward workaround.
- Graph configuration might be another possible workaround.
- ICP dependence has to be added everywhere, in large workflows it might be possible to miss it in one recurrence section by accident.
- No workaround at present, just "get it right".
- Task/graph-level configuration might be an alternative workaround for this (i.e. some form of auto-dependency configuration along the lines of "sequential tasks").
- Dependencies on ICP tasks are easy (e.g. `build[^] => run`), however, dependencies for FCP tasks are tricky (e.g. `run => clean[$]` is not legal).
  - No real workaround at present, this just has to be brute-forced.
  - Supporting inter-cycle offsets on the RHS might be an alternative solution to this.
Here are a couple of use cases which are awkward at the moment.
- Spinup graphs.

  A pattern we see in a couple of workflows, something along the lines of this:

  [scheduling]
      [[graph]]
          R1 = """
              spinup<x, y> => spinup<x, y+1>
              spinup<x, y> => configure
          """
          P1Y = """
              configure[^] => model
              model[-P1Y] => model
          """

  The `R1` section (and this is a gross simplification) is more akin to integer cycling; the `x` parameter can get quite large. It used to cause problems, but since Cylc 8 (SoD specifically) this hasn't caused issues, as we're no longer loading the entire cycle's worth of tasks.

- Result collation
A more general example of FCP dependence issues. This is a common pattern in climate graphs where there is often no inter-cycle dependence. Cycles may run in a highly arbitrary order as each cycle starts with an archive retrieval task (like a random sleep). The runahead limit is typically cranked up high in order to get as many archive retrieval requests in as possible and make as much progress through the workflow as possible at the rate the retrieval tasks kick out data. The problem is collating the results.
[scheduling]
    cycling mode = 360  # solution only works with artificial calendars!
    [[graph]]
        P1D = """
            retrieve => model => process
        """
        P1M = """
            process & process[-P1D] & process[-P2D] ... & process[-P30D] => collate_month
        """
        P1Y = """
            collate_month & collate_month[-P1M] & collate_month[-P2M] ... & collate_month[-P11M] => collate_year
        """
        P10Y = """
            collate_month ... => collate_decade
        """
        R1/$ = """
            collate_decade ... => report
        """

May or may not be a case for this sort of feature?
Possibly solvable with relative cycle syntax and inter-cycle offsets on the RHS, e.g. `process => collate_month[+R1]`?
It makes an unbelievable mess of graph visualizations for real workflows
A real-life example, one of our many operational workflows that start with a bunch of "deploy" tasks in the first cycle (in fact, a relatively uncomplicated one!):
1. with initial dependencies removed - clear structure
2. with initial dependencies restored
3. zoomed in a bit - it IS an unbelievable mess
perhaps you have more ICP dependencies in your workflows than we typically see in our workflows? Or just more resistance to dummy tasks?
I think both might be true. I think our operational system is more diverse and modular, and all the workflows have initial deployment tasks that result in this problem, and all (or most) research workflows are similar (most are focused on evolving the operational ones).
I agree (and I more or less said so above) that the dummy task solution is preferable to this BUT it's not particularly intuitive - users don't think of it themselves - whereas perpetual dependence on initial tasks is reasonably obvious (IF all you have to play with is one main cycling graph).
NOTE some of those ICP dependencies are probably superfluous, due to triggering from all members of a family that actually has internal dependencies - but regardless, that's what users do - they typically don't think too hard about the graph structure so long as it works - and isolated graphs would relieve that cognitive burden...
it makes retriggering the startup graph difficult or confusing (will it "flow on" again?)
I'm not sure about this one, if "flow on" behaviour is confusing for R1 tasks, then it is confusing for all tasks. Now that the --flow option isn't needed for re-running graphs, I'm not sure how relevant this is (users are just confused by --flow in general IMO).
I'm not entirely sure about this one anymore. I have had a number of complaints (including from @dwsutherland - do you want to comment David?) that it is difficult, but I have not had a chance to try to replicate.
To the extent that flow-on is difficult to either understand or manage (if it still is) retriggering ICP "deployment" tasks is kind of a worst case scenario because:
- it has to be done reasonably often (whenever code needs to be redeployed into a running system)
- and by operational operators, rather than the workflow owners/experts
- it is many many cycles back in the past, which is a problem if unwanted flow-on does occur (if)
- all tasks, including otherwise parentless ones, depend on these
It can cause performance issues in the cycling computations, far from the ICP (is this still true?)
I don't think this is an issue, but haven't tested.
It certainly used to be an issue, but it was quite some time ago so maybe we fixed it somehow, not sure.
I wonder if there could be other solutions to this problem, say something like the old [visualisation] section for configuring the graph?
Maybe, but it strikes me that isolated graphs are the obvious solution and a nice fit for the typical mental model: I want to do a bunch of stuff THEN start the main cycling graph (and I'll likely need to retrigger the early stuff again mid-run).
Perpetual ICP dependence makes sense IF all you have to play with is the main cycling graph, but it would be easier (conceptually too) not have to deal with that.
Here are the possible motivations I can think of:
Yes I think I entirely agree with your motivations. (Which mostly overlap with or extend mine a bit).
Special `startup` and `shutdown` cycle points?
Another problem with this approach is that you cannot use the cycle point `startup` to know what date's data to source or deploy, as you can with the ICP (or some offset from the ICP).
Here are the possible motivations I can think of:
Yes these are all good, and another couple:
- You can lose your ICP prerequisite satisfaction if you happen to change your ICP
- If you start up your workflow with start tasks off the ICP (because you might only want to run selected startup tasks), you need to no-flow run the startup tasks so they don't spawn into the gap, i.e.
[scheduling]
    [[graph]]
        R1 = """
            deploy => configure
            spinup<x, y> => spinup<x, y+1>
            spinup<x, y> => configure
        """
        P1Y = """
            configure[^] => model
            model[-P1Y] => model
        """
If the ICP is 1973, but you do `cylc play --pause --start-task=1999/model ...` followed by `cylc trigger --flow=none wflow//1973/deploy` (because perhaps you want to avoid spinup, and don't want to flow into the gap), then dependencies are ignored prior to the start tasks..
I'm not sure about this one, if "flow on" behaviour is confusing
I'm not entirely sure about this one anymore. I have had a number of complaints
However if you restart the workflow the dependency configure[^] will become unsatisfied.. And then if you set/run the corresponding ICP task, model will spawn into the 1973-1998 gap..
In fact if you just forget to --flow=none then you'll have this problem when re-running startup tasks..
Even though this might be a bug in losing the ignore-dependencies-before-start-task feature on restart (and reload?), this whole scenario/complication can just be avoided by having a separate spinup/startup graph section..
dummy_task won't help with this one...
Possibly solvable with relative cycle syntax and inter-cycles on the RHS
Think we talked about extending the ^ syntax to < and > for previous and next instance respectively.
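Purely to illustrate the idea, a hypothetical sketch (this syntax does not exist):

    P1D = "process => collate_month[>]"  # hypothetical: ">" = the next collate_month instance
    P1M = "model[<] => collate_month"    # hypothetical: "<" = the previous model instance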
It can cause performance issues in the cycling computations, far from the ICP (is this still true?)
I think it is still an issue; reloads still take an age in some cases (if this is related)
Our modular operation means we have a lot of parentless tasks waiting on xtriggers, and it doesn't always make sense to tie them all to dummy tasks or have some sort of pseudo dependency between cycles just to avoid running before/while startup tasks..
And we haven't even talked about tying FCP/shutdown tasks to the completion of everything that went before..
Separate prioritised graph sections will introduce a massive and welcome simplification, and a feature that can't truly be replicated in any other way...
And on the implementation front, something like this: as mentioned in Element, validation enforcement making sure graph sections don't contain the same named task (to avoid inter-section deps and ensure sections aren't rerun at the same time) would be a good start..
You can lose your dependence on ICP tasks if you happen to change your ICP
I don't understand this one, do you have an example?
If ICP deps are behaving differently to regular inter-cycle deps, that's a bug.
If you startup your workflow with start tasks off the ICP (because you might only want to run select startup), you need to no-flow run startup tasks so it doesn't spawn the gap in, i.e.
I'm not sure about this one, is that a valid expectation of --flow=none?
I don't understand this one, do you have an example?
Sorry, fixed, ICP prerequisite satisfaction
Sorry, fixed, ICP prerequisite satisfaction
Hmm, still don't think it should matter whether an ICP dep is used here. If there's a difference, raise a bug report.
Point is, we wouldn't have to care about the ICP dep as much if we didn't use it in most cases, even if a bug fix is applied, although it might be argued that this is expected behavior
I don't understand how the presence of the ICP dep is hindering you here?
Note that isolated graphs implementations essentially replace dependency on a task at the ICP with a dependency on the ICP itself, so may or may not actually make a difference here (implementation depending).
I don't understand how the presence of the ICP dep is hindering you here?
Essentially it's a complication that doesn't need to be there, and any resulting behavior/quirks we wouldn't have to deal with.
Note that isolated graphs implementations essentially replace dependency on a task at the ICP with a dependency on the ICP itself
Well it's a dependency on the graph section not being active (which might be ICP or w/e).. It avoids task dependencies, which is the desirable part.
I would like to move forward with this, ASAP, as it would massively simplify workflows at ESNZ (and the DR sync startup)..
Justification
Are we in agreement that this isolated/bespoke graph idea/proposal solves:
- The need for ICP dependency workarounds of `installed[^]` and `installed_dummy`. Including:
  - their inclusion everywhere in a graph, especially graphs of many isolates (such as with the ESNZ modular workflows)
  - `installed_dummy` previous-cycle dependency complications (i.e. the mixing of `T01, T05, T11`, `T03, T07, T19`, and `P1M`)
  - `installed[^]` complications of spawning newly added tasks into the distant past on rerun completion
  - the FCP equivalent (and any other exotic parallel).
- The unnecessary clutter added to the graph (and other side effects, i.e. the graph window always expanding to ICP tasks for `installed[^]`).
- The need to manually hold downstream sections of graph while rerunning startup/upstream sections.
- Performance (?) and/or plain efficiency (from fewer tasks and edges at minimum).
- In general, replaces task dependency with graph dependency to more accurately represent the designer's intention (i.e. I want all of this to run before/after everything else).
And that we are justified in moving forward to implementation?
Requirements
The design needs to accommodate both minimum requirements now and a future expansion of this feature.
Now:
- Bespoke graph sections
- Linear inter-graph dependencies
- Run/Rerun priority based on the inter-graph dependency
- Cycling in each section, with ICP/FCP and mode inherited
Future:
- Non-linear inter-graph dependencies ("Parameterization and DAGs" in line with task dependency syntax as Oliver outlined)
- Bespoke ICP/FCP and cycling mode
- Graph section info/data for UI visualisation.
The reason for this split of requirements is that we can easily achieve the "Now" requirements in a way that delivers most of the benefits of the feature with the least impact on the backend. And it can be achieved with minimal/no breaking changes.
Implementation
Similar to this, we want something that essentially expands the scheduling section.
We have:
[scheduling]
    [[graph]]
        P1D = "foo => bar"
Now implementation would look like:
[scheduling]
    R1 = "deploy => graph => final"
    [[deploy]]
        R1 = "a => b"
    [[graph]]
        P1D = "foo => bar"
    [[final]]
        R1/$ = "x => y"
And this could facilitate the Future, i.e.:
[scheduling]
    R1 = "deploy => spinup => graph & graph2 => final"
    [[deploy]]
        R1 = "a => b"
    [[spinup]]
        cycling mode = integer
        initial cycle point = 1
        final cycle point = {{ SPINUP_LENGTH }}
        R1 = "spinup<model>_cold => spinup<model>"
        P1 = "spinup<model> => spinup<model + 1>"
    [[graph]]
        P1D = "foo => bar"
    [[graph2]]
        cycling mode = integer
        initial cycle point = {{ SPINUP_LENGTH }}
        final cycle point = {{ DONE_LENGTH }}
        P1 = "baz<model> => tov<model>"
    [[final]]
        R1/$ = "x => y"
I don't think we need to consider visualisation at present, however it would be easy to imagine the graph sections as families with states determined in the same way (which could also be extended to inter-graph triggering if we end up there)..
To start with we would need several rules enforced at validation, including:
- No same named task is allowed in multiple graph sections.
- No task dependencies across graph sections (sort of implied by the first rule).
- Only linear inter-graph dependencies
These are straightforward rules that simplify implementation, and whose violation is in opposition to the feature's definition and purpose.
(i.e. How can you have "deploy => graph => final" inter-graph dependency if the same task exists in both graph and final, or something in graph depends on something in final?)
The first also makes it easy to identify which graph section is active.
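For example, a configuration like this (using the proposed syntax from above) would fail validation:

[scheduling]
    R1 = "deploy => graph"
    [[deploy]]
        R1 = "foo => bar"
    [[graph]]
        P1D = "foo => baz"  # ❌ "foo" appears in two graph sections (rule 1)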
With inherited ICP/FCP and cycling mode (to start with), we can get away with using the same task pool. However, with cycling in all sections we will need a separate runahead for each.. And the crux of the work will be in prioritizing/determining what can and can't run: https://github.com/cylc/cylc-flow/blob/54f0d08dc5f47e68c5cfbbcce31675eaeec00127/cylc/flow/scheduler.py#L1622-L1646
In the config and pool you'd have some kind of priority structures (something like what's described here).. Tasks in the inactive downstream sections can just stay queued, perhaps.
The internal implementation is fairly simple for the Now requirements, but can obviously change/improve in the Future..
@oliver-sanders, @hjoliver - Thoughts? If we can agree on this basic implementation, then let's get the ball rolling
@oliver-sanders, @hjoliver - Thoughts? If we can agree on this basic implementation, then let's get the ball rolling
This is going to require more thought before work can begin.
The bad news is that we're really blocked out at our end and will be until ~ the end of January. Sorry about that.
I'll quickly try and issue some responses to help...
Are we in agreement [...]
And that we are justified in moving forward to implementation?
Cards on the table, ..., I'm not sure.
The need for ICP dependency workarounds of installed[^] and installed_dummy.
The unnecessary clutter added to the graph (and other side effects, i.e. expanding of the graph window always expanding to ICP tasks for installed[^]).
Most of the concerns I've been hearing are about the graphing which seems to be the main driver here.
Worth pointing out that we could change the graphing behaviour for absolute-ICP dependencies to achieve your desired behaviour without having to actually isolate these graph sections or make any other fundamental change to Cylc.
It should be pretty easy and a small change to replace inter-cycle edges with intra-cycle ones when we see a [^] dependency. This would be kinda similar to how we displayed xtriggers in the Cylc 7 graph view. This sort of approach would prevent the cycles from being linked together in the graph, whilst maintaining the visibility of the R1 tasks.
There are definitely simpler ways to achieve the graphing side of this if speed is the object.
The need to manually hold downstream sections of graph while rerunning startup/upstream sections.
As of Cylc 8.5 (8.6 for family triggers), you can just trigger those R1 tasks to re-run them, e.g. cylc trigger //^/INSTALL_COLD. Cylc will not "flow on" unless you specify --flow=new (it looks in the DB and finds that the task has already run).
Performance (?) and/or plain efficiency (from less tasks and edges at minimum).
I don't think R1 tasks are an appreciable efficiency problem unless there are issues I'm not aware of?
Cards on the table, ..., I'm not sure.
I'm actually very excited about this change, because it would make our life so much easier (especially given the modular and collaborative setup at ESNZ)... I haven't seen any reason why it's not desirable/justified, and have laid out a clear justification for it... Maybe because you think it's mostly a graphing concern?
Most of the concerns I've been hearing are about the graphing which seems to be the main driver here.
Graphing is more of a minor thing for me/ESNZ-Ops... The main points for me are:
- The use of `[^]`/dummy tasks as a workaround (all through the definitions); we shouldn't need workarounds to represent what is a commonly desired behavior: "In general, replaces task dependency with graph dependency to more accurately represent the designer's intention (i.e. I want all of this to run before/after everything else)."
- And, as a consequence of `[^]`/dummy, the behavior/risk of spawning tasks (newly added, or after a sync start with start tasks off the ICP) into the distant past while rerunning deployment..
As of Cylc 8.5 (8.6 for family triggers), you can just trigger those R1 tasks to re-run them...
No, it's more making sure all other tasks don't run while rerunning ICP/deployment tasks.. (which the proposed will solve)
I don't think R1 tasks are an appreciable efficiency problem unless there are issues I'm not aware of?
Yes, I don't expect it to be appreciable.. But the fewer tasks/edges the better
Obviously I agree with @dwsutherland that isolated graphs are desirable, and the basic justification for isolated startup graphs in particular is pretty clear.
If I have an R1 sub-graph that deploys files into the run directory, and I sometimes need to re-run it to update the live workflow, the requirement is, every task in every subsequent cycle must wait on the deployment graph (initially, and during subsequent re-runs).
That is a clear hard line between one part of the graph and another, with no need for cross-graph concurrency. Clearly, the easiest way to achieve that is an isolated graph. Run some stuff at startup, then run the main graph.
What we have to do now, purely to achieve this simple break between the startup and main graphs, is use dependencies that connect them to every single cycle in the main workflow graph (either via perpetual ICP dependence or the dummy-task-in-every-cycle workaround). That has several inherent problems:
- it is self-evidently much more complex and messy than the requirement suggests it needs to be, which puts a cognitive burden on users as well as having some unpleasant practical implications
- perpetual ICP dependence easily renders graph visualizations unreadable (real examples given in the past)
- maybe we could provide a workaround for the graphing, but (a) let's avoid workarounds if possible, they often have side effects; and (b) really the viz should actually show the real dependencies as defined
- it raises the spectre of unwanted flow-on after re-run of the startup graph. This is less of a problem since group-trigger BUT it is still a problem (a) if you trigger a new flow; or (b) after a warm-start; or (c) if new tasks got added to the graph. And even if we can somehow block all of that users will still have to understand and think about it (is it safe to do this without flow-on - it looks like the graph might do that!).
Why not just make the whole mess go away, with an isolated start-up graph?
(Note this would not prevent use of explicit R1 dependencies in the main graph, as well, for other use cases).
(You did actually raise some good motivations yourself @oliver-sanders - back up in the history of this issue).
Yes we need to consider future directions when making changes like this, but let's be careful not to do nothing for a real existing long-standing issue, on grounds that we might do something more comprehensive in the distant future.
Cards on the table, ..., I'm not sure.
(You did actually raise some good motivations yourself @oliver-sanders - back up in the history of this issue).
Dammit, I knew I shouldn't have mentioned this. Really don't have time for a debate ATM.
I have been trying to flush out the use cases for this feature, discussion having mostly revolved around graphing before. I have also suggested ideas for implementation.
Cards on the table, I'm not sure about the feature at this stage (no need to re-state your arguments, I have read them), however I'm still being supportive here and will continue to be.
As of Cylc 8.5 (8.6 for family triggers), you can just trigger those R1 tasks to re-run them...
No, it's more making sure all other tasks don't run while rerunning ICP/deployment tasks.. (which the proposed will solve)
it raises the spectre of unwanted flow-on after re-run of the startup graph.
I'm confused by these comments.
Other tasks do NOT run if you re-trigger ICP/deployment tasks (tested!).
Irrespective of ICP triggering, we rely on this behaviour for other reasons and need it to be solid. Even with isolated ICP graphs, we still need this, e.g. if triggering some, but not all, of the ICP tasks (e.g. rose-stem rebuild-type use cases).
The use of [^]/dummy tasks as a workaround (all through the definitions), we shouldn't need workarounds to represent what is a commonly desired behavior:
I see why you want this (and have read your explanations). However, I'm not quite convinced that this is the commonly desired behaviour.
Isolating the ICP forces later cycles to wait unnecessarily, making workflows take longer, which is highly undesirable in a lot of situations. It's the sort of thing where I could see researchers wanting the status quo, but ops wanting isolation.
Dummy tasks are only used for visualisation purposes. As I showed above, there is an alternative solution which would achieve your desired behaviour without the need for dummy tasks or graph modification of any kind, whilst avoiding the unnecessary barrier between cycles.
I'm confused by these comments. Other tasks do NOT run if you re-trigger ICP/deployment tasks (tested!).
That's why I carefully said "it raises the spectre of..." then (a) gave particular cases where it can still happen; and (b) noted that it is something that users will have to think about even if we mitigate ("will it flow on if I do this? if not, why not?") which isn't the case with an isolated startup graph.
Isolating the ICP, forces later cycles to wait unnecessarily, making workflows take longer which is highly undesirable in a lot of situations
Note an isolated startup graph gives us both - we'll still have the ICP in the main graph. I did note this above - not advocating literally removing the ICP from the main graph.
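i.e. hypothetically, something like this (task names invented), keeping explicit ICP dependence inside the main graph alongside an isolated startup graph:

[scheduling]
    R1 = "startup => main"  # proposed inter-graph dependency
    [[startup]]
        R1 = "deploy"
    [[main]]
        R1 = "prepare"
        P1D = "prepare[^] => foo => bar"  # explicit ICP dependence still available within the main graph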
Most of the concerns I've been hearing are about the graphing which seems to be the main driver here.
The graphing is just the easy-to-visualize aspect of this, which is important in its own right (our vis is supposed to make workflow structure clear) - but the fundamental problem is that, for a common type of startup task (e.g. deployment of workflow files, which is ubiquitous at ESNZ), a dependency between two graphs makes more sense, and would solve a bunch of problems, compared to achieving the same barrier entirely with inter-task dependence in the main graph.
I don't quite get the resistance to this. Even if we can provide enough workarounds and tweaks to 100% support this kind of use case via the main graph and individual task dependencies, it is self-evidently harder to understand and do it that way, for users, than simply having a separate start-up graph.
Cards on the table, I'm not sure about the feature at this stage (no need to re-state your arguments, I have read them), however I'm still being supportive here and will continue to be.
The problem is: I feel like if you see it from my/our perspective, it makes sense.. And your responses make it seem like we're talking past each other.
This is a graph1 => graph2 inter-graph dependency problem that we are using an ugly task dependency workaround to solve.. A workaround that has other undesirable consequences..
This workaround problem is common to every single workflow in the ESNZ operation.. They would all benefit from the proposed feature.
For example, take our ingestion workflow that polls all other workflows.. It is full of isolated graphs of many different cycling intervals (so can't restrict runahead, it is ingestion for many different upstream products), with roughly hundreds of these:
installed[^] => a
@xtrig_a => a => b => c
installed[^] => d
@xtrig_d => d => e => f
installed[^] => g
@xtrig_g => g => h => i
installed[^] => j
@xtrig_j => j => k => l
.
.
.
Every time we add, modify, rerun-installed, restart/reload (because what if the ICP was changed! or it's a sync start with start tasks that initially ignored the ICP but whose downstream tasks won't on reload/restart) a workflow, we are affected in one way or another..
We have to be continuously mindful of these workarounds.. Multiplied by 60 workflows...
Hopefully this gives you some insight on how even the simple Now implementation I proposed will free us from this hell, while having no breaking changes for existing workflows...
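(For illustration only: under the proposed Now syntax, the pattern above might reduce to something like the following - the recurrence values are invented:)

[scheduling]
    R1 = "install => main"  # the only startup barrier needed
    [[install]]
        R1 = "installed"
    [[main]]
        # no installed[^] prerequisites needed anywhere
        PT10M = "@xtrig_a => a => b => c"
        PT1H = "@xtrig_d => d => e => f"
        P1D = "@xtrig_g => g => h => i"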
If you have any actual solutions that aren't filthy workarounds, "I'm all ears"..
Irrespective of ICP triggering, we rely on this behaviour for other reasons and need it to be solid. Even with isolated ICP graphs, we still need this, e.g. if triggering some, but not all, of the ICP tasks (e.g. rose-stem rebuild-type use cases).
avoiding the unnecessary barrier between cycles.
I see why you want this (and have read your explanations). However, I'm not quite convinced that this is the commonly desired behaviour.
No one is proposing a barrier between cycles or isolating ICP (as Hilary mentioned), the solution/what-I-proposed-above will-have-to-have/has no restrictions in this regard. All graph sections can cycle as the designer intends, and only the inter-graph dependency will be the barrier WRT what graph can be-active/run-tasks at the time..
With the Now implementation we can use the same active pool, and perhaps different runahead pools..
I'm confused by these comments.
Other tasks do NOT run if you re-trigger ICP/deployment tasks (tested!).
If you have:
installed_dummy[-P1] => a => b => c
(or the [^] equivalent)
And you re-run installed at the ICP, what's stopping a, b, and c from kicking off while it's happening/running?
At present you need to pause the workflow or hold all tasks to do this..
The proposal solves this problem, by making sure only one graph section is active at once.
Anyway, I think I've probably exhausted pleading this case... It cannot stay as it is..
I'm sorry we don't have time to focus on this at the moment - getting 8.6 operational is taking all available time but hopefully not for much longer. Can we have a proposal please (covering both the long and short term aims)? That will make it much easier for me once we have time to return to this.