cylc-flow
Mixed parentless/non-parentless tasks cause premature shutdown
Description
I ran into this problem while thinking of ways to break my inbound parentless sequential wall clock task spawning (I may work the solution into that). A workflow in which a task is parentless on some cycles and non-parentless on others can shut down prematurely.
Reproducible Example
For example, this workflow shuts down after 2015 has run:
[scheduler]
    cycle point format = CCYY
[scheduling]
    initial cycle point = 2010
    [[xtriggers]]
        clock_1 = wall_clock()
    [[graph]]
        P2Y = """
            @clock_1 => a
            a => b
        """
        +P1Y/P2Y = """
            a => b
            b[-P1Y] => a
        """
[runtime]
    [[root]]
        script = sleep 5
    [[a,b]]
(clock trigger not really needed, stalls without)
Expected Behaviour
Workflow should never stop.
Discussion
I think the trouble occurs when a non-parentless task ends up as the final task, or the next one, at the runahead limit.
A possible solution is to check for a non-spawned parentless successor (the next occurrence) of a task when it enters the active pool (from the initial spawn or the runahead pool). Perhaps we can narrow down the checking somehow (i.e. to tasks that could possibly be parentless, determined via config or similar).
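The check proposed above could look roughly like the following toy sketch. It is not Cylc code: `TaskID`, `a_has_parents` and `spawn_parentless_successor` are hypothetical names, the pool is a plain set, and cycle points are treated as plain integers one year apart. It only models task `a` from the example, which has a parent (`b[-P1Y]`) on odd years and none on even years.

```python
from typing import Callable, Optional, Set, Tuple

# (cycle point, task name); a toy stand-in, not a Cylc type
TaskID = Tuple[int, str]


def a_has_parents(point: int) -> bool:
    """In the example graph, a depends on b[-P1Y] only on odd years."""
    return point % 2 == 1


def spawn_parentless_successor(
    entered: TaskID,
    pool: Set[TaskID],
    has_parents: Callable[[int], bool],
    step: int = 1,
) -> Optional[TaskID]:
    """Spawn the next occurrence of a task that just entered the active pool,
    but only if that occurrence is parentless and not already spawned."""
    point, name = entered
    successor = (point + step, name)
    if has_parents(successor[0]) or successor in pool:
        # Either its parents will spawn it, or it already exists.
        return None
    pool.add(successor)
    return successor


pool = {(2015, "a")}
spawned = spawn_parentless_successor((2015, "a"), pool, a_has_parents)
print(spawned)
print(sorted(pool))
```

In this model, `2015/a` entering the pool causes the parentless `2016/a` to be spawned, whereas `2016/a` entering the pool spawns nothing because `2017/a` has a parent. How such a check would interact with the real runahead logic is left open here.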
I think the clock trigger is unnecessary to reproduce this, @dwsutherland?
And for me, it shuts down prematurely after the 2015 point, but I haven't seen it stall. (Mind you, premature shut down is even worse!)
For reference. Fortunately this sort of alternating parented/parentless structure is probably unlikely in real workflows.
> I think the clock trigger is unnecessary to reproduce this, @dwsutherland?
Yes, mentioned that
Sorry, so you did! 🤦
> And for me, it shuts down prematurely after the 2015 point, but I haven't seen it stall. (Mind you, premature shut down is even worse!)
Yes, sorry, fixed description
My minimal example:
[scheduling]
    cycling mode = integer
    [[graph]]
        P2 = "a => b"
        2/P2 = """
            a => b
            b[-P1] => a
        """
[runtime]
    [[a,b]]
Actually, even this:
[scheduling]
    cycling mode = integer
    [[graph]]
        P2 = "a"
        2/P2 = "a[-P1] => a"
[runtime]
    [[a]]
These both shut down after point 6.
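To make the alternating structure of the last example explicit, here is a small sketch that lists which instances of `a` are parentless. It assumes the integer cycling sequence starts at point 1 (the example sets no initial cycle point), so `P2` covers the odd points and `2/P2` the even ones; `parentless_points` is a hypothetical helper, not part of Cylc.

```python
def parentless_points(first: int, last: int) -> list:
    """Points in [first, last] where a has no a[-P1] parent.

    Assumes the P2 recurrence starts at point 1 and 2/P2 starts at point 2.
    """
    p2 = set(range(1, last + 1, 2))        # P2: 1, 3, 5, ...
    parented = set(range(2, last + 1, 2))  # 2/P2: 2, 4, 6, ...
    return [p for p in range(first, last + 1) if p in p2 and p not in parented]


parentless = parentless_points(1, 8)
for point in range(1, 9):
    kind = "parentless" if point in parentless else "has parent a[-P1]"
    print(f"{point}/a: {kind}")
```

Under that assumption the first instance after point 6 is `7/a`, which is parentless, so nothing upstream would spawn it.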
> I think the trouble occurs when a non-parentless task ends up as the final task, or the next one, at the runahead limit.
I think you might be right there...
Integration test (no progress on a fix yet):
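async def test_foo(flow, scheduler, run, complete):
    """Check that the runahead limit does not prevent the spawning of tasks."""
    # "a" is parentless on odd points (P2) and depends on a[-P1] on even
    # points (2/P2).
    wid = flow({
        "scheduling": {
            "cycling mode": "integer",
            "graph": {
                "P2": "a",
                "2/P2": "a[-P1] => a",
            },
        },
    })
    schd = scheduler(wid, paused_start=False)
    async with run(schd):
        # The workflow currently shuts down after this point.
        await complete(schd, "6/a")
        # Checked while the scheduler is still running: the pool should
        # still hold tasks because later cycles remain to be run.
        assert schd.pool.get_tasks() != []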