cylc-flow
cylc-flow copied to clipboard
remove: extend beyond the task pool
Implement the cylc remove proposal:
- Extend
cylc remove
to cover historical (n<0
) tasks. - Extend
cylc remove
to cover activen=0
tasks. - Extend
cylc remove
to cover future (n>0
) tasks. - Update CLI and GraphQL schema to match any changes.
- Update this note in cylc-doc: https://github.com/cylc/cylc-doc/blob/3c2b067e27a4c8e5187fc21a117882466fc0d4f6/src/user-guide/writing-workflows/suicide-triggers.rst?plain=1#L53-L60
# task_states table
name cycle flow_nums time_created ... status
---------- ---------- ---------- --------------- ----------
a 1 [1] <10 mins ago> ... failed
a 1 [1, 2] <2 mins ago> ... succeeded <---
How should we handle the case where the user removes task 1/a
from flow 2 (a.k.a. remove flow 2 from task 1/a
)?
Bearing in mind PRIMARY KEY(name, cycle, flow_nums)
, we have 2 choices:
- Delete row 1 and set
flow_nums
to[1]
in row 2 - Delete row 2
If a task is removed from the pool as completed, then subsequently re-triggered by the user, then the most recent entry will override the older one.
This is the same case by a different mechanism, though somewhat awkward. The job state info will be preserved (tied to the submit number).
So option 1
The behaviour for re-triggering might be closer to merge than replace (can't remember what else is in the table), otherwise yes.
Note that this will probably happen automatically after the new task has been added to the pool?
The full schema for task_states
looks like
name cycle flow_nums time_created time_updated submit_num status flow_wait is_manual_submit
------ ------ ---------- -------------- -------------- ------------ -------- ----------- ----------------
TEXT TEXT TEXT TEXT TEXT INTEGER TEXT INTEGER INTEGER
Note that this will probably happen automatically after the new task has been added to the pool?
Not sure what you mean; I am talking about removing a flow from a historical task
(1) is good
# task_states table
name cycle flow_nums time_created ... status
---------- ---------- ---------- --------------- ----------
a 1 [1] <10 mins ago> ... failed
a 1 [1, 2] <2 mins ago> ... succeeded
How should we handle cylc remove 1/a
(which is the same as cylc remove 1/a --flow all
if I have read the proposal correctly)?
Is it: delete all rows WHERE name = "a" AND cycle = "1"
, except the last such row, which we update so that flow_nums = "[]"
?
Yes, option (1) is good.
When outputs are removed, any corresponding prerequisites on downstream tasks in the pool shall be unset providing they were "naturally satisfied".
This means prerequisites which have been satisfied by
cylc trigger
orcylc set
will not be unset as the result ofcylc remove
as these have been manually satisfied by separate interventions.
Actually thinking about this, are we sure this is what we want?
If you accidentally cylc set/trigger
the wrong task surely you would want cylc remove
to undo any prereqs it satisfied?
There is also the snag that whether the prereqs were satisfied by cylc set/trigger
is not recorded permanently, as the task_prerequisites
table in the DB only keeps track of what is in the task pool at any given moment.
Actually thinking about this, are we sure this is what we want?
Yes.
the task_prerequisites table in the DB only keeps track of what is in the task pool at any given moment.
Does this get stored on the task_outputs table?
Actually thinking about this, are we sure this is what we want?
Yes.
Why? Could you elaborate in response to my question? 👇
If you accidentally
cylc set/trigger
the wrong task surely you would wantcylc remove
to undo any prereqs it satisfied?
the task_prerequisites table in the DB only keeps track of what is in the task pool at any given moment.
Does this get stored on the task_outputs table?
I've tested this, cylc set --output
are recorded in task_ouputs
as e.g.
cycle name flow_nums outputs
---------- ---------- ---------- --------------------------------------------------------------------------
2 foo [1] {"submitted": "(manually completed)", "started": "(manually completed)"}
However, there is no record of cylc set --prerequisite
or cylc trigger
Why? Could you elaborate in response to my question?
We have demoted a task and all of the outputs it generated from their flow. As a result these outputs no longer logically exist in the flow they once belonged to (because they have been removed) so should not affect the evolution of that flow any more. Any downstream prerequisites that this task satisfied should be wiped as a result.
If downstream tasks have already run, then it is too late to clean up via this method and the user will need to remove these downstream tasks too.
If you're unsure, lets take this offline.
Summary from offline discussion:
... prerequisites which have been satisfied by
cylc trigger
...
This is not relevant; cylc trigger
doesn't set prereqs (at least not anymore)
...or
cylc set
...
This only applies to cylc set --prerequisite
, not cylc set --output
This is not relevant; cylc trigger doesn't set prereqs (at least not anymore)
Warning - it's going to work by setting prerequisites soon - see the group trigger proposal. (Although I'm not sure that's relevant - I need to get back up to speed on this issue).