cylc-flow icon indicating copy to clipboard operation
cylc-flow copied to clipboard

remove: extend beyond the task pool

Open oliver-sanders opened this issue 1 year ago • 14 comments

Implement the cylc remove proposal:

  • Extend cylc remove to cover historical (n<0) tasks.
  • Extend cylc remove to cover active n=0 tasks.
  • Extend cylc remove to cover future (n>0) tasks.
  • Update CLI and GraphQL schema to match any changes.
  • Update this note in cylc-doc: https://github.com/cylc/cylc-doc/blob/3c2b067e27a4c8e5187fc21a117882466fc0d4f6/src/user-guide/writing-workflows/suicide-triggers.rst?plain=1#L53-L60

oliver-sanders avatar Jul 25 '23 10:07 oliver-sanders

# task_states table
name        cycle       flow_nums   time_created    ... status  
----------  ----------  ----------  ---------------     ----------
a           1           [1]         <10 mins ago>   ... failed
a           1           [1, 2]      <2 mins ago>    ... succeeded   <---

How should we handle the case where the user removes task 1/a from flow 2 (a.k.a. remove flow 2 from task 1/a)?

Bearing in mind PRIMARY KEY(name, cycle, flow_nums), we have 2 choices:

  1. Delete row 1 and set flow_nums to [1] in row 2
  2. Delete row 2

MetRonnie avatar Jun 06 '24 10:06 MetRonnie

If a task is removed from the pool as completed, then subsequently re-triggered by the user, then the most recent entry will override the older one.

This is the same case by a different mechanism, though somewhat awkward. The job state info will be preserved (tied to the submit number).

oliver-sanders avatar Jun 06 '24 11:06 oliver-sanders

So option 1

MetRonnie avatar Jun 06 '24 11:06 MetRonnie

The behaviour for re-triggering might be closer to merge than replace (can't remember what else is in the table), otherwise yes.

Note that this will probably happen automatically after the new task has been added to the pool?

oliver-sanders avatar Jun 06 '24 12:06 oliver-sanders

The full schema for task_states looks like

name   cycle  flow_nums  time_created   time_updated   submit_num   status   flow_wait   is_manual_submit
------ ------ ---------- -------------- -------------- ------------ -------- ----------- ----------------
TEXT   TEXT   TEXT       TEXT           TEXT           INTEGER      TEXT     INTEGER     INTEGER

Note that this will probably happen automatically after the new task has been added to the pool?

Not sure what you mean; I am talking about removing a flow from a historical task

MetRonnie avatar Jun 06 '24 13:06 MetRonnie

(1) is good

oliver-sanders avatar Jun 06 '24 14:06 oliver-sanders

# task_states table
name        cycle       flow_nums   time_created    ... status  
----------  ----------  ----------  ---------------     ----------
a           1           [1]         <10 mins ago>   ... failed
a           1           [1, 2]      <2 mins ago>    ... succeeded

How should we handle cylc remove 1/a (which is the same as cylc remove 1/a --flow all if I have read the proposal correctly)?

Is it: delete all rows WHERE name = "a" AND cycle = "1", except the last such row, which we update so that flow_nums = "[]"?

MetRonnie avatar Jun 28 '24 17:06 MetRonnie

Yes, option (1) is good.

oliver-sanders avatar Jul 01 '24 09:07 oliver-sanders

  1. When outputs are removed, any corresponding prerequisites on downstream tasks in the pool shall be unset providing they were "naturally satisfied".

    This means prerequisites which have been satisfied by cylc trigger or cylc set will not be unset as the result of cylc remove as these have been manually satisfied by separate interventions.

Actually thinking about this, are we sure this is what we want?

If you accidentally cylc set/trigger the wrong task surely you would want cylc remove to undo any prereqs it satisfied?

There is also the snag that whether the prereqs were satisfied by cylc set/trigger is not recorded permanently, as the task_prerequisites table in the DB only keeps track of what is in the task pool at any given moment.

MetRonnie avatar Aug 14 '24 15:08 MetRonnie

Actually thinking about this, are we sure this is what we want?

Yes.

the task_prerequisites table in the DB only keeps track of what is in the task pool at any given moment.

Does this get stored on the task_outputs table?

oliver-sanders avatar Aug 14 '24 15:08 oliver-sanders

Actually thinking about this, are we sure this is what we want?

Yes.

Why? Could you elaborate in response to my question? 👇

If you accidentally cylc set/trigger the wrong task surely you would want cylc remove to undo any prereqs it satisfied?


the task_prerequisites table in the DB only keeps track of what is in the task pool at any given moment.

Does this get stored on the task_outputs table?

I've tested this, cylc set --output are recorded in task_ouputs as e.g.

cycle       name        flow_nums   outputs                                                                   
----------  ----------  ----------  --------------------------------------------------------------------------
2           foo         [1]         {"submitted": "(manually completed)", "started": "(manually completed)"}

However, there is no record of cylc set --prerequisite or cylc trigger

MetRonnie avatar Aug 14 '24 16:08 MetRonnie

Why? Could you elaborate in response to my question?

We have demoted a task and all of the outputs it generated from their flow. As a result these outputs no longer logically exist in the flow they once belonged to (because they have been removed) so should not affect the evolution of that flow any more. Any downstream prerequisites that this task satisfied should be wiped as a result.

If downstream tasks have already run, then it is too late to clean up via this method and the user will need to remove these downstream tasks too.

If you're unsure, lets take this offline.

oliver-sanders avatar Aug 15 '24 09:08 oliver-sanders

Summary from offline discussion:

... prerequisites which have been satisfied by cylc trigger...

This is not relevant; cylc trigger doesn't set prereqs (at least not anymore)

...or cylc set...

This only applies to cylc set --prerequisite, not cylc set --output

MetRonnie avatar Aug 21 '24 10:08 MetRonnie

This is not relevant; cylc trigger doesn't set prereqs (at least not anymore)

Warning - it's going to work by setting prerequisites soon - see the group trigger proposal. (Although I'm not sure that's relevant - I need to get back up to speed on this issue).

hjoliver avatar Aug 22 '24 22:08 hjoliver