cylc-flow icon indicating copy to clipboard operation
cylc-flow copied to clipboard

Make `cylc remove` flow-aware and extend to historical tasks

Open MetRonnie opened this issue 1 year ago • 5 comments

Partially addresses #5643

Summary

This mostly implements the "Cylc Remove Extension" proposal.

Flow numbers

cylc remove now has a --flow option for removing a task from specific flows.

If not used, it will remove the task from all flows that it belongs to.

If the removed task is active/waiting, if it is removed from a subset of flows that it belongs to, it will remain in the task pool; if it is removed from all flows that it belongs to, it will be removed from the task pool (as is the current behaviour).

If a task is removed from all flows that it belongs to, it will become a no-flow task (flow=None).

For ease of reviewing, you can use my UI branch that displays flow numbers: https://github.com/MetRonnie/cylc-ui/tree/flow-nums [^c].

[^c]: Waiting tasks that are not yet in the pool have greyed out flow numbers at the moment.

Historical tasks

cylc remove now can remove tasks that are no longer active, making it look like they never ran. It does this by removing the task from the specified flows in the database (in the task_states and task_outputs tables)[^a], and un-setting any prerequisites of active tasks that the removed task had naturally satisfied[^b]. If a task is removed from all flows that it belongs to, a no-flow task is left in the DB for provenance.

The above also applies to active/waiting tasks that cylc remove is used on.

[^a]: If removing flows would result in two rows in the DB no longer being unique, the SQLite UPDATE OR REPLACE statement is used, so the first entry will be removed and the most recent entry will remain. [^b]: Prerequisites manually satisfied by cylc set --pre are not affected by cylc remove.

What's left to do

  • When removing an active task from all its flows, kill the task.

  • Should probably add a functional test with the --flow option.

  • Need to check this one:

    If removing a task causes all of the prerequisites on a downstream task to be unset (i.e. if the downstream task was spawned as a result of outputs from this task alone) then the downstream task shall be removed from the pool.

Check List

  • [x] I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • [x] Contains logically grouped changes (else tidy your branch by rebase).
  • [x] Does not contain off-topic changes (use other PRs for other changes).
  • [x] No dependency changes
  • [x] Tests are included
  • [x] Changelog entry included if this is a change that can affect users
  • [ ] Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • [x] If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

MetRonnie avatar Sep 11 '24 15:09 MetRonnie

Code looks good, trying out some manual testing.

I'm not having much luck removing n=-1 tasks (i.e. tasks which have just succeeded). Here's an example:

[scheduling]
    cycling mode = integer
    initial cycle point = 1
    [[graph]]
        P1 = a[-P1] => a
[runtime]
    [[root]]
    [[a]]
        script = sleep 1
        completion = (submitted and succeeded)
# start the workflow
cylc vip --pause

# run task 1/a
cylc play
cylc pause

# remove task 1/a
cylc remove 1/a

Results:

  • The DB is updated correctly.
  • However the prereq 1/a -> 2/a is marked as satisfied in the cylc show output.
  • When I resume the workflow 2/a runs (note it was in the queue when 1/a was removed).

oliver-sanders avatar Sep 30 '24 15:09 oliver-sanders

(Test failure is just flaky tui test, not bothering re-running a 3rd time 🤮)

MetRonnie avatar Oct 10 '24 16:10 MetRonnie

Have been trying this out for sub-graph re-run use cases. The remove functionality is all working correctly :+1:, I am able to re-run sub-graphs cleanly without using new flows :rocket:.

I have encountered some hitches (not related to this PR, all for consideration in follow-on work):

  1. I've been hitting this issue a lot (https://github.com/cylc/cylc-flow/issues/6143) with any dependencies which span more than one "rank" in the graph. In these situations, the GUI and DB say one thing, but the task pool another which is highly confusing. Restarting (or possibly reloading?) the workflow should fix it, but we need a solution to this before remove really becomes a viable solution to re-running sub-graphs. Note this is a very similar limitation to the need to satisfy off-flow prereqs when re-running via new flows.
  2. Removing a task doesn't necessarily change anything in the GUI (i.e. there is no visual feedback). This was anticipated when the proposal was written and is why https://github.com/cylc/cylc-ui/issues/470 is on the Cylc roadmap. I wonder whether a global change to the appearance of n=0 tasks (e.g. making them translucent as originally proposed) would provide an easier workaround and also help with other things?
  3. Removed tasks retain their state in the GUI. E.G. if a task succeeded and was removed, it continues to be shown with a filled circle. It makes sense to preserve the job state, however, I wonder if we should reset the task state back to waiting when we remove the task to make it clearer?

oliver-sanders avatar Oct 11 '24 15:10 oliver-sanders

What's the status on these TODO items from the OP:

What's left to do

When removing an active task from all its flows, kill the task.

Should probably add a functional test with the --flow option.

Need to check this one:

oliver-sanders avatar Oct 17 '24 12:10 oliver-sanders

Will either come in a follow-up PR or this one depending on how soon @hjoliver reviews this

MetRonnie avatar Oct 17 '24 13:10 MetRonnie

Reading:

This means prerequisites which have been satisfied by cylc trigger or cylc set will not be unset as the result of cylc remove as these have been manually satisfied by separate interventions.

Proposal: Clause 2

I tried

[scheduler]
    allow implicit tasks = True

[scheduling]
    [[graph]]
        R1 = foo => bar => baz => qux
$ cylc vip . --pause -n remove/one --pause
$ cylc trigger remove/one//1/qux/

Pre-requisite table in DB

cycle name flow_nums prereq_name prereq_cycle prereq_output satisfied
1 qux [1] baz 1 succeeded 0
$ cylc remove remove/one//1/qux

Then removes the DB entry. My reading of the spec is that this shouldn't be happening.

wxtim avatar Oct 25 '24 09:10 wxtim

This means prerequisites which have been satisfied by cylc trigger or cylc set will not be unset as the result of cylc remove as these have been manually satisfied by separate interventions.

This statement is in relation to prerequisites satisfied by the task being removed, not the prerequisites of the removed task itself.

oliver-sanders avatar Oct 25 '24 09:10 oliver-sanders

If removing a task causes all of the prerequisites on a downstream task to be unset (i.e. if the downstream task was spawned as a result of outputs from this task alone) then the downstream task shall be removed from the pool.

This is proving to be tricky, as there is a difference between how prerequisites are handled for these 2 cases:

[scheduling]
    [[graph]]
       R1 = """
           (a1 | a2) & b => c
       """

and

[scheduling]
    [[graph]]
       R1 = """
           (a1 | a2) => c
           b => c
       """

In the first case, there is only 1 prerequisite that includes both a1 | a2 and b, whereas in the second there are 2 prerequisites - 1 for a1 | a2 and 1 for b. In the first case, removing b would result in c being removed from the task pool even if a1 or a2 had succeeded, and I am not sure there is a way to tell this without another refactor of the Prerequisite class.

Edit: I was wrong. In any case, we should remove the child task from the pool only if it has no satisfied prerequisite tasks, regardless of whether they are in any AND/OR conditions

MetRonnie avatar Oct 29 '24 13:10 MetRonnie

Where are we up to with these TODO items?

What's left to do

When removing an active task from all its flows, kill the task.

Should probably add a functional test with the --flow option.

Need to check this one:

   If removing a task causes all of the prerequisites on a downstream task to be unset (i.e. if the downstream task was spawned as a result of outputs from this task alone) then the downstream task shall be removed from the pool.

oliver-sanders avatar Oct 31 '24 15:10 oliver-sanders

Where are we up to with these TODO items?

(Edited) I've updated the OP

MetRonnie avatar Oct 31 '24 15:10 MetRonnie

Getting in review mode here - sorry for the delay!

hjoliver avatar Nov 10 '24 21:11 hjoliver

@MetRonnie - is this still in draft because of the two "what's left to do" items at the top?

hjoliver avatar Nov 11 '24 06:11 hjoliver

Yes, because Oliver wanted to wait for the whole proposal to be addressed before merging. (FYI I'm going to close this PR and open a new one to stop GitHub collapsing recent review comments)

MetRonnie avatar Nov 11 '24 10:11 MetRonnie

Superseded by #6472

MetRonnie avatar Nov 11 '24 11:11 MetRonnie