cylc-flow icon indicating copy to clipboard operation
cylc-flow copied to clipboard

Skip Mode

Open wxtim opened this issue 1 year ago • 19 comments

Closes https://github.com/cylc/cylc-flow/issues/5641 (also fixes #5975, and fixes #5820 )

Skip Mode Proposal Doc

This branch includes (marked against skip mode proposal):

  1. A new task run mode called "skip" to sit alongside the existing "live" and "simulation" and "dummy" modes.
  2. Skip mode settings in `[runtime][<namespace>][skip].
  3. Task-level control of the run mode - allowing run mode of tasks to override that of the workflow using [runtime][<namespace>]run mode. a. Broadcast can change run mode for future task job submissions. b. Cylc Validate and lint will warn about the setting not being live.
  4. cylc set --out skip sets outputs from skip mode.
  5. Tests to ensure that run mode = skip respects is_held flag.
  6. Tests to ensure that force triggering a task will not override the run mode.

Extras 7. Run Mode is available as an task attribute in the UI 8. When tasks are run in skip mode, the prerequisites which correspond to the outputs they generate should be marked as satisfied by skip mode rather than satisfied naturally for provenance reasons. For the purpose of cylc remove logic, satisfied by skip mode should be treated the same as satisfied naturally.

There are two extensions, which I haven't dealt with yet, because I want to ensure that the basic functionality works, and move to the substantial documentation PR which need follow this.

Check List

  • [x] I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • [x] Contains logically grouped changes (else tidy your branch by rebase).
  • [x] Does not contain off-topic changes (use other PRs for other changes).
  • [x] Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • [x] Tests are included (or explain why tests are not needed).
  • [x] CHANGES.md entry included if this is a change that can affect users
  • [x] Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • [x] Feature: PR against master

wxtim avatar Mar 27 '24 12:03 wxtim

Got a change to the schema representation for run mode here: https://github.com/wxtim/cylc/pull/61

MetRonnie avatar Jun 27 '24 14:06 MetRonnie

Add skip mode to test for cylc validate --mode=sim similar to the tests in https://github.com/cylc/cylc-flow/pull/6213

wxtim avatar Jul 15 '24 07:07 wxtim

@wxtim, what's the status here?

What I've done is ready to review, but....

  • [x] Need to review https://github.com/wxtim/cylc/pull/61 - I need to review
  • [x] Add test in https://github.com/cylc/cylc-flow/pull/6213
  • [x] check https://github.com/cylc/cylc-flow/issues/6189

oliver-sanders avatar Jul 15 '24 12:07 oliver-sanders

  1. Not Done

@wxtim, according to the OP, this does not close #5641. If that's out of date, just amend the OP, if you're planning to do it in a follow-on, remove the linked issue and stick a comment on it so we know what the status is.

oliver-sanders avatar Aug 01 '24 13:08 oliver-sanders

  1. Not Done

@wxtim, according to the OP, this does not close #5641. If that's out of date, just amend the OP, if you're planning to do it in a follow-on, remove the linked issue and stick a comment on it so we know what the status is.

I've done it, but in the process of checking that I've done it I spotted an undesirable failure to document behaviour which led me down a rabbit hole leading to fix https://github.com/cylc/cylc-flow/pull/6039/commits/380abccc3caf6cf7019b5912d0533bf2a5b62fa1.

wxtim avatar Aug 02 '24 10:08 wxtim

Also try_num is 0 instead of 1 to begin with

I quick check against master suggests that's not true of the db

wxtim avatar Aug 05 '24 08:08 wxtim

(FYI bullet point [8] is blank in the PR description)

oliver-sanders avatar Aug 07 '24 12:08 oliver-sanders

Have started functional testing, looking good so far, a couple of small issues:

1) Cannot broadcast [skip] settings

E.G. This command should configure a task to run in skip mode AND fail when it does so:

$ cylc broadcast -s 'run mode = skip' -s '[skip]outputs = failed'

However, the [skip] part seems to be ignored. Not a biggie from a functionality perspective (why would anyone want to do this!), but if we're allowing users to do this it should work.

2) The started output isn't being produced?

This example stalls for me:

a? & a:started => b

[runtime]
  [[root]]
    run mode  = skip

3) Skip mode messes with the average task runtime

Skip mode tasks are not excluded from the average task runtime calculation which messes it up a fair bit.

Not a biggie, can be moved to a follow-on issue.

4) Skip mode messes with the analysis view

Skip mode tasks are not excluded from the analysis view statistics which is a tad confusing.

Not a biggie, can be punted into a small issue for the data workflows team. Just make sure that it is easy to filter out skip mode tasks in the task_jobs DB table (i.e. by filtering by platform name).

Opened: https://github.com/cylc/cylc-uiserver/issues/626

oliver-sanders avatar Aug 30 '24 12:08 oliver-sanders

  1. Cannot broadcast [skip] settings

Fixed by https://github.com/cylc/cylc-flow/pull/6039/commits/f2fb60e005007efb47f6461d18ff3d88e3aa1e47. It turned out to be a bit of a rabbit hole with disable task event handlers. I can see wanting to change this as being a reason to broadcast skip mode settings.

  1. The started output isn't being produced?

No, but I don't think it's my change at fault - Since it appears to exist as an entirely separate bug present in Cylc as far back as 8.0.0! I'll attempt to fix anyway!

Proposed fix up at https://github.com/cylc/cylc-flow/pull/6351 - since it's a bug. Will rebase and fix after.

  1. Skip mode messes with the average task runtime

Fixed https://github.com/cylc/cylc-flow/pull/6039/commits/660ca3a7c56c652e38c6047adcd564925dca6e54

I have noticed that dummy.py looks like

wxtim avatar Sep 02 '24 12:09 wxtim

Inspecting a workflow with "skip mode" tasks using Cylc Review revealed a couple of minor things:

Screenshot from 2024-09-24 15-51-17

  1. The "job #" box says "1 of 0" whereas it would say "1 of 1" for tasks run in "live" or "simulation" mode.
    • This maybe suggests that something isn't being incremented or written to the DB correctly.
  2. The "job batch" box says "simulation" for "skip" mode tasks.
    • I'm not sure why that is, it might be a hardcoded value in Cylc Review (if so no worries).

oliver-sanders avatar Sep 24 '24 14:09 oliver-sanders

  • The "job #" box says "1 of 0" whereas it would say "1 of 1" for tasks run in "live" or "simulation" mode.

    • This maybe suggests that something isn't being incremented or written to the DB correctly.

This appears to be the case - neither of the "nonlive" modes were updating the task_states table. Added, along with a wee test.

  • The "job batch" box says "simulation" for "skip" mode tasks.

This is an easy fix, and I'll push it up.

wxtim avatar Sep 25 '24 10:09 wxtim

Having a play around with this branch with some imagined use cases.

Unfortunately, I've encountered one critical traceback:

Traceback (most recent call last):                                                                                                 
  File "~/cylc/flow/scheduler.py", line 652, in run_scheduler
    await self._main_loop()
  File "~/cylc/flow/scheduler.py", line 1520, in _main_loop
    self.release_queued_tasks()
  File "~/cylc/flow/scheduler.py", line 1267, in release_queued_tasks
    for itask in self.task_job_mgr.submit_task_jobs(
  File "~/cylc/flow/task_job_mgr.py", line 266, in submit_task_jobs
    itasks, submitted_nonlive_tasks = self.submit_nonlive_task_jobs(
  File "~/cylc/flow/task_job_mgr.py", line 1080, in submit_nonlive_task_jobs
    nonlive_mode = submit_func(
  File "~/cylc/flow/run_modes/skip.py", line 50, in submit_task_job
    task_job_mgr._set_retry_timers(itask, rtconfig)
  File "~/cylc/flow/task_job_mgr.py", line 1011, in _set_retry_timers
    or itask.platform['submission retry delays']
KeyError: 'submission retry delays'

I had configured the tasks to run in skip-mode and to generate the failed output using a broadcast. I then made another broadcast to change the skip-mode outputs and re-triggered the tasks.

oliver-sanders avatar Oct 01 '24 12:10 oliver-sanders

I had configured the tasks to run in skip-mode and to generate the failed output using a broadcast. I then made another broadcast to change the skip-mode outputs and re-triggered the tasks.

I've replicated this :(

wxtim avatar Oct 02 '24 15:10 wxtim

Poking @MetRonnie because Oliver Hath Approved!

wxtim avatar Oct 09 '24 08:10 wxtim

(Will get round to it after the Cylc/Rose training)

MetRonnie avatar Oct 09 '24 09:10 MetRonnie

Latest commit seems to include some kind of squashed merge of master into this branch? image

MetRonnie avatar Oct 16 '24 15:10 MetRonnie

Latest commit seems to include some kind of squashed merge of master into this branch?

Yes - I made the history horrible, and I think this is the best I can do to clean it up.

wxtim avatar Oct 16 '24 15:10 wxtim

@MetRonnie & @oliver-sanders have a look at https://github.com/cylc/cylc-flow/pull/6039/commits/bc8d067b24f08770aaaf94eb0c420cd0c9e88fa5. I'm not convinced it's a good idea, but if you think it might be I'll add tests. But IMO mission creep.

wxtim avatar Oct 16 '24 16:10 wxtim

https://github.com/MetRonnie/cylc-flow/tree/skip-mode-git-rescue is identical to 047c76fdb1adbb3ae7d6f65987299eecf7bfd9a3 but with a proper history (took your previous head commit and dropped the (empty) merge commit named "merge" which was possibly a case of merging a local branch into an identical but rebased branch).

MetRonnie avatar Oct 16 '24 16:10 MetRonnie

To avoid this getting lost above: https://github.com/cylc/cylc-flow/pull/6039#discussion_r1822975115

MetRonnie avatar Oct 30 '24 16:10 MetRonnie

Getting a warning when setting the skip output on a task (that does not have any custom outputs defined)

INFO - Command "set" received. ID=c0fa6e99-387b-409f-9755-e14596cdee40
    set(flow=[], flow_wait=False, outputs=['skip'], prerequisites=[], tasks=['1/foo'])
WARNING - output 1/foo:skip not found
INFO - [1/foo:waiting(queued)] => succeeded(queued)
INFO - [1/foo/00:succeeded(queued)] => succeeded

MetRonnie avatar Nov 19 '24 17:11 MetRonnie

Getting a warning when setting the skip output on a task (that does not have any custom outputs defined)

Fixed in https://github.com/cylc/cylc-flow/pull/6039/commits/8136da67a96538e79653852ceb3c8c1eff7c5638

wxtim avatar Nov 20 '24 09:11 wxtim

I think this is pretty much good to go 🎉 Just got a conflict to resolve + my suggestion above

MetRonnie avatar Nov 22 '24 17:11 MetRonnie

I've got a deconfliction here: https://github.com/wxtim/cylc/pull/70

I've checked it carefully, should be correct.

MetRonnie avatar Nov 28 '24 18:11 MetRonnie

:tada:

Congrats Tim!

oliver-sanders avatar Dec 03 '24 17:12 oliver-sanders