cylc-flow

skip mode

Open oliver-sanders opened this issue 2 years ago • 17 comments

Implement the skip mode proposal:

  • Implement a new run mode called "skip".
    • Document this alongside simulation mode.
    • Review the simulation mode documentation.
  • This should use the same code pathway as simulation mode.
    • E.g. change every if run_mode == 'simulation' to if run_mode in {'simulation', 'skip'}.
  • Add integration tests for simulation and skip run modes.
    • Note simulation mode is the default for integration tests.
  • Update the UIS analysis endpoint to filter out skipped tasks.
    • This requires a run-mode field in the DB.
    • The DB upgrader can assume all tasks ran in "live" mode if the field is not present.
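A minimal sketch of the DB upgrade and the analysis filter described above, using Python's sqlite3 module. The table name task_jobs, the run_time column and the overall schema are simplified, hypothetical stand-ins, not the real Cylc database layout. Adding the column with DEFAULT 'live' is one way to implement "assume all tasks ran in live mode if the field is not present".

```python
import sqlite3


def add_run_mode_column(conn: sqlite3.Connection, table: str = "task_jobs") -> None:
    """Add a run_mode column if missing; existing rows are treated as "live"."""
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "run_mode" not in columns:
        # SQLite gives pre-existing rows the new column's DEFAULT value.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN run_mode TEXT DEFAULT 'live'")
        conn.commit()


def non_skipped_jobs(conn: sqlite3.Connection, table: str = "task_jobs"):
    """Rows for the analysis view, with skipped tasks filtered out in SQL."""
    return conn.execute(
        f"SELECT name, run_time FROM {table} WHERE run_mode != 'skip'"
    ).fetchall()


# Hypothetical, simplified schema standing in for an older workflow database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_jobs (name TEXT, run_time REAL)")
conn.execute("INSERT INTO task_jobs VALUES ('model', 120.0)")
add_run_mode_column(conn)  # upgrade: old rows become 'live'
conn.execute("INSERT INTO task_jobs VALUES ('post', 0.0, 'skip')")
print(non_skipped_jobs(conn))  # [('model', 120.0)]
```

Because the upgrade checks for the column first, running it against an already-upgraded database is a no-op.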

Optimistically tagged against 8.3.0 but not a deal breaker if this is not ready in time.

oliver-sanders, Jul 25 '23 10:07

At present the run time for a simulated task can be defined by one of several configurations, but it is calculated at configure time, NOT at submit time. This means a broadcast presumably can't change the run time. This should really be moved to submit time, because the skip-mode functionality we are adding on top definitely needs to work with broadcasts.
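A rough sketch of what "calculate at submit time" could look like: the task's configuration is merged with any broadcasts only when the job is submitted, so a broadcast made mid-run is honoured. The config keys are modelled loosely on the [simulation] section, durations are plain seconds rather than ISO 8601, and the precedence rule (time limit divided by speedup factor, else a 10 second default) is an assumption for illustration, not Cylc's actual logic.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied on top (broadcast wins)."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def run_length_at_submit(
    task_config: Dict[str, Any], broadcasts: Dict[str, Any], run_mode: str
) -> float:
    """Work out the simulated run length (seconds) when the job is submitted."""
    if run_mode == "skip":
        return 0.0  # skip mode: zero run length, not configurable
    config = deep_merge(task_config, broadcasts)  # broadcasts applied now
    sim = config.get("simulation", {})
    time_limit = config.get("execution time limit")
    speedup = sim.get("speedup factor")
    # Assumed precedence: a time limit scaled by a speedup factor wins,
    # otherwise fall back to the default run length (10 s here).
    if time_limit is not None and speedup:
        return time_limit / speedup
    return float(sim.get("default run length", 10.0))


config = {"simulation": {"default run length": 10.0}}
print(run_length_at_submit(config, {}, "simulation"))  # 10.0
broadcast = {"simulation": {"default run length": 30.0}}
print(run_length_at_submit(config, broadcast, "simulation"))  # 30.0
```

The configured values are never mutated; each submission builds a fresh merged view, which is what lets a later broadcast take effect.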

oliver-sanders, Aug 02 '23 16:08

Perhaps, instead of broadcasting a change of run mode, this could be graph syntax? Then you would know that if X happens the task will be skipped, and it would be more visible rather than potentially hidden inside a script.

ColemanTom, Aug 08 '23 05:08

That sort of problem is better solved using graph branching.

Use cases for graph branching:

  • Skip one or more tasks.
  • Run special tasks (e.g. recovery tasks).
  • Implement "if" statements in the graph.
  • Stop the workflow (e.g. incremental workflows).

Skip mode is intended for different use cases which cannot be defined ahead of time in the graph, e.g.:

  • I want to skip one cycle of tasks because of a data problem.
  • I need to toggle on/off some tasks (e.g. for diagnostics) whilst the workflow is running.
  • I need to prevent Cylc from running one or more tasks for a selection of cycles, but these tasks have not been spawned yet so I can't run cylc set-outputs on them.

oliver-sanders, Aug 08 '23 09:08

Ok. We do have use cases of skipping whole cycles due to resource issues. What happens if you adjust the mode of an actively running task? Would it kill the task, which then skips itself, or would you need to kill it, skip it, then release it?

ColemanTom, Aug 08 '23 09:08

Another question: will this impact results from report-timings or other reporting views of average runtimes of a task, or will skipped tasks be treated separately to avoid giving misleading statistics?

ColemanTom, Aug 17 '23 00:08

What happens if you adjust the mode of an actively running task?

Nothing: the active task would be left alone. However, retries would pick up the configuration change.
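To make the "retries pick up the change" behaviour concrete, here is a toy model in which the run mode is read at each submission. RetryingTask is a hypothetical stand-in, not Cylc's task class; it only records which mode each attempt used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RetryingTask:
    """Hypothetical stand-in for a task with retries (not Cylc's real class)."""

    name: str
    run_mode: str = "live"  # the currently configured run mode
    job_modes: List[str] = field(default_factory=list)  # mode used per attempt

    def submit(self) -> str:
        # The run mode is read at submission time, so each attempt uses
        # whatever configuration is in force at that moment.
        self.job_modes.append(self.run_mode)
        return self.run_mode


task = RetryingTask("model")
task.submit()            # attempt 1 is submitted in live mode
task.run_mode = "skip"   # operator changes the mode while attempt 1 runs
# Attempt 1 is not killed or altered; only future submissions are affected.
task.submit()            # the retry picks up the new mode
print(task.job_modes)    # ['live', 'skip']
```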

Another question, will this impact results from report-timing or other reporting views of average runtimes of a task

Good point. The "report-timings" utility will in due course be replaced by the GUI "analysis" view. So long as we record the run mode in the database, the analysis view will be able to filter skipped tasks out at the SQL level.

I'll add a note to the OP.

oliver-sanders, Aug 21 '23 10:08

I thought we'd documented this somewhere, but presumably skip mode will need the ability to set which (if any) optional outputs are satisfied by the skipped task? (I feel that simulation mode might as well do that too?)

wxtim, Aug 30 '23 09:08

skip mode will need the ability to set which (if any) optional outputs are satisfied by the skipped task?

This is written up in the proposal document.

I feel that simulation mode might as well do that too?

Simulation mode currently only generates the :succeeded and :failed outputs; there are no plans to change that at present.
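The contrast between the two modes can be sketched as two small functions. The simulation rule follows the comment above (only :succeeded or :failed). The skip rule is a hypothetical reading of "yield the expected (or configured) outputs": configured outputs win, otherwise the task's required outputs, with 'succeeded' added unless a final outcome is already listed. The proposal document may specify this differently.

```python
from typing import List, Optional


def simulation_outputs(fail: bool) -> List[str]:
    """Simulation mode: only the :succeeded or :failed outcome is generated."""
    return ["failed"] if fail else ["succeeded"]


def skip_outputs(
    required: List[str], configured: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical skip-mode rule (an assumption, not the proposal's text).

    Use the operator-configured outputs if given, otherwise the task's
    required outputs; add 'succeeded' unless a final outcome is present.
    """
    outputs = list(configured) if configured else list(required)
    if "succeeded" not in outputs and "failed" not in outputs:
        outputs.append("succeeded")
    return outputs


print(simulation_outputs(fail=False))      # ['succeeded']
print(skip_outputs(["file1"]))             # ['file1', 'succeeded']
print(skip_outputs(["file1"], ["file2"]))  # ['file2', 'succeeded']
```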

oliver-sanders, Aug 31 '23 12:08

If you mark a task to be skipped, but it has an xtrigger prerequisite, will it still need to wait for the xtrigger to be successful before it runs? Or can you also skip the xtrigger?

I also wondered about the run length. Will it have a default length of 10 seconds like simulation, and how will that be controlled? I'm hoping the length will be as close to 0 as possible given it's skipping instead of simulating.

With event handlers, can you specify different handlers when skipping than normally, or how will the information provided to the handler help differentiate skip versus normal runs?

Final question for today: will skipping add extra load to the server if a massive number of tasks are being skipped (I recognise the answer may change depending on whether you have disabled event handling)?

ColemanTom, Oct 10 '23 03:10

If you mark a task to be skipped, but it has an xtrigger prerequisite, will it still need to wait for the xtrigger to be successful before it runs? Or can you also skip the xtrigger?

I would have thought that it makes sense to satisfy any xtriggers when you set a task's mode to skip. Thank you for bringing that to my attention; I'll discuss it with @oliver-sanders.

I also wondered about the run length. Will it have a default length of 10 seconds like simulation, and how will that be controlled? I'm hoping the length will be as close to 0 as possible given it's skipping instead of simulating.

Zero. Definitely zero, and we're not planning on making it customizable. Use simulation mode for that! (N.B. You cannot currently broadcast to simulation mode, but I have a PR for that.)

With event handlers, can you specify different handlers when skipping than normally, or how will the information provided to the handler help differentiate skip versus normal runs?

The proposal has the idea that event handlers will be turned off in skip mode, unless you set [runtime][task][simulation]disable task event handlers = False. There is no provision for alternative event handlers. Can you explain why you want them and what you would have them do?
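The proposed default can be expressed as a one-line check: in skip mode, handlers fire only if the setting quoted above is explicitly False. The nested-dict representation of the task config is a hypothetical simplification, and this sketch only covers skip mode.

```python
from typing import Any, Dict


def handlers_enabled_in_skip_mode(task_config: Dict[str, Any]) -> bool:
    """Return True if task event handlers should fire for a skipped task.

    Assumes the proposed behaviour: handlers are off by default in skip
    mode and only run if 'disable task event handlers' is explicitly False.
    """
    setting = task_config.get("simulation", {}).get(
        "disable task event handlers", True  # assumed default: disabled
    )
    return setting is False


print(handlers_enabled_in_skip_mode({}))  # False
print(handlers_enabled_in_skip_mode(
    {"simulation": {"disable task event handlers": False}}
))  # True
```

Using "is False" means that only an explicit opt-out of the default turns handlers back on.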

Final question for today: will skipping add extra load to the server if a massive number of tasks are being skipped (I recognise the answer may change depending on whether you have disabled event handling)?

Yes, potentially, but the extra code run by the main loop for simulation mode (which skip mode will be based on) is pretty lightweight. I haven't got evidence, but I think you'd need to be skipping colossal numbers of tasks for this to be noticeable. I will do a quick check though. This might be a good reason not to allow alternative event handlers.

wxtim, Oct 10 '23 08:10

I would have thought that it makes sense to satisfy any xtriggers when you set a task's mode to skip. Thank you for bringing that to my attention; I'll discuss it with @oliver-sanders.

I would agree with you, as I wouldn't think people want to wait for xtriggers which may be a long way off. I can just see an edge case where one xtrigger is used for two tasks, and only one is skipped.

Zero. Definitely zero, and we're not planning on making it customizable.

Perfect.

With event handlers, can you specify different handlers when skipping than normally, or how will the information provided to the handler help differentiate skip versus normal runs?

The proposal has the idea that event handlers will be turned off in skip mode, unless you set [runtime][task][simulation]disable task event handlers = False. There is no provision for alternative event handlers. Can you explain why you want them and what you would have them do?

I have not thought much about this; I'm just thinking aloud. Hypothetically, suppose you have a handler that runs on any status change and sends that information to a global reporting database, or to external systems outside your security area, so they can do what they need to do. It may be useful for those systems to still receive a message from a broker stating that the task was skipped, so they can take appropriate action on their end.

ColemanTom, Oct 10 '23 08:10

I can just see an edge case where one xtrigger is used for two tasks, and only one is skipped.

I think the skipping of the xtrigger will be done per task, so this shouldn't happen.

wxtim, Oct 10 '23 08:10

Yep:

  • Xtriggers aren't a feature of skip mode (i.e. they shouldn't get run).
    • Note, legacy clock-triggers are now translated into xtriggers at configure time.
    • The use cases considered for skip mode are about cutting bits out of a workflow manually.
      • E.g. cutting a cycle out manually in a catch-up scenario.
      • Note: optional outputs are the preferred solution for graph branching, which can be used to achieve similar results in an automated way.
    • It's there as an easier alternative to cylc remove <old-cycle>/*; cylc trigger <new-cycle>/<start-tasks>
  • There should not be any sleep or any way to configure sleep.
  • By default event handlers will be turned off.
  • We haven't considered providing the run mode to event handlers, but could add a template variable along the lines of event handlers = myhandler %(workflow)s %(cycle)s %(task)s %(run_mode)s
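The template above uses %-style substitution, so a proposed %(run_mode)s variable would be filled like the others. A minimal sketch of rendering such a template into an argument list follows; run_mode is the hypothetical new variable, the workflow/cycle/task values are made up, and the quoting step is this sketch's own choice rather than a description of how Cylc builds handler commands.

```python
import shlex

# The template from the comment above; %(run_mode)s is a proposed,
# not-yet-existing variable.
TEMPLATE = "myhandler %(workflow)s %(cycle)s %(task)s %(run_mode)s"


def render_handler(template: str, **context: str) -> list:
    """Fill a %-style handler template and split it into argv form."""
    # Quote each value so spaces in names cannot split arguments.
    quoted = {key: shlex.quote(value) for key, value in context.items()}
    return shlex.split(template % quoted)


argv = render_handler(
    TEMPLATE, workflow="ops/model", cycle="20231016T0000Z",
    task="post", run_mode="skip",
)
print(argv)  # ['myhandler', 'ops/model', '20231016T0000Z', 'post', 'skip']
```

A handler that forwards status changes to an external system could then branch on its last argument, e.g. sending a "skipped" message instead of a normal completion message.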

oliver-sanders, Oct 10 '23 10:10

Sorry, spoke too soon, xtriggers will be run in skip mode.

Skip mode applies only to a task's execution: it's essentially a dummy runtime environment which does nothing but yield the expected (or configured) outputs. It should be possible to configure a cycle to skip ahead of time in order to allow the workflow to continue after it, without this causing anything to run because clock triggers have been disabled.
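A toy model of "skip mode replaces only execution": the task still waits on its xtriggers, and only once they are satisfied does skip mode report its outputs instantly instead of submitting a job. SketchTask and its fields are hypothetical and do not reflect Cylc's internals; live and simulation execution are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SketchTask:
    """Hypothetical task model for illustration; not Cylc's internals."""

    name: str
    run_mode: str
    xtriggers: Dict[str, Callable[[], bool]]
    outputs: List[str]

    def prerequisites_met(self) -> bool:
        # Xtriggers gate the task in every run mode, skip mode included.
        return all(check() for check in self.xtriggers.values())

    def step(self) -> List[str]:
        """Return the outputs completed on this step (empty while waiting)."""
        if not self.prerequisites_met():
            return []
        if self.run_mode == "skip":
            # No job is submitted; the outputs are reported immediately.
            return list(self.outputs)
        raise NotImplementedError("live/simulation execution not sketched")


data_arrived = {"flag": False}
task = SketchTask(
    name="post",
    run_mode="skip",
    xtriggers={"data_available": lambda: data_arrived["flag"]},
    outputs=["succeeded"],
)
print(task.step())  # [] while still waiting on the xtrigger
data_arrived["flag"] = True
print(task.step())  # ['succeeded']
```

This is also why an xtrigger that can never be satisfied would hold up a skipped task, which is the concern raised in the following comment.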

oliver-sanders, Oct 12 '23 10:10

Sorry, spoke too soon, xtriggers will be run in skip mode.

Skip mode applies only to a task's execution: it's essentially a dummy runtime environment which does nothing but yield the expected (or configured) outputs. It should be possible to configure a cycle to skip ahead of time in order to allow the workflow to continue after it, without this causing anything to run because clock triggers have been disabled.

This could lead to stalling unless xtriggers can be skipped. If a workflow has an xtrigger which will never be satisfied (say, a data-availability check) and you skip the cycle, it won't actually skip? I know you could skip, then set outputs on all tasks with xtriggers, and then remove those tasks from the graph, but in my opinion that is a much more verbose set of instructions than should be required to perform an actual skip.

ColemanTom, Oct 12 '23 23:10

Note that this "skip mode" is intended only for operators to respond to real-world failures that the workflow cannot respond to automatically.

I'm not sure what use cases you have in mind.

oliver-sanders, Oct 13 '23 08:10

Note that this "skip mode" is intended only for operators to respond to real-world failures that the workflow cannot respond to automatically.

I'm not sure what use cases you have in mind.

Yes, this is what we are doing in cylc7.

  • Model
  • Postprocessing
  • Ensemble statistics

For some reason, say a major HPC outage which leaves us with limited resources, the Model needs to be skipped to give more critical models space. So we trigger a script to skip all the individual workflows relating to the Model.

Postprocessing uses a data-availability xtrigger, which checks the Model's output, to decide whether it can start running. Similarly, Ensemble statistics checks that certain data from Postprocessing exists before it runs. We do not use the suite-state xtrigger because it isn't useful when the databases are on different servers, and because we want to be able to run non-production Postprocessing for development without also running the Model, e.g. by just checking that production data exists.
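For readers unfamiliar with this pattern, a data-availability xtrigger can be sketched as a plain Python function returning (satisfied, results), which as far as we understand is the shape Cylc expects from custom xtrigger functions. The function name, result key and file name below are hypothetical.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def data_available(paths: List[str]) -> Tuple[bool, Dict[str, str]]:
    """Hypothetical data-availability xtrigger function.

    Returns (satisfied, results); satisfied only once every path exists.
    """
    if all(os.path.exists(path) for path in paths):
        return True, {"data_paths": " ".join(paths)}
    return False, {}


with tempfile.TemporaryDirectory() as tmp:
    model_output = os.path.join(tmp, "model_output.nc")
    print(data_available([model_output]))     # (False, {})
    open(model_output, "w").close()           # the Model writes its output
    print(data_available([model_output])[0])  # True
```

If the Model's cycle is skipped, its output never appears, so an xtrigger like this never becomes satisfied and the downstream Postprocessing task waits indefinitely. That is the stall described in the earlier comment.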

Your current plans would work (I think) for suite-state xtriggers, but there are reasons why those may not be usable.

ColemanTom, Oct 16 '23 01:10