cylc-flow
skip mode
Implement the skip mode proposal:
- Implement a new run mode called "skip".
- Document this alongside simulation mode.
- Review the simulation mode documentation.
- This should use the same code pathway as simulation mode.
- E.G. change every `if run_mode == 'simulation'` to `if run_mode in {'simulation', 'skip'}`.
- Add integration tests for simulation and skip run modes.
- Note simulation mode is the default for integration tests.
- Update the UIS analysis endpoint to filter out skipped tasks.
- This requires a run-mode field in the DB.
- The DB upgrader can assume all tasks ran in "live" mode if the field is not present.
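A minimal Python sketch of the shared code pathway described in the list above. The `Task` class and the `submit()`/`run_length()` helpers are hypothetical stand-ins, not cylc-flow's real internals; the sketch only shows the `run_mode in {'simulation', 'skip'}` check and the zero run length that skip mode is meant to have.

```python
from dataclasses import dataclass

# Run modes that bypass real job submission; per the proposal, skip mode
# reuses the simulation code pathway.
SIMULATED_MODES = {'simulation', 'skip'}


@dataclass
class Task:
    """Hypothetical stand-in for a task proxy (not cylc-flow's real class)."""
    name: str
    run_mode: str = 'live'
    sim_run_length: float = 10.0  # seconds; illustrative simulation default


def submit(task: Task) -> str:
    """Return the pathway a task would take (sketch only)."""
    # Previously: if task.run_mode == 'simulation':
    if task.run_mode in SIMULATED_MODES:
        return 'simulated'
    return 'job-runner'


def run_length(task: Task) -> float:
    """Seconds a simulated task 'runs' for; skip mode is always zero."""
    if task.run_mode == 'skip':
        return 0.0
    if task.run_mode == 'simulation':
        return task.sim_run_length
    raise ValueError(f'{task.run_mode!r} tasks are not simulated')
```

Keeping the set of simulated modes in one constant means adding a mode touches a single line rather than every `if` in the codebase.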
Optimistically tagged against 8.3.0 but not a deal breaker if this is not ready in time.
At present, the run time for a simulated task can be defined by one of several configurations, but it is calculated at configure time, NOT submit time. This means a broadcast presumably can't change the run time. This could really do with being moved to submit time, as we are adding skip-mode functionality on top of it which definitely needs to work with broadcasts.
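To illustrate why the lookup time matters, here is a minimal Python sketch that resolves the run length at submit time. It assumes broadcasts are held as a hypothetical nested dict of section to settings, and it represents `default run length` as plain seconds rather than an ISO 8601 duration for simplicity; `resolve_config` and `run_length_at_submit` are hypothetical names.

```python
from copy import deepcopy


def resolve_config(base: dict, broadcasts: dict) -> dict:
    """Overlay broadcast settings onto the task's base runtime config.

    Returns a new dict; the base config is left untouched.
    """
    merged = deepcopy(base)
    for section, settings in broadcasts.items():
        merged.setdefault(section, {}).update(settings)
    return merged


def run_length_at_submit(base: dict, broadcasts: dict) -> float:
    """Look up the simulated run length when the job is submitted."""
    cfg = resolve_config(base, broadcasts)
    return float(cfg['simulation']['default run length'])


base = {'simulation': {'default run length': 10}}
# A broadcast issued after the workflow was configured:
broadcasts = {'simulation': {'default run length': 2}}
```

Because the value is read when each job is submitted, a broadcast made after configure time still takes effect; a value computed once at configure time would not see it.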
Perhaps, instead of broadcasting a change of run mode, this could be graph syntax, so you know that if X happens the task will skip? That would be more visible than a change potentially hidden inside a script.
That sort of problem is better solved using graph branching.
Use cases for graph branching:
- Skip one or more tasks.
- Run special tasks (e.g. recovery tasks).
- Implement "if" statements in the graph.
- Stop the workflow (e.g. incremental workflows).
Skip mode is intended for different use cases which cannot be defined ahead of time in the graph, e.g.:
- I want to skip one cycle of tasks because of a data problem.
- I need to toggle on/off some tasks (e.g. for diagnostics) whilst the workflow is running.
- I need to prevent Cylc from running one or more tasks for a selection of cycles, but these tasks have not been spawned yet, so I can't run `cylc set-outputs` on them.
Ok. We do have use cases for skipping whole cycles due to resource issues. What happens if you adjust the mode of an actively running task? Would it kill the task so that it then skips itself, or would you need to kill it, skip it, then release it?
Another question: will this impact results from report-timings or other reporting views of average runtimes of a task, or will skipped tasks be treated separately to avoid giving misleading statistics?
> What happens if you adjust the mode of an actively running task?
Nothing; the active task would be left alone. However, retries would pick up the configuration change.
> Another question: will this impact results from report-timings or other reporting views of average runtimes of a task
Good point. The "report-timings" utility will in due course be replaced by the GUI "analysis" view. So long as we record the run mode in the database, the analysis view will be able to filter skipped tasks out at the SQL level.
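The database side of this could look roughly like the following Python sketch using the standard-library `sqlite3` module. The `jobs` table, its `run_time` column, and both function names are hypothetical (this is not the real cylc-flow schema). It shows two ideas from the thread: the upgrader treating rows that predate the field as `live` via a column default, and skipped jobs being filtered out in the SQL itself.

```python
import sqlite3


def add_run_mode_column(conn: sqlite3.Connection) -> None:
    """Upgrade an older DB: jobs recorded before this field existed are 'live'."""
    columns = [row[1] for row in conn.execute('PRAGMA table_info(jobs)')]
    if 'run_mode' not in columns:
        # SQLite reports the DEFAULT value for rows that predate the column.
        conn.execute("ALTER TABLE jobs ADD COLUMN run_mode TEXT DEFAULT 'live'")
        conn.commit()


def mean_run_time(conn: sqlite3.Connection, name: str):
    """Average run time of a task, excluding skipped jobs at the SQL level."""
    (mean,) = conn.execute(
        "SELECT AVG(run_time) FROM jobs WHERE name = ? AND run_mode != 'skip'",
        (name,),
    ).fetchone()
    return mean
```

The column check makes the upgrade safe to run more than once.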
I'll add a note to the OP.
I thought we'd documented this somewhere, but presumably skip mode will need the ability to set which (if any) optional outputs are satisfied by the skipped task? (And I feel that simulation mode might as well do that too?)
> skip mode will need the ability to set which (if any) optional outputs are satisfied by the skipped task?
This is written up in the proposal document.
> I feel that simulation mode might as well do that too?
Simulation mode currently only generates the :succeeded and :failed outputs, no plans to change that at present.
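By contrast with simulation mode's fixed `:succeeded`/`:failed` outputs, a skipped task's outputs might be chosen along the lines of this Python sketch. The rule encoded here is an assumption for illustration: a skipped task always reports `submitted` and `started`, then completes either the outputs configured for skip mode or else the task's required outputs, falling back to `succeeded` if neither `succeeded` nor `failed` is in that set. The function name `skip_outputs` is hypothetical; the proposal document is authoritative.

```python
def skip_outputs(required, configured=None):
    """Outputs a skipped task would complete (hypothetical rule).

    required:   the task's required outputs from the graph
    configured: outputs explicitly configured for skip mode, if any
    """
    outputs = ['submitted', 'started']
    chosen = required if configured is None else configured
    for output in chosen:
        if output not in outputs:
            outputs.append(output)
    # A skipped task must finish somehow: assume success by default.
    if 'succeeded' not in outputs and 'failed' not in outputs:
        outputs.append('succeeded')
    return outputs
```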
If you mark a task to be skipped, but it has an xtrigger prerequisite, will it still need to wait for the xtrigger to be satisfied before it runs? Or can you also skip the xtrigger?
I also wondered about the run length. Will it have a default length of 10 seconds like simulation, and how will that be controlled? I'm hoping the length will be as close to 0 as possible, given it's skipping instead of simulating.
With event handlers, can you specify different handlers when skipping than when running normally, or how will the information provided to the handler help differentiate skip versus normal runs?
Final question for today: will skipping add extra load to the server if there is a massive number of tasks being skipped (I recognise the answer may change depending on whether you have disabled event handling)?
> If you mark a task to be skipped, but it has an xtrigger prerequisite, will it still need to wait for the xtrigger to be satisfied before it runs? Or can you also skip the xtrigger?
I would have thought that it makes sense to satisfy any xtriggers when you set a task's run mode to skip. Thank you for bringing that to my attention for discussion with @oliver-sanders.
> I also wondered about the run length. Will it have a default length of 10 seconds like simulation, and how will that be controlled? I'm hoping the length will be as close to 0 as possible, given it's skipping instead of simulating.
Zero. Definitely zero, and we're not planning on making it customizable. Use simulation mode for that! (N.B. You cannot currently broadcast to simulation mode, but I have a PR for that.)
> With event handlers, can you specify different handlers when skipping than when running normally, or how will the information provided to the handler help differentiate skip versus normal runs?
The proposal has the idea that event handlers will be turned off in skip mode, unless you set [runtime][task][simulation]disable task event handlers = False. There is no provision for alternative event handlers. Can you explain why you want them and what you would have them do?
> Final question for today: will skipping add extra load to the server if there is a massive number of tasks being skipped (I recognise the answer may change depending on whether you have disabled event handling)?
Yes, potentially, but the extra code run by the main loop for simulation (and skip will be based on this) is pretty lightweight. I haven't got evidence, but I think you'd need to be skipping colossal numbers of tasks for this to be noticeable. I will have a quick check, though. This might be a good reason not to allow alternative event handlers.
> I would have thought that it makes sense to satisfy any xtriggers when you set a task's run mode to skip. Thank you for bringing that to my attention for discussion with @oliver-sanders.
I would agree with you as I wouldn't think people want to wait for xtriggers which may be a ways off. I can just see an edge case where one xtrigger is used for two tasks, and only one is skipped.
> Zero. Definitely zero, and we're not planning on making it customizable.
Perfect.
> With event handlers, can you specify different handlers when skipping than when running normally, or how will the information provided to the handler help differentiate skip versus normal runs?
> The proposal has the idea that event handlers will be turned off in skip mode, unless you set [runtime][task][simulation]disable task event handlers = False. There is no provision for alternative event handlers. Can you explain why you want them and what you would have them do?
I have not thought much about this; I'm just thinking aloud. Hypothetically, if you have a handler that runs on any status change and sends that information to a global reporting database, or to external systems outside your security area so they can do what they need to do, it may be useful for those systems to still receive a message (e.g. from a broker) stating that the task was skipped, so they can take appropriate action on their end.
> I can just see an edge case where one xtrigger is used for two tasks, and only one is skipped.
I think that the skipping of the xtrigger will be done by the task so this shouldn't happen.
Yep:
- Xtriggers aren't a feature of skip mode (i.e. they shouldn't get run).
- Note, legacy clock-triggers are now translated into xtriggers at configure time.
- The use cases considered for skip mode are about cutting bits out of a workflow manually.
- E.G. cutting a cycle out manually in a catch-up scenario.
- Note, optional outputs are the preferred solution for graph branching, which can be used to achieve similar results in an automated way.
- It's there as an easier alternative to `cylc remove <old-cycle>/*; cylc trigger <new-cycle>/<start-tasks>`.
- There should not be any sleep or any way to configure sleep.
- By default event handlers will be turned off.
- We haven't considered providing the run mode to event handlers, but could add a template variable along the lines of `event handlers = myhandler %(workflow)s %(cycle)s %(task)s %(run_mode)s`.
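Handler command templates of this kind are filled in with Python-style `%(name)s` substitution, so the `%(run_mode)s` idea could work roughly as in this sketch. The `run_mode` key is hypothetical (it is only floated above, not implemented), as are both helper names; `disable_task_event_handlers` stands in for the "disable task event handlers" setting discussed earlier, defaulting to on for skip mode.

```python
def handlers_enabled(run_mode: str, disable_task_event_handlers: bool = True) -> bool:
    """Skip-mode tasks run no event handlers unless explicitly re-enabled."""
    return not (run_mode == 'skip' and disable_task_event_handlers)


def handler_command(template: str, context: dict) -> str:
    """Fill a %-style handler template from a dict of event details."""
    return template % context


TEMPLATE = 'myhandler %(workflow)s %(cycle)s %(task)s %(run_mode)s'
context = {'workflow': 'demo', 'cycle': '1', 'task': 'foo', 'run_mode': 'skip'}

# Handlers re-enabled for skip mode, so the skip is reported externally.
if handlers_enabled(context['run_mode'], disable_task_event_handlers=False):
    command = handler_command(TEMPLATE, context)
```

An external reporting system receiving this command could then tell skipped runs apart from live ones by the last argument.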
Sorry, I spoke too soon: xtriggers will be run in skip mode.
Skip mode applies only to a task's execution; it's essentially a dummy runtime environment which does nothing but yield the expected (or configured) outputs. It should be possible to configure a cycle to skip ahead of time in order to allow the workflow to continue after it, without this causing anything to run as a result of clock triggers being disabled.
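A toy Python model of that behaviour, using a hypothetical `TaskProxy` class rather than cylc-flow's real one: a skip-mode task still waits for its prerequisites and xtriggers, and only its execution is replaced by an instant, do-nothing "run". It also shows how an xtrigger that is never satisfied would hold a skipped task back.

```python
from dataclasses import dataclass, field


@dataclass
class TaskProxy:
    """Hypothetical, heavily simplified task proxy."""
    name: str
    run_mode: str = 'live'
    prereqs_met: bool = False
    xtriggers: dict = field(default_factory=dict)  # label -> satisfied?


def is_ready(task: TaskProxy) -> bool:
    """Xtriggers are checked in every run mode, including skip."""
    return task.prereqs_met and all(task.xtriggers.values())


def execute(task: TaskProxy) -> str:
    """Only execution differs: skip mode does nothing and returns at once."""
    if not is_ready(task):
        return 'waiting'
    return 'skipped' if task.run_mode == 'skip' else 'submitted'
```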
> Sorry, I spoke too soon: xtriggers will be run in skip mode.
> Skip mode applies only to a task's execution; it's essentially a dummy runtime environment which does nothing but yield the expected (or configured) outputs. It should be possible to configure a cycle to skip ahead of time in order to allow the workflow to continue after it, without this causing anything to run as a result of clock triggers being disabled.
This could lead to stalling unless xtriggers can be skipped. If a workflow has an xtrigger which will never be satisfied (say, a data-availability check) and you skip the cycle, it won't actually skip? I know you could skip the tasks, then set outputs on all tasks with xtriggers, and then remove those tasks from the graph, but in my opinion that is a much more verbose set of instructions than should be required to perform an actual skip.
Note that this "skip mode" is intended only for operators to respond to real-world failures that the workflow cannot respond to automatically.
I'm not sure what use cases you have in mind.
> Note that this "skip mode" is intended only for operators to respond to real-world failures that the workflow cannot respond to automatically.
> I'm not sure what use cases you have in mind.
Yes, this is what we are doing in Cylc 7.
- Model
- Postprocessing
- Ensemble statistics
Due to, say, a major HPC outage that leaves us with limited resources, the Model may need to be skipped to give more critical models space. So we trigger a script to skip all the individual workflows relating to the Model.
Postprocessing uses a data-availability xtrigger that checks the Model's output to decide whether it can start running. Similarly, Ensemble statistics checks that certain data from Postprocessing exists before it runs. We do not use the suite state xtrigger because it isn't useful when the databases are on different servers, and because we want to be able to run non-production Postprocessing for development without also running the Model; the xtrigger just checks that production data exists, for example.
Your current plans would work (I think) for suite state xtriggers, but there are reasons why those may not be usable.
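For reference, a custom Cylc xtrigger is a Python function returning a `(satisfied, results)` tuple, so a data-availability check like the one described above might look roughly like this sketch. The function name and the idea of checking a single path are assumptions for illustration; a real check would likely be more involved. Since xtriggers are still evaluated in skip mode as currently planned, one like this that is never satisfied could hold up a skipped cycle.

```python
import os


def data_available(path: str):
    """Hypothetical data-availability xtrigger.

    Returns (True, results) once the path exists, else (False, {}),
    following the (bool, dict) return convention for Cylc xtriggers.
    """
    if os.path.exists(path):
        # The results dict can carry details on to dependent tasks.
        return True, {'data_path': path}
    return False, {}
```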