
Workflow Pause / Unpause

Open yiminc opened this issue 3 years ago • 14 comments

There are many use cases where the ability to pause a workflow would be very much appreciated.

Pausing a workflow would mean that no more workflow tasks are scheduled for that workflow. Unpausing would mean creating a new workflow task if one is needed but not present because the workflow was paused. The tricky part is replicating the paused state to the passive side.
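To make the intended semantics concrete, here is a minimal sketch in plain Python. It is a toy model only: `WorkflowState`, its fields, and its methods are hypothetical names made up for illustration and are not part of the Temporal server or any SDK. It also ignores replication to the passive side entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowState:
    """Toy model of one workflow's scheduling state (hypothetical, not Temporal's data model)."""
    paused: bool = False
    has_pending_task: bool = False  # a workflow task is currently scheduled
    needs_task: bool = False        # something happened that requires a workflow task
    log: List[str] = field(default_factory=list)

    def on_event(self, name: str) -> None:
        """Record an event that needs a workflow task; schedule one unless paused."""
        self.needs_task = True
        self.log.append(f"event:{name}")
        self._maybe_schedule()

    def pause(self) -> None:
        """While paused, no new workflow task is scheduled."""
        self.paused = True

    def unpause(self) -> None:
        """Clear the flag and create the task that was held back, if one is needed."""
        self.paused = False
        self._maybe_schedule()

    def complete_task(self) -> None:
        """Mark the current workflow task as done and schedule a follow-up if needed."""
        self.has_pending_task = False
        self._maybe_schedule()

    def _maybe_schedule(self) -> None:
        # Schedule only if not paused, no task is already pending, and one is needed.
        if self.paused or self.has_pending_task or not self.needs_task:
            return
        self.has_pending_task = True
        self.needs_task = False
        self.log.append("task_scheduled")
```

In this model, an event that arrives while the workflow is paused is recorded but produces no task; calling `unpause()` then creates exactly one task for it.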

We can have a default pause policy that applies to workflows that experience continued workflow task failures, for example due to a bug in the workflow code.

We could also enable pausing on activity failure as part of retry policy.

Paused workflows should be visible through the visibility API, and there should be some kind of batch operation to unpause them on demand.

yiminc avatar Jun 17 '22 19:06 yiminc

+1

Zonalds avatar Nov 25 '22 09:11 Zonalds

+1

peiminmin avatar Dec 19 '22 09:12 peiminmin

Any changes to this?

Zonalds avatar Mar 04 '23 06:03 Zonalds

@yiminc We have a couple of engineers on our team looking to contribute to this feature and would like to understand if this is work-in-progress. We would love to see how we can collaborate with you on this feature.

@mfateev

NarmathaBala avatar May 05 '23 19:05 NarmathaBala

This would be a really nice feature for us as well. In my mind "pause" would suspend all timeouts and prevent any progression through the workflow code (preventing retries of any current activities as well).

This is especially useful in scenarios where a workflow encounters errors and we would like to push a fix to the worker code without worrying about timeouts on the workflow (basically we can prevent the workflow from timing out and cleaning up after itself, which usually saves us from recomputing anything).

Perhaps the issues with this could be mitigated by us adopting a different architecture. But currently our workflows provision infrastructure for themselves to run computationally heavy work on (in particular with GPUs). Ideally we don't want to waste this work by having our workflows time out. There are ways for us to work around this of course (saving work to S3, recovery workflows, methods to resume, etc.), but in general it would be much cleaner if we could simply pause a workflow that is encountering errors, scale down the compute (i.e. scale the ECS service to 0), debug and fix the problem, push the change, update the service, and scale back up again.
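As a rough illustration of what "suspend all timeouts" could mean, here is a small plain-Python sketch of a timeout whose clock stops while paused. `PausableTimeout` and its methods are hypothetical names invented for this example; nothing here reflects how Temporal actually tracks timeouts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PausableTimeout:
    """Timeout whose clock stops while paused (hypothetical helper, times in seconds)."""
    duration: float
    started_at: float
    paused_at: Optional[float] = None  # set while a pause is in effect
    paused_total: float = 0.0          # time spent in completed pauses

    def pause(self, now: float) -> None:
        # Pausing twice in a row keeps the original pause time.
        if self.paused_at is None:
            self.paused_at = now

    def unpause(self, now: float) -> None:
        if self.paused_at is not None:
            self.paused_total += now - self.paused_at
            self.paused_at = None

    def elapsed(self, now: float) -> float:
        # While paused, time is frozen at the moment the pause began.
        effective_now = self.paused_at if self.paused_at is not None else now
        return effective_now - self.started_at - self.paused_total

    def expired(self, now: float) -> bool:
        return self.elapsed(now) >= self.duration
```

With a 300 second timeout started at t=0 and paused at t=100, the timeout never expires while paused, and after unpausing it still has 200 seconds left regardless of how long the pause lasted.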

gmintoco avatar Jun 20 '23 13:06 gmintoco

Does a simpler pause mechanism exist? I don't mind if timeouts are not paused as well, e.g. if there is an Activity that is supposed to run in 5 minutes and the workflow is paused for 5 minutes, I don't mind if the activity runs as soon as the workflow is unpaused. Is this possible with the current framework?

Gabrn avatar Dec 24 '23 08:12 Gabrn

Hi @yiminc ,

Do you have any update on this, or has a development timeline been decided?

We could really use this feature and any update on the timeline or a tentative release date helps a lot.

prithage avatar Apr 04 '24 08:04 prithage