
Support pausing/resuming Temporal services' work

Open robzienert opened this issue 4 years ago • 3 comments

Is your feature request related to a problem? Please describe.

Scenario: Performing a blue/green as part of a rollout of Temporal services themselves.

At Netflix, our model of a red/black is that when a new service version is deployed, the old one is put into a disabled state rather than having its infrastructure destroyed. In this disabled state, a service is still "hot" but is not performing work (serving requests, consuming queues, producing events, and so on).

At a glance for the frontend service, deregistering the service from the load balancer appears to be adequate.

Describe the solution you'd like

Ideal: I'd like an extension point that would allow us to write a listener into the services themselves to actuate an enabled/disabled state for the process, similar to what exists in WorkerFactory via suspendPolling and resumePolling.

Alternative: An API to send a signal to the service process to enable/disable.

Describe alternatives you've considered

We're running Temporal, for the time being, on EC2 VMs. It looks like what we'll need to do is write a sidecar that subscribes to Discovery status change events and sends SIGSTOP on DOWN and SIGCONT on UP. This solution is fine until we move our workload onto Titus, at which point the only solution would be to tear down the service and re-provision in the case of a rollback.
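
For illustration, the sidecar idea above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an existing tool: the status names `UP` and `DOWN`, the `STATUS_SIGNALS` table, and the functions `signal_for_status` and `apply_status` are all hypothetical, and how the sidecar actually receives Discovery events and learns the Temporal process ID is left out. It relies on POSIX signals, so it applies only to Unix-like hosts.

```python
import os
import signal

# Hypothetical mapping from a discovery status to the POSIX signal to send.
# DOWN freezes the process (SIGSTOP); UP lets it continue (SIGCONT).
STATUS_SIGNALS = {
    "UP": signal.SIGCONT,
    "DOWN": signal.SIGSTOP,
}


def signal_for_status(status):
    """Return the signal for a status string, or None if it is not recognized."""
    return STATUS_SIGNALS.get(status.upper())


def apply_status(pid, status, kill=os.kill):
    """Send the matching signal to `pid` and return it.

    Unrecognized statuses are ignored and None is returned.
    `kill` is injectable so the logic can be exercised without a real process.
    """
    sig = signal_for_status(status)
    if sig is None:
        return None
    kill(pid, sig)
    return sig
```

A sidecar would call `apply_status(temporal_pid, event_status)` from its event handler, where both arguments come from whatever discovery client is in use. Note that a process stopped with SIGSTOP keeps its sockets open but does not respond, so peers see timeouts rather than a clean shutdown.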

Additional context

This disabled state, as opposed to destroying infrastructure (killing containers, terminating an EC2 instance, etc.), is a convention that consciously trades cost for operational agility: we would prefer to keep the old infrastructure so that, in case of a rollback, we can switch back as quickly as possible without having to wait for capacity allocation, service startup, warming, and so on.

robzienert, Nov 19 '20 02:11

A relatively simple feature would be pausing/unpausing a specific workflow or activity task queue through the service API.

mfateev, Feb 19 '22 23:02

Hi, is there any update on this feature? One of our use cases needs to pause a workflow and resume it later.

mathiarasan-e5, Dec 14 '23 04:12

Looking forward to this feature too.

Wenfeng-GAO, Dec 18 '23 02:12

This feature would be very useful for our implementation. Are there any updates?

csimonsson, Mar 11 '24 10:03

Closing this one in favor of https://github.com/temporalio/temporal/issues/3006

yiminc, Mar 16 '24 00:03