temporal
Support pausing/resuming Temporal services' work
Is your feature request related to a problem? Please describe.
Scenario: Performing a blue/green deployment as part of a rollout of the Temporal services themselves.
At Netflix, our model of a red/black deployment is that when a new service version is deployed, the old one is put into a disabled state rather than having its infrastructure destroyed. In this disabled state, a service is still "hot" but is not performing work (serving requests, consuming queues, producing events, and so on).
At a glance, for the frontend service, deregistering it from the load balancer appears to be adequate.
Describe the solution you'd like
Ideal: I'd like an extension point that would allow us to write a listener into the services themselves to actuate an enabled/disabled state for the process, similar to what exists in WorkerFactory via suspendPolling and enablePolling.
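To make the request concrete, here is a minimal sketch of what such an extension point might look like in Go, the language the Temporal server is written in. Every name in it (LifecycleListener, Controller, pollerStub) is hypothetical. The Temporal server does not provide this interface. The sketch only shows the shape of the hook: components register a listener, and an external trigger flips the process between enabled and disabled.

```go
package main

import (
	"fmt"
	"sync"
)

// LifecycleListener is a HYPOTHETICAL extension point; the Temporal server does
// not expose it. A service component would implement it to stop or resume its
// own work (e.g. polling for tasks) without the process exiting.
type LifecycleListener interface {
	OnDisable()
	OnEnable()
}

// Controller fans an enable/disable transition out to every registered listener.
type Controller struct {
	mu        sync.Mutex
	enabled   bool
	listeners []LifecycleListener
}

// NewController starts in the enabled state, like a freshly deployed service.
func NewController() *Controller { return &Controller{enabled: true} }

func (c *Controller) Register(l LifecycleListener) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.listeners = append(c.listeners, l)
}

// SetEnabled notifies listeners only when the state actually changes, so
// repeated enable or disable requests are no-ops. Listeners are called with
// the lock held, so they must not call back into the Controller.
func (c *Controller) SetEnabled(enabled bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.enabled == enabled {
		return
	}
	c.enabled = enabled
	for _, l := range c.listeners {
		if enabled {
			l.OnEnable()
		} else {
			l.OnDisable()
		}
	}
}

// pollerStub stands in for a component that polls for work.
type pollerStub struct{ paused bool }

func (p *pollerStub) OnDisable() { p.paused = true }
func (p *pollerStub) OnEnable()  { p.paused = false }

func main() {
	c := NewController()
	p := &pollerStub{}
	c.Register(p)
	c.SetEnabled(false) // e.g. triggered when this version is disabled during a red/black
	fmt.Println("paused:", p.paused)
	c.SetEnabled(true) // e.g. triggered on rollback
	fmt.Println("paused:", p.paused)
}
```

Running main prints `paused: true` followed by `paused: false`. Because the controller ignores transitions to the state it is already in, duplicate status events from a discovery system would be harmless.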
Alternative: An API to send a signal to the service process to enable/disable.
Describe alternatives you've considered
We're running Temporal on EC2 VMs for the time being. It looks like we'll need to write a sidecar that subscribes to Discovery status change events and sends SIGSTOP on DOWN and SIGCONT on UP. This solution works until we move our workload onto Titus, at which point the only option would be to tear down the service and re-provision it in the case of a rollback.
Additional context
This disabled state, as opposed to destroying infrastructure (killing containers, terminating an EC2 instance, etc.), is a convention that consciously trades cost for operational agility: we would prefer to keep the old infrastructure around so that, in case of a rollback, we can roll back as quickly as possible without having to wait for capacity allocation, service startup, warming, and so on.
A relatively simple feature would be pausing/unpausing a specific workflow or activity task queue through the service API.
Hi, is there any update on this feature? One of our use cases requires pausing a workflow and resuming it later.
Looking forward to this feature too.
This feature would be very useful for our implementation, any updates?
Closing this one in favor of https://github.com/temporalio/temporal/issues/3006