cadence-client
Clean shutdown so that activities fail fast
Customer reported:
A question regarding worker shutdown. Very often we deploy code to services which act as the worker. That will require shutting down the worker, and bringing the service down.
In that case, we want to give currently running activities a grace period to complete instead of failing (which would require retrying). I tried configuring "WorkerStopTimeout" as a way to give currently running short-lived activities a chance to finish and report their results. But I found that while the worker now waits for the activities to complete, it doesn't report their results to the Cadence server.
So, what is the recommended way to shut down?
I double-checked. We haven't implemented this yet, so it should be a feature request.
Thanks. I'm the reporter of this issue.
I'm wondering, what is the current intended purpose of the worker graceful shutdown, if not to allow activities to complete and report their results?
Hi, thanks for reaching out!
There is currently no graceful shutdown, so nothing is being left out intentionally; it's just a missing feature. I also don't understand the latter part, "not to allow activities to complete and report their results". Activities are always able to complete and report results until the worker is shut down.
Gotcha. My intuition was that a graceful shutdown drains currently running activities by giving them a grace period to complete and report. We shut down workers frequently, so this affects how we define our retry policies.
Thanks for clarifying!
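The drain behavior the reporter expected (stop taking new work, give in-flight activities up to a stop timeout to finish, and report every result that arrives in time) can be sketched as a small, language-neutral model. It is written in Python rather than against the Go client, and `drain`, `activity_durations`, and `stop_timeout` are hypothetical names. Per the maintainer, cadence-client does not implement the reporting step, so this only illustrates the requested feature, not existing client behavior.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def drain(activity_durations, stop_timeout):
    """Hypothetical model of a draining worker shutdown.

    Each in-flight activity is simulated by sleeping for its duration.
    Returns (reported, unfinished): indexes of activities that finished
    within stop_timeout (and would be reported to the server), and indexes
    still running when the timeout expired.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(activity_durations)))
    # All activities are already running when shutdown begins; nothing new is
    # submitted after this point (the model's version of "stop polling").
    futures = {pool.submit(time.sleep, d): i for i, d in enumerate(activity_durations)}
    done, not_done = wait(futures, timeout=stop_timeout)
    reported = sorted(futures[f] for f in done)        # would be sent to the server
    unfinished = sorted(futures[f] for f in not_done)  # left to server timeouts and retries
    # Don't block on stragglers; a real worker process would exit here.
    pool.shutdown(wait=False)
    return reported, unfinished


reported, unfinished = drain([0.01, 0.02, 1.0], stop_timeout=0.3)
print(reported, unfinished)  # [0, 1] [2]
```

With a 0.3 s stop timeout, the two short activities are reported and the long one is returned as unfinished, to be handled by the server's timeouts and the activity's retry policy.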
Does this mean that the only way to get an activity that was in progress when a worker shut down to be retried is to have the activity heartbeat and to give it a retry policy, so that it is retried after the heartbeat timeout? Calling Stop on the worker will not let Cadence know that the activities the worker still had running when it was stopped need to be retried (depending on the policy)?
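The mechanism asked about here (a heartbeat timeout plus a retry policy) can be modeled the same way. In this simplified sketch, which does not claim to reproduce the server's exact logic, an attempt lost to a stopped worker is only noticed when a timeout fires: the heartbeat timeout if the activity heartbeats, otherwise its start-to-close timeout. The retry is then scheduled after an exponential backoff. `RetryPolicy`, `failure_detected_at`, and `retry_scheduled_at` are hypothetical; options such as an expiration interval or non-retriable errors are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    # Hypothetical stand-in for an activity retry policy (times in seconds).
    initial_interval: float
    backoff_coefficient: float
    maximum_interval: float
    maximum_attempts: int  # 0 means unlimited


def next_retry_delay(policy: RetryPolicy, attempt: int) -> Optional[float]:
    """Backoff after failed attempt number `attempt` (1-based); None if attempts are exhausted."""
    if policy.maximum_attempts and attempt >= policy.maximum_attempts:
        return None
    delay = policy.initial_interval * policy.backoff_coefficient ** (attempt - 1)
    return min(delay, policy.maximum_interval)


def failure_detected_at(started: float, start_to_close: float,
                        last_heartbeat: Optional[float] = None,
                        heartbeat_timeout: Optional[float] = None) -> float:
    """Earliest time a timeout fires for an attempt whose worker has stopped."""
    deadline = started + start_to_close
    if heartbeat_timeout is not None:
        # Model assumption: with no heartbeat recorded yet, the clock runs from the start.
        beat = started if last_heartbeat is None else last_heartbeat
        deadline = min(deadline, beat + heartbeat_timeout)
    return deadline


def retry_scheduled_at(policy: RetryPolicy, attempt: int, started: float,
                       start_to_close: float, last_heartbeat: Optional[float] = None,
                       heartbeat_timeout: Optional[float] = None) -> Optional[float]:
    """When the next attempt would be scheduled, or None if no retry remains."""
    delay = next_retry_delay(policy, attempt)
    if delay is None:
        return None
    return failure_detected_at(started, start_to_close, last_heartbeat, heartbeat_timeout) + delay


policy = RetryPolicy(initial_interval=1.0, backoff_coefficient=2.0,
                     maximum_interval=10.0, maximum_attempts=5)
print(retry_scheduled_at(policy, 1, started=0, start_to_close=600))  # 601.0
print(retry_scheduled_at(policy, 1, started=0, start_to_close=600,
                         last_heartbeat=30, heartbeat_timeout=20))  # 51.0
```

For an activity started at t=0 with a 600 s start-to-close timeout, a last heartbeat at t=30 and a 20 s heartbeat timeout, the model detects the lost attempt at t=50 instead of t=600 and schedules the first retry one second later.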