Clean shutdown to activities to fail fast

Open mfateev opened this issue 6 years ago • 5 comments

Customer reported:

A question regarding worker shutdown. Very often we deploy code to services which act as the worker. That will require shutting down the worker, and bringing the service down.

In that case, we want to give currently running activities a grace period to complete instead of failing (which would require a retry). I tried configuring "WorkerStopTimeout" to give currently running short-lived activities a chance to "finish" and report their results. But I found that while the worker now waits for the activities to complete, it does not report their results to the Cadence server.

So, what is the recommended way to shut down?

mfateev avatar Aug 07 '19 14:08 mfateev

I double-checked. We haven't implemented it yet, so this should be a feature request.

longquanzheng avatar Aug 12 '19 16:08 longquanzheng

Thanks. I'm the reporter of this issue.

I'm wondering, what is the current intended purpose of the worker graceful shutdown, if not to allow activities to complete and report their results?

yarelm avatar Aug 13 '19 05:08 yarelm

Hi, thanks for reaching out!

There is currently no graceful shutdown, so the current behavior isn't intentional. It's just a missing feature. I don't understand the latter part, "not to allow activities to complete and report their result". Activities are always able to complete and report results until the worker is shut down.

longquanzheng avatar Aug 13 '19 06:08 longquanzheng

Gotcha. My intuition was that the graceful shutdown drains currently running activities by giving them the grace to complete and report. Shutting down the worker is something we do frequently, so it affects how we define our retry policies.

Thanks for clarifying!

yarelm avatar Aug 13 '19 07:08 yarelm

Does this mean that the only way to get an activity that was in progress when a worker shut down to retry is to use a heartbeat for the activity and set a retry policy on it, so that it can continue after the heartbeat timeout? In other words, calling Stop on the worker will not let Cadence know that any activities the worker still had running when it was stopped need to be retried (depending on policy)?

willgorman avatar Feb 02 '21 20:02 willgorman
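The mechanism this question describes (a heartbeat timeout that notices a silent worker, plus a retry policy that allows another attempt) can be modeled on a logical clock. The sketch below is a toy model under stated assumptions, not Cadence's server logic or client API: `attempt`, `succeeded`, and `run` are hypothetical names, and "ticks" stand in for real time.

```go
package main

import "fmt"

// attempt is a hypothetical model of one activity attempt. Each attempt
// runs on its own logical clock starting at tick 0.
type attempt struct {
	heartbeats  []int // ticks at which the worker recorded a heartbeat
	completedAt int   // tick at which the result was reported, or -1 if the worker stopped first
}

// succeeded reports whether the attempt delivered its result without any
// gap between consecutive signals exceeding heartbeatTimeout.
func succeeded(a attempt, heartbeatTimeout int) bool {
	last := 0
	for _, hb := range a.heartbeats {
		if hb-last > heartbeatTimeout {
			return false // the gap was too long: heartbeat timeout
		}
		last = hb
	}
	// A worker that stopped without reporting (-1) eventually times out.
	return a.completedAt >= 0 && a.completedAt-last <= heartbeatTimeout
}

// run replays attempts in order, allowing at most maxAttempts of them.
// It returns the 1-based number of the attempt that succeeded, or 0 if
// the allowed attempts were exhausted.
func run(attempts []attempt, heartbeatTimeout, maxAttempts int) int {
	for i, a := range attempts {
		if i >= maxAttempts {
			break
		}
		if succeeded(a, heartbeatTimeout) {
			return i + 1
		}
	}
	return 0
}

func main() {
	attempts := []attempt{
		{heartbeats: []int{2, 4}, completedAt: -1}, // worker stopped mid-activity
		{heartbeats: []int{2, 4}, completedAt: 5},  // retried on another worker
	}
	fmt.Println("succeeding attempt with retries:", run(attempts, 3, 2))
	fmt.Println("succeeding attempt without retries:", run(attempts, 3, 1))
}
```

In this model the stopped worker never signals failure. The missing heartbeat is the only thing that ends the first attempt, and the second attempt happens only because the attempt limit permits it.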