temporal
Force complete activity when it is retrying
Is your feature request related to a problem? Please describe. A bug in an activity could result in an incorrect return type, causing another activity to fail continuously. Provide a mechanism to force-complete a retrying activity without restarting the workflow.
Describe the solution you'd like The RespondActivityTaskCompletedById API does not accept the retry attempt as an input argument. We also need a way to complete an activity that is backing off and has not been started at all.
+1
+1
+1 For me, it would be a convenient way to "repair" a single blocked WF if an activity is periodically failing. In a way, activity completion could let us "repair forward" (skip over a failing activity), just as reset already lets us "repair backwards" (go back to an earlier state of the workflow execution).
Of course it is not best practice to (manually) repair workflows all the time, but in some edge cases and incidents it would be great to have the possibility. Redeploying an updated activity worker might not always be a solution.
@mfateev was also supporting this idea.
Eventually, this could even be integrated into the WebUI, so that, for example, a support engineer could also repair a workflow.
Similarly, a tctl workflow complete command might also be handy to complete whole (child) workflows. What do you think?
Hi, I am interested in working on this issue.
Based on my understanding, if a workflow has multiple activities, the failure from one of them will result in the following activities' failure. Currently, the only way to fix this is to start the workflow from the beginning. We want to introduce a mechanism to retry the first failed activity when we find it is incorrect to prevent the cascading.
Is my understanding above correct? If so, how is this different from the retry options when we start a workflow with an activity here? We can handle the incorrect output from an activity with the retry options above.
Also, a way to reproduce this would be very helpful. Thank you!
This is fairly easy to add.
First we need to understand how to expose this in the API.
I would add a bool skip_started_check or bool force field in the RespondActivityTaskCompletedByIdRequest message and the other RPCs that resolve an activity (RespondActivityTaskFailedByIdRequest, RespondActivityTaskCanceledByIdRequest).
Then we need to relax this condition if the flag is set (in all corresponding APIs).
If you want to take this on, you should start by making a PR to the https://github.com/temporalio/api repo and if this is accepted, you can continue to implement in the server (this) repo.
Hi @bergundy, thank you for your reply.
May I know if my understanding above is correct regarding this issue, and how this is different from the retry options when we start a workflow with an activity here?
Thank you in advance for your guidance.
This issue is for allowing completing and failing activities that are currently backing off. Seems like that's not what you want @alexseedkou based on your comment here:
Based on my understanding, if a workflow has multiple activities, the failure from one of them will result in the following activities' failure. Currently, the only way to fix this is to start the workflow from the beginning. We want to introduce a mechanism to retry the first failed activity when we find it is incorrect to prevent the cascading.
IIUC, you could reset the workflow to just before the activity was scheduled. Does that address your need? Feel free to tag me on the Temporal community Slack to continue the discussion.
An update on this issue:
The team has discussed this issue internally and decided to change the server behavior so that, by default, it accepts activity completions even if the activity is currently backing off.
We'll need to update the API and SDK documentation to reflect the fact that RespondActivityTaskCompleted and RespondActivityTaskCompletedById have different behaviors.