samples-go
[Feature Request] Sample Update: Robust session retries
The fileprocessing sample shows how a session can be retried within the workflow task. In my testing I stumbled upon some issues with the approach:
- It doesn't honor workflow cancelation. Workflow cancelation will be handled as an error on the activity execution and retried.
- It doesn't honor legitimate errors, such as a NonRetryableError from an activity.
I came to the conclusion that a more robust solution is to execute all the steps of a session in some function, and then check the result of that function for (in this order):
- errors.Is(ctx.Err(), workflow.ErrCanceled) - where ctx is the original workflow context: In this case CompleteSession must be called, along with any other required cleanup (e.g., removing local files). A retry should not be attempted, because someone wants the workflow to stop.
- errors.Is(sessCtx.Err(), workflow.ErrCanceled) - In this case the session has been canceled due to a heartbeat timeout and should be retried. No cleanup can be done, as the session cannot spawn new activities.
- errors.Is(err, workflow.ErrSessionFailed) - Same handling as the previous case (retry, no cleanup), but with a slightly different cause: an attempt to run a new activity against a failed/expired session, rather than an activity that was already running when the session failed.
- Any other error, or no error - should be propagated up the stack without attempting to retry the session. Calling CompleteSession is required.
Assuming the above is correct, I think the fileprocessing example is a great place to demonstrate this technique, and a reference to it from the Go SDK docs would be a big win. I'd be willing to contribute the change to the example if that's interesting to you all.
The fileprocessing sample does not distinguish between errors; this is called out in the sample itself: https://github.com/temporalio/samples-go/blob/main/fileprocessing/workflow.go#L24. It is intended to show basic usage of the session API, not how to handle errors in a workflow. For a more detailed description of how to handle failures related to sessions, we added https://github.com/temporalio/samples-go/blob/main/session-failure/workflow.go. In general, we try to keep our samples focused on single concepts and leave integrating concepts to example applications.
Ah, thanks @Quinn-With-Two-Ns, this is awesome. Not sure how we missed that second example in our original research. I'll close this out.