samples-go

[Feature Request] Sample Update: Robust session retries

Open askreet opened this issue 1 year ago • 1 comments

The fileprocessing sample shows how a session can be retried within the workflow task. In my testing I stumbled upon some issues with the approach:

  1. It doesn't honor workflow cancelation. Workflow cancelation will be handled as an error on the activity execution and retried.
  2. It doesn't honor legitimate errors, such as a NonRetryableError from an activity.

I came to the conclusion that a more robust solution is to execute all the steps of a session in some function, and then check the result of that function for (in this order):

  • errors.Is(ctx.Err(), workflow.ErrCanceled), where ctx is the original workflow context: in this case CompleteSession must be called, along with any other required cleanup (e.g., local files). A retry should not be attempted, as someone wants the workflow to stop.
  • errors.Is(sessCtx.Err(), workflow.ErrCanceled): in this case the session has been canceled due to heartbeat timeout and should be retried. No cleanup can be done, as the session cannot spawn new activities.
  • errors.Is(err, workflow.ErrSessionFailed): same as the previous case, but with a slightly different cause (an attempt to run an activity against an already failed/expired session, rather than an activity that was already running when the session failed).
  • Any other error, or no error: should be propagated up the stack without attempting to retry the session. Calling CompleteSession is required.

Assuming the above is correct, I think the fileprocessing example is a great place to demonstrate this technique, and a reference to it from the Go SDK docs would be a big win. I'd be willing to contribute the change to the example if that's interesting to you all.

askreet • Mar 30 '24 12:03

The fileprocessing sample does not distinguish between errors; this is called out in the sample: https://github.com/temporalio/samples-go/blob/main/fileprocessing/workflow.go#L24. It is intended to show basic usage of the session API, not how to handle errors in a workflow. For a more detailed description of how to handle failures related to sessions, we added https://github.com/temporalio/samples-go/blob/main/session-failure/workflow.go. In general, we try to keep our samples focused on single concepts and leave integrating concepts to example applications.

Quinn-With-Two-Ns • Mar 30 '24 19:03

Ah, thanks @Quinn-With-Two-Ns, this is awesome. Not sure how we missed that second example in our original research. I'll close this out.

askreet • Mar 31 '24 12:03