sdk-go icon indicating copy to clipboard operation
sdk-go copied to clipboard

Activity cancel error returned when not initiated server side is rejected by server

Open cretz opened this issue 2 years ago • 3 comments

Expected Behavior

When a worker stops while an activity is running and that context is cancelled and bubbled out as an activity error, that activity attempt should be recorded as failed server side. This is how it works in most SDKs today.

Actual Behavior

Today if you stop a worker, the activity tries to return to the server the cancelled error but the server fails the activity task completion with:

unable to mark activity as canceled without activity being request canceled first

I think we need to send 1) only send activity cancelled if the server requested cancel (otherwise wrap as application error if canceled error received), and 2) write a test proving that worker stop during a non-heartbeating activity does the next attempt on the next worker.

cretz avatar Jul 06 '23 19:07 cretz

This maybe could be done at the same time as #1086

cretz avatar Jul 06 '23 19:07 cretz

Am I correct in thinking the impact of this is just performance—that is, it takes longer overall to successfully complete the activity than it otherwise might?

My mental model of this is:

  1. Activity cancellation error gets sent to server
  2. Server fails the task completion, and therefore doesn't realize the activity was canceled
  3. Eventually the activity times out server-side (<-- this is where the extra time goes)
  4. Server retries the activity (hopefully on a new worker since the old one died)
  5. Eventually the activity completes (if nothing else goes wrong)

lmk what I got wrong. :)

josh-berry avatar Dec 26 '23 19:12 josh-berry

Yes, that seems correct. I think in most Core-based SDKs, we already convert non-server-requested-activity-cancellation errors (i.e. worker shutdown) to activity errors. I think we probably just confirm and do this in all SDKs.

cretz avatar Jan 03 '24 16:01 cretz