Timeout on complete when invalid task token passed

Open jontro opened this issue 1 year ago • 3 comments

Expected Behavior

The client should throw an exception immediately.

Actual Behavior

Calling activityCompletionClient.complete with an invalid task token results in a timeout after a while instead of an immediate error.

Steps to Reproduce the Problem

    // "bogus" is not a valid task token, so this call should fail right away
    workflowClient.newActivityCompletionClient().complete(
        "bogus".toByteArray(), ""
    )

After a while, the call times out with the following exception:

    Caused by: io.grpc.StatusRuntimeException: UNKNOWN: unexpected EOF

Specifications

jontro, Mar 14 '23 21:03

This is not a bug in the Temporal server. When we debugged it, the server detected the invalid task token and returned an error to the client immediately. However, by design, the grpcRetryer implementation in sdk-java catches this error and keeps retrying until the retry deadline expires, so the error is only reported after a couple of minutes. This retry behavior doesn't exist in other SDKs such as sdk-go, sdk-typescript, and sdk-python.
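
To illustrate the retry-until-expiration behavior described above, here is a minimal Kotlin sketch of a deadline-bounded retryer with exponential backoff. It is not the sdk-java grpcRetryer code: Code, RpcException, FakeClock, retryUntilDeadline, and the expiration and backoff values are hypothetical, and a simulated clock stands in for real sleeps. When every error is classified as retryable, a call that fails on each attempt is only reported to the caller once the deadline is reached.

```kotlin
// Hypothetical stand-ins for gRPC status codes and errors; these are not the io.grpc types.
enum class Code { OK, UNKNOWN, INVALID_ARGUMENT }

class RpcException(val code: Code, message: String) : RuntimeException("$code: $message")

// Simulated clock so the sketch runs instantly; a real retryer would sleep on wall-clock time.
class FakeClock(var nowMs: Long = 0L) {
    fun sleep(ms: Long) {
        nowMs += ms
    }
}

// Hypothetical deadline-bounded retryer: retries [call] with exponential backoff while the
// error is classified as retryable and the next attempt still fits before the deadline.
fun <T> retryUntilDeadline(
    clock: FakeClock,
    expirationMs: Long,
    initialBackoffMs: Long,
    isRetryable: (RpcException) -> Boolean,
    call: () -> T
): T {
    val deadline = clock.nowMs + expirationMs
    var backoffMs = initialBackoffMs
    while (true) {
        val error = try {
            return call()
        } catch (e: RpcException) {
            e
        }
        // The error only reaches the caller once it is non-retryable or time has run out.
        if (!isRetryable(error) || clock.nowMs + backoffMs > deadline) throw error
        clock.sleep(backoffMs)
        backoffMs *= 2
    }
}
```

With a 60-second expiration, a 100 ms initial backoff, and a predicate that retries everything, a call that always fails is attempted 10 times and the error surfaces after about 51 simulated seconds. A predicate that rejects the error makes the same call fail on the first attempt.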

phongcao, May 30 '23 15:05

Checking the Go SDK, it also returns a timeout, so this is not exclusive to Java. If the server returns an UNKNOWN status code, most SDKs would treat that as retryable.
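
A minimal Kotlin sketch of the classification being discussed, under the illustrative assumption that UNKNOWN and UNAVAILABLE are the retryable codes. The actual per-SDK lists are defined in each SDK's own retry code, and Code, retryableCodes, and clientReaction are hypothetical names used only here.

```kotlin
// Local stand-in for a few gRPC status codes (names mirror io.grpc.Status.Code).
enum class Code { OK, UNKNOWN, INVALID_ARGUMENT, NOT_FOUND, UNAVAILABLE }

// Illustrative assumption only: the set of codes this hypothetical client retries.
val retryableCodes: Set<Code> = setOf(Code.UNKNOWN, Code.UNAVAILABLE)

// What a client using this classification does when a call ends with [code].
fun clientReaction(code: Code): String = when {
    code == Code.OK -> "return result"
    code in retryableCodes -> "retry until the retry deadline expires"
    else -> "fail immediately"
}
```

Under this classification, the UNKNOWN error from the reproduction is retried until the deadline, while a code outside the set, such as INVALID_ARGUMENT, would be surfaced on the first failure.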

Quinn-With-Two-Ns, Oct 22 '23 02:10

Confirmed: all SDKs consider UNKNOWN retryable, so all SDKs will time out on an invalid task token.

https://github.com/temporalio/sdk-typescript/blob/4dcc82aa48fc285119978bd2a93acc3a0e9fd231/packages/client/src/grpc-retry.ts#L100

https://github.com/temporalio/sdk-core/blob/45d2bc997fd25bf24d347b04d519e7279851aea4/client/src/retry.rs#L22

I think the simplest fix is for the server to return a different status code, such as INVALID_ARGUMENT, in this case.
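
A minimal Kotlin sketch of the proposed server-side change. It assumes a hypothetical token format (a leading marker byte) purely so the example can tell a malformed token apart; the real server's token decoding, types, and handler names are not shown here, and completeActivity, decodeTaskToken, and TOKEN_MARKER are illustrative names. The point is only the mapping: a decode failure is reported as INVALID_ARGUMENT rather than surfacing as UNKNOWN.

```kotlin
// Local stand-ins; these are not the Temporal server's real types.
enum class Code { OK, UNKNOWN, INVALID_ARGUMENT }

class StatusException(val code: Code, message: String) : RuntimeException("$code: $message")

class TokenDecodeException(message: String) : Exception(message)

// Hypothetical token check: for illustration only, a "valid" token is assumed to be
// non-empty and to start with a marker byte. Real task tokens are opaque to the client.
val TOKEN_MARKER: Byte = 0x0A

fun decodeTaskToken(token: ByteArray): ByteArray {
    if (token.isEmpty() || token[0] != TOKEN_MARKER) {
        throw TokenDecodeException("malformed task token")
    }
    return token.copyOfRange(1, token.size)
}

// Proposed behavior: report a malformed token as INVALID_ARGUMENT, a code that,
// per this thread, clients would not retry, instead of letting it surface as UNKNOWN.
fun completeActivity(token: ByteArray): Code {
    try {
        decodeTaskToken(token)
    } catch (e: TokenDecodeException) {
        throw StatusException(Code.INVALID_ARGUMENT, e.message ?: "invalid task token")
    }
    return Code.OK
}
```

With this mapping, the "bogus" token from the reproduction would be rejected with INVALID_ARGUMENT on the first call.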

Quinn-With-Two-Ns, Oct 22 '23 22:10