Timeout on complete when invalid task token passed
Expected Behavior
The client should throw an exception immediately.
Actual Behavior
Calling activityCompletionClient.complete with an invalid task token results in a timeout after a while instead of an immediate error.
Steps to Reproduce the Problem
workflowClient.newActivityCompletionClient().complete(
"bogus".toByteArray(), ""
)
- Timeout occurs after a while with the following exception:
Caused by: io.grpc.StatusRuntimeException: UNKNOWN: unexpected EOF
Specifications
- Version: v1.20.0
- Platform: amd64
This is not a bug in the Temporal server. When we debugged, the server recognized the invalid task token and returned the error to the client immediately. However, by design, the grpcRetryer implementation in sdk-java catches this error, keeps retrying until expiration, and finally reports the error after a couple of minutes.
This implementation doesn't exist in other SDKs such as sdk-go, sdk-typescript, and sdk-python.
Checking the Go SDK, it also returns a timeout, so this is not exclusive to Java. If the server is returning an UNKNOWN status code, then most SDKs should consider that retryable.
Confirmed: all SDKs consider UNKNOWN retryable, so all SDKs will time out on an invalid task token.
https://github.com/temporalio/sdk-typescript/blob/4dcc82aa48fc285119978bd2a93acc3a0e9fd231/packages/client/src/grpc-retry.ts#L100
https://github.com/temporalio/sdk-core/blob/45d2bc997fd25bf24d347b04d519e7279851aea4/client/src/retry.rs#L22
I think the simplest fix is for the server to return a different status code, such as INVALID_ARGUMENT, in this case.
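To make the difference concrete, here is a rough, dependency-free sketch of why the status code matters to a client-side retry loop. It is a simplified model, not the actual sdk-java GrpcRetryer: the Code enum, the NON_RETRYABLE set, the fixed backoff, and the 60-second expiration are all assumptions chosen for illustration. The clock is simulated, so it runs instantly.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified model of a retry loop with an expiration budget.
// Hypothetical names throughout; this is not the SDK's real implementation.
class RetryUntilExpirationSketch {
    // Local stand-in for gRPC status codes so the sketch needs no libraries.
    enum Code { UNKNOWN, UNAVAILABLE, INVALID_ARGUMENT, NOT_FOUND }

    // Assumption: an illustrative non-retryable set, not any SDK's actual list.
    static final Set<Code> NON_RETRYABLE =
            EnumSet.of(Code.INVALID_ARGUMENT, Code.NOT_FOUND);

    // Counts the attempts made before the client gives up, assuming the
    // server answers every attempt with the same status code.
    static int attemptsBeforeFailure(Code serverCode, long expirationMillis, long backoffMillis) {
        long elapsed = 0; // simulated clock in milliseconds
        int attempts = 0;
        while (true) {
            attempts++;
            if (NON_RETRYABLE.contains(serverCode)) {
                return attempts; // fail fast: the error surfaces on the first attempt
            }
            if (elapsed + backoffMillis > expirationMillis) {
                return attempts; // retry budget exhausted: the caller sees a late failure
            }
            elapsed += backoffMillis; // fixed backoff for simplicity
        }
    }

    public static void main(String[] args) {
        // Retryable code: the loop keeps going until the budget runs out.
        System.out.println(attemptsBeforeFailure(Code.UNKNOWN, 60_000, 1_000)); // prints 61
        // Non-retryable code: the caller gets the error right away.
        System.out.println(attemptsBeforeFailure(Code.INVALID_ARGUMENT, 60_000, 1_000)); // prints 1
    }
}
```

Under these assumptions, an UNKNOWN response burns the whole retry budget before the caller sees anything, while INVALID_ARGUMENT surfaces on the first attempt, which is the behavior the issue asks for.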