Context timeout is halved on client.ExecuteWorkflow

Open taonic opened this issue 2 years ago • 1 comments

Expected Behavior

When a context with timeout (e.g. 4s as duration) is passed into client.ExecuteWorkflow, and it gets a client-side context deadline exceeded error, the method should timeout and return after the designated duration.

Actual Behavior

Currently, the client.ExecuteWorkflow times out after half of the designated duration, 2s instead of 4s as an example.

Code to reproduce:

	timeout := 4 * time.Second
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	workflowOptions := client.StartWorkflowOptions{
		ID:        workflowID,
		TaskQueue: "cancel-activity",
	}

	then := time.Now()
	fmt.Println("ctx before execute: ", ctx)

	we, err := c.ExecuteWorkflow(ctx, workflowOptions, cancellation.YourWorkflow)
	if err != nil {
		fmt.Println("ctx after execute: ", ctx)
		fmt.Printf("Time elapsed: %v", time.Since(then))
		log.Fatalf("Unable to execute workflow %+v\n", err)
	}

Output:

ctx before execute:  context.Background.WithDeadline(2023-05-16 22:12:16.653157 +1000 AEST m=+10.639604834 [3.999887833s])
ctx after execute:  context.Background.WithDeadline(2023-05-16 22:12:16.653157 +1000 AEST m=+10.639604834 [1.998066541s])
Time elapsed: 2.001958166s
2023/05/16 22:12:14 Unable to execute workflow context deadline exceeded
exit status 1

We suspect the behaviour was originally introduced as part of the dynamic retry policy for Cadence server: https://github.com/temporalio/sdk-go/commit/646d94de74a547ee7fa1c649142a9839a52d0a6e#diff-e0628cecb8a238a2d362a0ce845f7f8a8ff26d38942b471d09d7e079b5af55e7R110

Specifications

Version: v1.22.2

May 17 '23 07:05 taonic

To add clarification, we halve the user-provided timeout so we can retry within it, but if server is down we don't retry we get user context deadline exceeded context error (not the server-side gRPC deadline exceeded error code) so it is effectively half. Making RPC timeout half the context timeout is fine, we just need to make sure to only halve the RPC timeout not the context timeout.

May 17 '23 12:05 cretz