sdk-go icon indicating copy to clipboard operation
sdk-go copied to clipboard

TestWorkflowEnvironment.SignalExternalWorkflow data race on signal data

Open Drahflow opened this issue 1 year ago • 1 comments

Expected Behavior

Testing a workflow which signals external workflows should work, even if the signal data is modified later in the workflow.

Actual Behavior

It does not (reliably), specifically data-races such as this can be observed (occasionally):

==================
WARNING: DATA RACE
Read at 0x00c000515190 by goroutine 113:
  reflect.typedmemmove()
      /usr/lib/go-1.19/src/runtime/mbarrier.go:178 +0x0
[...]
  github.com/stretchr/testify/mock.Arguments.Diff()
      /home/drahflow/go/pkg/mod/github.com/stretchr/[email protected]/mock/mock.go:883 +0x18e
  github.com/stretchr/testify/mock.(*Mock).findExpectedCall()
      /home/drahflow/go/pkg/mod/github.com/stretchr/[email protected]/mock/mock.go:355 +0x146
  github.com/stretchr/testify/mock.(*Mock).MethodCalled()
      /home/drahflow/go/pkg/mod/github.com/stretchr/[email protected]/mock/mock.go:460 +0xb9
  go.temporal.io/sdk/internal.(*testWorkflowEnvironmentImpl).SignalExternalWorkflow.func2()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow_testsuite.go:2118 +0x329

Previous write at 0x00c000515190 by goroutine 97:
  github.com/payrails/demo/pkg/merchant/workflows/consumer_checkout.cancelPaymentMethods.func1.2()
      /work/pkg/merchant/workflows/consumer_checkout/main.go:829 +0xfc
[...]

Goroutine 113 (running) created at:
  go.temporal.io/sdk/internal.(*testWorkflowEnvironmentImpl).SignalExternalWorkflow()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow_testsuite.go:2115 +0x396
  go.temporal.io/sdk/internal.signalExternalWorkflow()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/workflow.go:1077 +0x46e
  go.temporal.io/sdk/internal.(*workflowEnvironmentInterceptor).SignalExternalWorkflow()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/workflow.go:1040 +0xc4
  go.temporal.io/sdk/internal.SignalExternalWorkflow()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/workflow.go:1035 +0x20d
  go.temporal.io/sdk/workflow.SignalExternalWorkflow()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/workflow/workflow.go:221 +0x171
  github.com/payrails/demo/pkg/merchant/workflows/consumer_checkout.(*Channel[...]).Reply()
      /work/pkg/merchant/workflows/consumer_checkout/channels.go:91 +0x66
  github.com/payrails/demo/pkg/merchant/workflows/consumer_checkout.waitForFinalize.func1()
      /work/pkg/merchant/workflows/consumer_checkout/main.go:725 +0x271
  github.com/payrails/demo/pkg/merchant/workflows/consumer_checkout.(*Channel[...]).OnSignal.func1()
      /work/pkg/merchant/workflows/consumer_checkout/channels.go:57 +0x11a
  go.temporal.io/sdk/internal.(*selectorImpl).Select.func2.1()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow.go:1158 +0xa1
  go.temporal.io/sdk/internal.(*selectorImpl).Select()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow.go:1253 +0x120d
[...]

Goroutine 97 (running) created at:
  go.temporal.io/sdk/internal.(*dispatcherImpl).NewCoroutine()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow.go:981 +0x65a
  go.temporal.io/sdk/internal.(*workflowEnvironmentInterceptor).Go()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow.go:299 +0x86
  go.temporal.io/sdk/internal.newDispatcher()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow.go:598 +0x282
  go.temporal.io/sdk/internal.(*syncWorkflowDefinition).Execute()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow.go:488 +0x1c4
  go.temporal.io/sdk/internal.(*testWorkflowEnvironmentImpl).executeWorkflowInternal.func1()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow_testsuite.go:486 +0xc2
  go.temporal.io/sdk/internal.(*testCallbackHandle).processCallback()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow_testsuite.go:732 +0xfb
  go.temporal.io/sdk/internal.(*testWorkflowEnvironmentImpl).startMainLoop()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow_testsuite.go:672 +0x2a4
  go.temporal.io/sdk/internal.(*testWorkflowEnvironmentImpl).executeWorkflowInternal()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow_testsuite.go:508 +0x4f1
  go.temporal.io/sdk/internal.(*testWorkflowEnvironmentImpl).executeWorkflow()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_workflow_testsuite.go:452 +0x1cc
  go.temporal.io/sdk/internal.(*TestWorkflowEnvironment).ExecuteWorkflow()
      /home/drahflow/go/pkg/mod/go.temporal.io/[email protected]/internal/workflow_testsuite.go:516 +0xf84
[...]
==================

[Edited for brevity and NDAs].

The code for the conflicting write at /work/pkg/merchant/workflows/consumer_checkout/main.go:829 +0xfc is a standard member-access to a struct which was sent as part of the signal payload. This works on real execution, as the signal data is serialized and sent before the workflow processing continues (as far as I can see), but in TestWorkflowEnvironment.SignalExternalWorkflow, a new go-routine is spawned, which tries to look up the correct mock of the signal receiver concurrently with the rest of the workflow proceeding (and modifying the data of the signal currently being processed): https://github.com/temporalio/sdk-go/blob/754b253bb196d69c64317af1ed28fa85a74d584d/internal/internal_workflow_testsuite.go#L2068

Steps to Reproduce the Problem

  1. Make workflow which sends signal to other workflow and immediately proceeds to modify member fields of the signal payload
  2. Build a workflow test and mock the signal receiver
  3. Run the workflow test very often with go test -race

Specifications

  • Version: go.temporal.io/sdk v1.15.0
  • Platform: linux, x86_64, go version go1.19.1 linux/amd64

Drahflow avatar Sep 28 '22 14:09 Drahflow

Thanks! This is similar to #908. Some of the test environment was not developed in a concurrency safe way. We will fix.

cretz avatar Sep 28 '22 14:09 cretz