sdk-go

Support simulating race conditions during signal draining

Open joshmsmith opened this issue 8 months ago • 4 comments

I was helping a customer write tests for Go workflows that process signals, and I didn't know how to get complete test coverage so that the signal draining code is exercised. I asked @mfateev, and he suggested creating this feature request to support simulating race conditions during signal draining.

Describe the solution you'd like

Would love a way for the Go SDK to support simulating race conditions during signal draining. This might be useful for other SDKs as well.

Describe alternatives you've considered

Skipping testing this race condition is probably good enough for now.

joshmsmith avatar Mar 27 '25 21:03 joshmsmith

Can you elaborate what you mean by a "race condition" during signal draining? Also why is this only specific to the Go SDK?

Quinn-With-Two-Ns avatar Mar 27 '25 21:03 Quinn-With-Two-Ns

I can share what I know, and I'm hoping Max can share more context. The struggle I had with the customer was that getting 100% code coverage, including the signal draining part, is hard: the signal drain runs in a tight time window, and it is hard (at least for me) to get signals in at the right moment to trigger the draining in a test. I think the goal here is to be able to send signals that will be drained in a test. I think that's why it's harder in the Go SDK; in other SDKs signals work differently. Does that help?

joshmsmith avatar Mar 28 '25 14:03 joshmsmith

> it's a tight time window that the signal drain needs to run, and hard at least for me to get signals in to trigger the draining in a test.

In the test environment you have full control over the timing and over when things advance, so if you know the workflow code you should be able to cause a race. Maybe an example workflow you had difficulty testing would help illustrate the problem? I am not sure how the test environment could know when to inject a signal to cause a race, since it can't read your workflow code or predict how it will execute.
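To illustrate what "controlling the timing" could look like, here is a minimal sketch that uses only the Go standard library rather than the Temporal SDK. `runWorkflow` and its `beforeDrain` hook are hypothetical names invented for this example, and a plain buffered channel stands in for a signal channel. In the real test environment the closest equivalent would presumably be scheduling a signal with `RegisterDelayedCallback` and `SignalWorkflow` at a point you pick from knowing the workflow code.

```go
package main

import "fmt"

// runWorkflow is a hypothetical stand-in for a workflow. It handles one
// signal, decides to finish, and then drains whatever is still buffered.
// beforeDrain is a test hook that runs in the window between the decision
// to finish and the drain, which is where the race normally hides.
func runWorkflow(signals chan string, beforeDrain func()) []string {
	handled := []string{<-signals} // blocking receive of the first signal

	if beforeDrain != nil {
		beforeDrain() // the test injects a "late" signal here
	}

	// Drain: non-blocking receives until the buffer is empty.
	for {
		select {
		case s := <-signals:
			handled = append(handled, s)
		default:
			return handled
		}
	}
}

func main() {
	signals := make(chan string, 4)
	signals <- "first"

	// The test decides exactly when the late signal arrives.
	handled := runWorkflow(signals, func() { signals <- "late" })
	fmt.Println(handled) // prints [first late]
}
```

Because the test chooses the injection point, the drain branch runs on every test run instead of depending on scheduling luck.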

> I think the goal here is to be able to send signals to be drained in a test. I think that's why it's harder in the go SDK, in other SDKs signals work differently. Does that help?

They are a bit different, but I don't know if that really matters here. Signals are delivered through a queue in Go, but the race condition exists in all SDKs and is not easily testable in any of them, so I don't think this is a Go-specific problem or harder in Go than in other SDKs. It might even be easier in Go, since the Go test environment gives more control over execution than the other SDKs do.

One idea: maybe the SDK could warn you if you do not check any signal channel in the last workflow task. If you are not checking a signal channel in the last workflow task, you risk dropping a signal. How does that sound?
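For context, the "check the signal channel before finishing" pattern can be sketched with a non-blocking receive loop. This is a standard-library illustration, not SDK code: `drainSignals` is a hypothetical helper, and a plain buffered Go channel stands in for a workflow signal channel. In the Go SDK the analogous non-blocking call would be `ReceiveAsync` on a `workflow.ReceiveChannel`.

```go
package main

import "fmt"

// drainSignals performs non-blocking receives until nothing is buffered and
// returns everything it found. Calling it right before a workflow returns is
// the "last check" that keeps already-delivered signals from being dropped.
func drainSignals(signals <-chan string) []string {
	var drained []string
	for {
		select {
		case s, ok := <-signals:
			if !ok {
				return drained // channel closed, nothing more can arrive
			}
			drained = append(drained, s)
		default:
			return drained // buffer is empty right now
		}
	}
}

func main() {
	signals := make(chan string, 8)
	signals <- "order-1"
	signals <- "order-2"

	// Without this call, both buffered signals would be lost on return.
	pending := drainSignals(signals)
	fmt.Println(len(pending), "signals handled before completion")
}
```

A warning like the one proposed above would flag workflows that finish without a final check of this kind.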

Quinn-With-Two-Ns avatar Mar 28 '25 15:03 Quinn-With-Two-Ns

I think @mfateev had some other ideas, maybe he can share them here?

> One idea is maybe the SDK can warn you if you do not check any signal channel in the last workflow task? If you are not checking a signal channel in the last workflow task you risk dropping a signal. How does that sound?

I like that idea a lot actually.

joshmsmith avatar May 07 '25 13:05 joshmsmith