sdk-go
Sticky query invalid state machine transition error
If the Temporal server thinks the SDK has the workflow cached, it sends a sticky query to the SDK. (Whether the query is strongly consistent doesn't matter, as long as it is dispatched via matching through a query task.) If the worker doesn't actually have the workflow in its cache, the SDK calls getWorkflowExecutionHistory to rebuild the workflow state.
A query/workflow task contains a previousStartedEventID, which is used to determine which part of the history is in replay mode. However, by the time the query task calls getWorkflowExecutionHistory, the history may have already advanced and may contain more events than when the query task was first dispatched. This means the previousStartedEventID in the original query task is outdated.
The existing SDK implementation continues to use that outdated previousStartedEventID to run the query task, which leads to an invalid state transition error. (More specifically, the SDK tries to move the state machine directly from the Created state to the Initiated state, skipping the Sent state.) See below for a test that reproduces the issue.
Workflow tasks don't have the same issue because there is only one pending workflow task at a time, so the history can't advance while that task is being processed.
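To make the stale boundary concrete, here is a minimal sketch. It is not SDK code: HistoryEvent, makeHistory, and splitByReplay are hypothetical names, and it assumes that events with an ID at or below previousStartedEventID are treated as replayed, while everything after is treated as new.

```go
package main

import "fmt"

// HistoryEvent is a hypothetical, simplified stand-in for a history event.
type HistoryEvent struct {
	ID int64
}

// makeHistory builds a history with event IDs 1..n (illustration only).
func makeHistory(n int64) []HistoryEvent {
	events := make([]HistoryEvent, 0, n)
	for id := int64(1); id <= n; id++ {
		events = append(events, HistoryEvent{ID: id})
	}
	return events
}

// splitByReplay partitions a history under the assumption that events with
// ID <= previousStartedEventID are replayed and all later events are new.
func splitByReplay(history []HistoryEvent, previousStartedEventID int64) (replayed, fresh []HistoryEvent) {
	for _, e := range history {
		if e.ID <= previousStartedEventID {
			replayed = append(replayed, e)
		} else {
			fresh = append(fresh, e)
		}
	}
	return replayed, fresh
}

func main() {
	// The query task was dispatched when the history ended at event 7.
	const previousStartedEventID = 7

	// By the time the worker fetches the history, it has grown to 12 events.
	fetched := makeHistory(12)

	replayed, fresh := splitByReplay(fetched, previousStartedEventID)
	fmt.Printf("replayed=%d, treated as new=%d\n", len(replayed), len(fresh))
}
```

In this sketch, the events past the stale boundary are handled as if they were new, even though a query task should only ever observe existing history.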
Some ideas for fixing the issue:
- For query tasks, process all events in replay mode
- Include a nextEventID in the query task and truncate the history beyond that point (this is the fix done in Cadence)
- Have getWorkflowExecutionHistory also return previousStartedEventID? Not sure.
So far it looks like with approach 1, the entire fix can be done on the SDK side. Approaches 2 and 3 will involve server changes as well. Please let the server team know if approach 1 can't work and additional support is needed from the server side.
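The first two approaches could be sketched roughly as follows. This is only an illustration under stated assumptions, not the SDK's or Cadence's actual implementation: HistoryEvent, isReplay, and truncateHistory are hypothetical names, and nextEventID is assumed to be the ID the next event would have received when the query task was dispatched.

```go
package main

import "fmt"

// HistoryEvent is a hypothetical, simplified stand-in for a history event.
type HistoryEvent struct {
	ID int64
}

// isReplay sketches approach 1: for a query task every event counts as
// replayed, regardless of previousStartedEventID. For a workflow task the
// assumed rule is that events at or below previousStartedEventID are replayed.
func isReplay(eventID, previousStartedEventID int64, isQueryTask bool) bool {
	if isQueryTask {
		return true
	}
	return eventID <= previousStartedEventID
}

// truncateHistory sketches approach 2: keep only events that existed when the
// query task was dispatched, i.e. events with ID < nextEventID (assumed meaning).
func truncateHistory(history []HistoryEvent, nextEventID int64) []HistoryEvent {
	kept := make([]HistoryEvent, 0, len(history))
	for _, e := range history {
		if e.ID < nextEventID {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	// History fetched by the worker after it advanced to 12 events.
	fetched := make([]HistoryEvent, 0, 12)
	for id := int64(1); id <= 12; id++ {
		fetched = append(fetched, HistoryEvent{ID: id})
	}

	// Approach 1: event 10 is past the stale boundary (7) but is still replayed.
	fmt.Println("approach 1, event 10 replayed:", isReplay(10, 7, true))

	// Approach 2: the query task was dispatched when the next event ID was 8.
	fmt.Println("approach 2, events kept:", len(truncateHistory(fetched, 8)))
}
```

Approach 1 needs no new field on the task, which matches the observation that it can be done entirely on the SDK side. Approach 2 depends on the server supplying nextEventID.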
Expected Behavior
Successfully return a query result
Actual Behavior
Invalid state machine transition error
Steps to Reproduce the Problem
A sample test that exposes the bug (and a potential fix) can be found at https://github.com/yycptt/sdk-go/commit/7bd7016ea653999a6483b99f227653252cb2b33f. Running the test directly gives the following error message:
2022/03/15 18:05:15 WARN Failed to process workflow task. Namespace default TaskQueue query-task-cache-evicted-tl WorkerID test-worker-identity WorkflowType query-task-cache-evicted-workflow WorkflowID query-task-cache-evicted-workflow-id RunID query-task-cache-evicted-workflow-run-id Attempt 0 Error invalid state transition: attempt to handleInitiatedEvent, CommandType: Activity, ID: 10, state=Created, isDone()=false, history=[Created]
Specifications
- Version:
- Platform: