Terminate Workflow cancels tasks on stale data
Describe the bug When a parent workflow is terminated, sub workflows can be left in a running state if a concurrently running decide operation is still launching sub workflows.
Details
- Conductor version: 3.21.5
- Persistence implementation: Postgres
- Queue implementation: Dynoqueues
- Lock: Redis
To Reproduce Steps to reproduce the behavior:
- Have a large DynamicForkJoin start a large number of SubWorkflows (this may also require an external input payload to make it slow to start)
- Click on 'Terminate' before all the workflows have started
- SubWorkflows that were started after the termination was requested, but before the in-flight Decide finished starting them, are still Running.
- Query all the workflows by correlation id and see that some sub workflows are still running while others have been Terminated
Expected behavior All sub workflows of the terminated workflow should be terminated.
Additional context This appears to be caused by TerminateWorkflow loading the task data outside of the execution lock. By the time it holds the lock and goes to cancel the non-terminal tasks, its copy of the data is stale: the SubWorkflow tasks still appear to be in the SCHEDULED state with no subWorkflowId, so the SubWorkflows that the concurrent decide has since started are not cancelled.
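To make the suspected interleaving concrete, here is a minimal, single-threaded sketch of the race. It is not Conductor's code: the class `StaleTerminateSketch`, the `SubWorkflowTask` type, the in-memory maps, and the `loadInsideLock` flag are all hypothetical stand-ins, and the "lock wait" is simulated by running the concurrent decide at the point where terminate would block on the execution lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleTerminateSketch {

    /** Hypothetical stand-in for a SUB_WORKFLOW task; not Conductor's task model. */
    static final class SubWorkflowTask {
        final String status;        // "SCHEDULED" or "IN_PROGRESS"
        final String subWorkflowId; // null until decide has started the sub workflow

        SubWorkflowTask(String status, String subWorkflowId) {
            this.status = status;
            this.subWorkflowId = subWorkflowId;
        }
    }

    // In-memory stand-ins for the persistence layer (assumption for illustration).
    final Map<String, SubWorkflowTask> taskStore = new HashMap<>();
    final Map<String, String> subWorkflowStatus = new HashMap<>();

    /** Simulates the decide that holds the lock and starts the sub workflow. */
    void decideStartsSubWorkflow(String taskId) {
        String subId = "sub-" + taskId;
        subWorkflowStatus.put(subId, "RUNNING");
        taskStore.put(taskId, new SubWorkflowTask("IN_PROGRESS", subId));
    }

    /** Simulated terminate; concurrentDecide runs while terminate "waits" for the lock. */
    void terminate(boolean loadInsideLock, Runnable concurrentDecide) {
        List<SubWorkflowTask> snapshot = null;
        if (!loadInsideLock) {
            // Suspected behaviour: tasks are read before the lock is held.
            snapshot = new ArrayList<>(taskStore.values());
        }

        // Terminate blocks on the execution lock; the in-flight decide completes meanwhile.
        concurrentDecide.run();

        // Lock is now held by terminate.
        if (loadInsideLock) {
            // Alternative ordering: read the tasks only once the lock is held.
            snapshot = new ArrayList<>(taskStore.values());
        }

        for (SubWorkflowTask task : snapshot) {
            // A task without a subWorkflowId has nothing to cascade the termination to.
            if (task.subWorkflowId != null) {
                subWorkflowStatus.put(task.subWorkflowId, "TERMINATED");
            }
        }
    }

    /** Runs one scenario and returns the final status of the sub workflow. */
    static String run(boolean loadInsideLock) {
        StaleTerminateSketch sketch = new StaleTerminateSketch();
        sketch.taskStore.put("t1", new SubWorkflowTask("SCHEDULED", null));
        sketch.terminate(loadInsideLock, () -> sketch.decideStartsSubWorkflow("t1"));
        return sketch.subWorkflowStatus.get("sub-t1");
    }

    public static void main(String[] args) {
        System.out.println("load outside lock: " + run(false)); // RUNNING
        System.out.println("load inside lock:  " + run(true));  // TERMINATED
    }
}
```

In this model, reading the tasks before acquiring the lock leaves the sub workflow RUNNING, while reading them after acquiring the lock terminates it. That matches the observed symptom, but whether reloading inside the lock is the right fix for Conductor itself is an assumption that would need to be checked against the actual TerminateWorkflow code path.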