
Terminate Workflow cancels tasks on stale data

Open lbestatlas opened this issue 1 year ago • 0 comments

Describe the bug When a parent workflow is terminated, sub workflows can be left in a running state if a concurrently running decide operation is still launching sub workflows.

Details

  Conductor version: 3.21.5
  Persistence implementation: Postgres
  Queue implementation: Dynoqueues
  Lock: Redis

To Reproduce Steps to reproduce the behavior:

  1. Have a large DynamicForkJoin start a large number of SubWorkflows (this may also require an external input payload to make it slow to start)
  2. Click on 'Terminate' before all the workflows have started
  3. SubWorkflows started after the termination was requested, but before the decide has finished starting the SubWorkflows, will still be Running.
  4. Query all the workflows by correlation id and see that some sub workflows are still running while others have been Terminated

Expected behavior All sub workflows should be terminated when the parent workflow is terminated.

Additional context This appears to be caused by TerminateWorkflow loading the task data outside of the execution lock. As a result, when it goes to cancel the non-terminal tasks, the SubWorkflow tasks appear to still be in the SCHEDULED state with no subWorkflowId, so the SubWorkflows that the decide has since started are not cancelled.
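
The suspected stale read can be illustrated with a small Python model. This is only a sketch of the interleaving described above, not Conductor's code: `Task`, `simulate`, `load_tasks`, `decide`, and `cancel` are hypothetical names, the execution lock is represented by the order of the calls rather than a real lock, and the persisted tasks are a plain in-memory list.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Set


@dataclass
class Task:
    # Hypothetical stand-in for a persisted SubWorkflow task record.
    status: str = "SCHEDULED"
    sub_workflow_id: Optional[str] = None


def simulate(load_inside_lock: bool) -> Set[str]:
    """Return the IDs of sub workflows still running after terminate."""
    tasks: List[Task] = [Task(), Task()]  # "persisted" task records
    running: Set[str] = set()             # sub workflows currently running

    def load_tasks() -> List[Task]:
        # Return copies, like rows read from persistence at one point in time.
        return [replace(t) for t in tasks]

    def decide() -> None:
        # The concurrent decide starts one sub workflow per scheduled task.
        for i, t in enumerate(tasks):
            t.status = "IN_PROGRESS"
            t.sub_workflow_id = f"sub-{i}"
            running.add(t.sub_workflow_id)

    def cancel(view: List[Task]) -> None:
        # Terminate can only cancel sub workflows whose ID it can see.
        for t in view:
            if t.sub_workflow_id is not None:
                running.discard(t.sub_workflow_id)

    if load_inside_lock:
        decide()             # decide holds the lock and finishes first
        view = load_tasks()  # terminate acquires the lock, then reads
    else:
        view = load_tasks()  # terminate reads before it holds the lock
        decide()             # decide starts sub workflows in the meantime
    cancel(view)
    return running


print(sorted(simulate(load_inside_lock=False)))  # ['sub-0', 'sub-1']
print(sorted(simulate(load_inside_lock=True)))   # []
```

In this model, reading the tasks before the lock is held leaves both sub workflows running, because the snapshot still shows SCHEDULED tasks with no subWorkflowId. Reading them only after the lock is acquired lets terminate see the IDs and cancel them.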

lbestatlas, Oct 17 '24 04:10