dolphinscheduler icon indicating copy to clipboard operation
dolphinscheduler copied to clipboard

[Improvement][Dependent Task] Optimize sub-workflow dependency logic

Open reele opened this issue 7 months ago • 4 comments

Search before asking

  • [x] I had searched in the issues and found no similar feature requirement.

Description

This is an extended issue for #16768 and #16776 to solve the ​scheduling time dependency​ issue.

The main problem is that dependent tasks query WorkflowInstance by scheduleTime, but sub-workflows do not have this value.

Recursively finding the main workflow is not a good approach, and iterating through all levels of sub-workflows from the main workflow is also inefficient—both may cause database performance issues.

Therefore, there should always be a ​key or value​ to directly determine the time and start action that workflows have.

For this reason, I propose creating a new object called Action to record the original workflow start information.

erDiagram
  "[*Action]" {
    int id
    enum actionType
    date submittedTime
  }
  "WorkflowInstance" {
    int actionId FK
  }
  "TaskInstance" {
    int workflowInstanceId FK
  }

  "WorkflowInstance" }|--|| "[*Action]" : ""
  "TaskInstance" }|--|| "WorkflowInstance" : ""

Use Action addresses two issues:

  1. Reducing database pressure: The dependent node first filters actions by submittedTime, then queries workflow-instances using (testFlag, code, actionId). This reduces table scans and optimizes index usage (previously on testFlag, code, scheduleTime/startTime).
  2. Fixing dependency logic: Prevents null scheduleTime comparisons (in manual schedules), making the behavior more robust.

With the action object, the dependent node now query tasks always based on a fixed timestamp, not the node task's real startTime.

Are you willing to submit a PR?

  • [x] Yes I am willing to submit a PR!

Code of Conduct

reele avatar May 06 '25 04:05 reele

@ruanwenjun could you please offer some suggestions and guidance for this?

reele avatar May 08 '25 10:05 reele

Basically LGTM.

Right now, in the workflow instance metadata, we are missing the trigger context of the workflow instance, one workflow instance may be trigger multiple times, but we only store the latest once, it's better to add a new table t_ds_workflow_instance_action to store this information. The information should contain action type, action time, action user, action_context(include params). Then we can remove a lot of field in t_ds_workflow_instance.

And for these issue, the problem is the subworkflow instance miss scheduleTime? IMO, the schedule time should be mandatory, a workflow instance/task instance should have its own scheduleTimestartTimeendTime. The scheduleTime is necessary for dependent or business calculation. The subworkflow instance's scheduleTime should be the sub workflow logic task instance's scheduleTime

ruanwenjun avatar May 10 '25 04:05 ruanwenjun

Basically LGTM.

Right now, in the workflow instance metadata, we are missing the trigger context of the workflow instance, one workflow instance may be trigger multiple times, but we only store the latest once, it's better to add a new table t_ds_workflow_instance_action to store this information. The information should contain action type, action time, action user, action_context(include params). Then we can remove a lot of field in t_ds_workflow_instance.

And for these issue, the problem is the subworkflow instance miss scheduleTime? IMO, the schedule time should be mandatory, a workflow instance/task instance should have its own scheduleTimestartTimeendTime. The scheduleTime is necessary for dependent or business calculation. The subworkflow instance's scheduleTime should be the sub workflow logic task instance's scheduleTime

Yes, as you mentioned before. The only issue we need to address here is the performance problem when dependent task query the database.

If you think this new approach is acceptable, I'll submit a PR.

reele avatar May 17 '25 09:05 reele

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Jun 17 '25 00:06 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

github-actions[bot] avatar Jun 24 '25 00:06 github-actions[bot]

this is my approach:

  1. add entity WorkflowExecution or WorkflowTrigger or WorkflowTriggerAction or... to persist data of request

    • id
    • mainWorkflowDefinitionCode
    • trigger_type
    • trigger_time
  2. add execution_id to WorkflowInstance

  3. modify WorkflowManualTriggerRequest:

    • add workflowExecutionId field
  4. modify SubWorkflowLogicTask:

    • pass parent-workflow's workflowExecutionId to workflowManualTriggerRequest in triggerNewSubWorkflow
  5. modify AbstractWorkflowTrigger:

    1. add abstract method constructWorkflowExecution returns WorkflowExecution.id
      1. set submit_time to WorkflowExecution.trigger_time in WorkflowManualTrigger.constructWorkflowExecution
      2. set schedule_time to WorkflowExecution.trigger_time in [WorkflowScheduleTrigger/WorkflowBackfillTrigger].constructWorkflowExecution
      3. set WorkflowManualTriggerRequest.workflowExecutionId to WorkflowExecution.trigger_time in SubWorkflowManualTrigger.constructWorkflowExecution
    2. add parameter workflowExecutionId to constructWorkflowInstance, set workflowExecutionId into new workflowInstance
    3. call constructWorkflowExecution and save in triggerWorkflow before call constructWorkflowInstance, pass returned value workflowExecution.id to constructWorkflowInstance
  6. modify WorkflowInstanceMapper

    • rewrite queryLastRunningWorkflow/queryLastManualWorkflow/queryLastRunningWorkflow with sql segment: (select * from t_ds_workflow_instance where workflow_definition_code=... and workflow_execution_id in (select id from t_ds_workflow_execution where trigger_type in (...) and #{startTime} >= trigger_time and #{endTime} <= trigger_time))
    • modify ddl to optimize index of t_ds_workflow_instance
    • add dml to re-generate t_ds_workflow_execution for old workflow_instance
  7. modify DependentExecute:

    • add depend option[schedule/manual/all] maps trigger_type[backfill,schedule/manual]
    • rewrite findDependentWorkflowCandidate with mapper's new method
  8. modify DependentNode(optional)

    • add depend option[schedule/manual/all] to UI and DependentLogicTask

@ruanwenjun could you please advise on this approach?

reele avatar Aug 04 '25 09:08 reele