dolphinscheduler icon indicating copy to clipboard operation
dolphinscheduler copied to clipboard

[Improvement-17755][Dao] Optimize pagination query performance by addi…

Open fanzhenyu95 opened this issue 1 month ago • 2 comments

Purpose of the pull request

close #17755

Brief change log

Add the following indexes to improve the pagination query performance:

For t_ds_workflow_instance table:

  • idx_project_code_start_time (project_code, start_time DESC, id DESC)
  • idx_workflow_definition_code_start_time (workflow_definition_code, start_time DESC)

For t_ds_task_instance table:

  • idx_project_code_submit_time (project_code, submit_time DESC)
  • idx_project_code_start_time (project_code, start_time DESC)

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

Pull Request Notice

Pull Request Notice

If your pull request contains incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md

fanzhenyu95 avatar Nov 29 '25 07:11 fanzhenyu95

Thanks for opening this pull request! Please check out our contributing guidelines. (https://github.com/apache/dolphinscheduler/blob/dev/docs/docs/en/contribute/join/pull-request.md)

boring-cyborg[bot] avatar Nov 29 '25 07:11 boring-cyborg[bot]

Adding indexes alone won't solve all problems; we still need to optimize some query methods to achieve better results. Please first describe which interfaces you are currently experiencing slow query issues with.

ruanwenjun avatar Dec 11 '25 11:12 ruanwenjun

@SbloodyS @ruanwenjun Thank you for your feedback! The pagination queries for project list, workflow instance list, and task instance list are significantly slow in production environments with large datasets, leading to long page loading times and poor user experience.

Frequently used interfaces with slow query performance include:

  1. Project list API: GET /projects
  2. Workflow instance list API: GET /projects/{projectCode}/workflow-instances
  3. Task instance list API: GET /projects/{projectCode}/task-instances

While adding indexes alone may not solve all performance issues, in our specific scenario with tables containing millions of records, list queries were taking over 10 seconds to execute and sometimes even failed to return results. After implementing the proposed indexes, all list queries now perform normally with acceptable response times.

fanzhenyu95 avatar Dec 15 '25 07:12 fanzhenyu95

@fanzhenyu95 Thanks for the clarification. I've updated the issue and PR titles.

ruanwenjun avatar Dec 15 '25 07:12 ruanwenjun