
[Enhancement] support dump pipeline status when query timeout (backport #66540)

Open • mergify[bot] opened this pull request 2 weeks ago • 2 comments

Why I'm doing:

What I'm doing:

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [x] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This PR needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [x] This is a backport PR

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels that determine which target branches the PR will be auto-backported to
    • [x] 4.0
    • [x] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Adds a flag to enable detailed driver diagnostics on timeouts/finished cancels and updates FE to cancel with TIMEOUT reason on timeout errors.

  • Execution diagnostics (BE):
    • Add config::pipeline_timeout_diagnostic flag to toggle detailed timeout diagnostics.
    • fragment_context.cpp: when finalizing and drivers are blocked, conditionally log driver details for finished-cancel states (QueryFinished/LimitReach) when enabled; minor refactor extracting detailed_message and is_finished_cancel.
    • timeout_tasks.cpp: conditionally log blocked driver details on query timeout when enabled.
  • Coordinator behavior (FE):
    • On TIMEOUT error code, invoke cancelInternal(PPlanFragmentCancelReason.TIMEOUT) instead of INTERNAL_ERROR.

Written by Cursor Bugbot for commit c2841597474c7c8d233403c863009d043919feab. This will update automatically on new commits.
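The coordinator change summarized above amounts to a mapping from error code to cancel reason. The real code is Java, in the FE coordinator, and uses Thrift-generated types; the sketch below is only an illustration in C++ with hypothetical enums (`ErrorCode`, `CancelReason`) and a hypothetical helper `cancel_reason_for`. It also assumes that non-timeout errors keep cancelling with `INTERNAL_ERROR`, which the summary implies but does not state.

```cpp
// Hypothetical stand-ins for the error code and cancel reason enums.
// The actual FE uses Thrift-generated Java types such as PPlanFragmentCancelReason.
enum class ErrorCode { OK, TIMEOUT, INTERNAL_ERROR };
enum class CancelReason { TIMEOUT, INTERNAL_ERROR };

// Pick the reason to cancel remaining fragments with after a failed execution.
// Before the change, a TIMEOUT error was cancelled with INTERNAL_ERROR as well.
CancelReason cancel_reason_for(ErrorCode code) {
    if (code == ErrorCode::TIMEOUT) {
        return CancelReason::TIMEOUT;  // new behavior: propagate the real cause
    }
    // Assumption: every other failure still cancels with INTERNAL_ERROR.
    return CancelReason::INTERNAL_ERROR;
}
```

Propagating the timeout reason presumably lets the BE tell a timed-out query apart from a generic failure, which is what the timeout diagnostics depend on.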


This is an automatic backport of pull request #66540 done by [Mergify](https://mergify.com).

[!NOTE] Adds a BE flag to conditionally log detailed driver info on pipeline timeouts and updates FE to cancel with TIMEOUT on timeout errors.

  • Backend (BE)
    • Config: Add config::pipeline_timeout_diagnostic flag in be/src/common/config.h (default false).
    • Diagnostics on timeout:
      • exec/pipeline/fragment_context.cpp: During set_final_status, extract detailed_message, detect timeouts, and when drivers are blocked, conditionally LOG(WARNING) driver details if timeout and flag enabled; minor refactor of cancel log levels.
      • exec/pipeline/schedule/timeout_tasks.cpp: On fragment timeout, conditionally log blocked driver details with LOG_IF(..., config::pipeline_timeout_diagnostic).
  • Frontend (FE)
    • DefaultCoordinator.handleErrorExecution(...): For TIMEOUT error code, call cancelInternal(PPlanFragmentCancelReason.TIMEOUT) instead of INTERNAL_ERROR.

Written by Cursor Bugbot for commit 6334f2de5d56ec74df15d6499d3831d8f280c5f2. This will update automatically on new commits.
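As a rough illustration of the flag-gated diagnostics described in the BE items above, the following is a minimal sketch rather than the code in this PR. The `DriverInfo` struct and the `timeout_report` function are hypothetical, and the `config::pipeline_timeout_diagnostic` variable here is a local stand-in for the real config entry (default `false`). The actual change logs through glog (`LOG(WARNING)` and `LOG_IF`); the sketch builds a string instead so that it stays self-contained.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Local stand-in for the BE config flag named in the PR; off by default.
namespace config {
bool pipeline_timeout_diagnostic = false;
}

// Hypothetical, heavily simplified view of a pipeline driver.
struct DriverInfo {
    int id;
    bool blocked;
    std::string state;
};

// Always emit the short timeout message; append per-driver detail only
// when the diagnostic flag is enabled, and only for blocked drivers.
std::string timeout_report(const std::string& query_id,
                           const std::vector<DriverInfo>& drivers) {
    std::ostringstream out;
    out << "query " << query_id << " timed out";
    if (!config::pipeline_timeout_diagnostic) {
        return out.str();  // flag off: skip the potentially verbose dump
    }
    for (const auto& d : drivers) {
        if (d.blocked) {
            out << "; driver " << d.id << " blocked in " << d.state;
        }
    }
    return out.str();
}
```

Keeping the detailed dump behind a flag that defaults to off means existing deployments see no extra log volume unless an operator enables it while investigating a timeout.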

mergify[bot], Dec 11 '25 07:12

🧪 CI Insights

Here's what we observed from your CI run for 6334f2de.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot], Dec 11 '25 09:12

@cursor review

alvin-celerdata, Dec 11 '25 15:12