starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Enhancement] Support multi-warehouse for BackendResourceStat and pipeline_dop

Open ZiheLiu opened this issue 2 weeks ago • 7 comments

Why I'm doing:

BackendResourceStat

  1. Change the structure from beId → beCPUCores to a two-level mapping: warehouseId → <beId → beCPUCores>.
  2. Protect all methods that update fields with a lock. Currently, only getAvgNumCoresOfBe acquires the lock for cachedAvgNumCores, while setNumCoresOfBe and removeBe do not, which introduces the following race condition: a. getAvgNumCoresOfBe reads numCoresPerBe and computes avg. b. Before getAvgNumCoresOfBe updates cachedAvgNumCores, setNumCoresOfBe adds a new BE to numCoresPerBe and sets cachedAvgNumCores to -1. c. getAvgNumCoresOfBe then continues and writes avg back to cachedAvgNumCores, causing the newly added BE in step b to be excluded from the calculation.

pipline_dop

  • Force BackendResourceStat’s getDefaultDOP and getSinkDefaultDOP to take warehouseId as a required parameter.

Query Dump

Previously, Query Dump stored the BackendResourceStat information in be_core_stat, as shown below:

{
    "be_core_stat":{
        "cachedAvgNumOfCores":-1,
        "numOfHardwareCoresPerBe":"{\"13041\":16,\"14044\":8,\"14045\":8}"
    }
}

Now we add a new field be_core_stat_v2. Both be_core_stat and be_core_stat_v2 will be emitted to preserve backward compatibility.

{
    "be_core_stat":{
        "cachedAvgNumOfHardwareCores":-1,
        "numOfHardwareCoresPerBe":"{\"13041\":16,\"14044\":8,\"14045\":8}"
    },
    "be_core_stat_v2":{
        "cachedAvgNumOfHardwareCores":-1,
        "warehouses":[
            {"warehouseId":0,"cachedAvgNumOfHardwareCores":16,"numOfCoresPerBe":"{\"13041\":16}"},
            {"warehouseId":14039,"cachedAvgNumOfHardwareCores":8,"numOfCoresPerBe":"{\"14044\":8,\"14045\":8}"}
        ],
        "currentWarehouseId":14039
    }
}

When loading a query dump, be_core_stat_v2 will be used preferentially; if it is absent, be_core_stat will be used instead.

What I'm doing:

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Introduce per-warehouse BackendResourceStat and warehouse-scoped DOP, propagate API changes across planning/scheduling, and add be_core_stat_v2 to query dumps.

  • System/Resource Stats:
    • Replace global BackendResourceStat with warehouse-scoped stats (warehouseId -> {beId -> cores/mem}) and lock-protected mutations.
    • New APIs: getAvgNumCoresOfBe(warehouseId), getDefaultDOP(warehouseId), getSinkDefaultDOP(warehouseId), getNumBes(warehouseId), per-warehouse setters/removal; add dump()/replay support.
  • Session/Variables & Planning:
    • SessionVariable.getDegreeOfParallelism(warehouseId) and getSinkDegreeOfParallelism(warehouseId); callers updated (PlanFragment, PlanFragmentBuilder, HashJoinCostModel, FeatureExtractor, Delete/Insert/Update planners, LoadPlanner, StreamLoad/LoadLoadingTask).
    • Compute pipeline_dop and thresholds using current warehouse.
  • Scheduling/Queueing:
    • PipelineDriverAllocator computes DOP per warehouse default; batches requests by warehouse default DOP.
    • QueryQueueOptions derives metrics (BEs/cores/mem) via warehouse; simplify shared-data path.
    • GlobalVariable watermarks use new avg-core API.
  • Resource Groups:
    • Creation/analysis use warehouse-aware BE counts/avg cores.
  • Query Dump:
    • Add be_core_stat_v2 with per-warehouse core stats and current warehouse; prefer on load, keep legacy be_core_stat.
    • Serializer now emits both; deserializer populates session warehouse name when available.
  • Nodes/Cluster:
    • ComputeNode/SystemInfoService pass warehouseId when updating/removing BE stats.
  • Tests:
    • Update existing tests for new APIs; add UTs for allocator, query dump v2, warehouse queue options, and BackendResourceStat per-warehouse behavior.

Written by Cursor Bugbot for commit 11a65315fd967cb893681d99b28bb15fc8921456. This will update automatically on new commits. Configure here.

ZiheLiu avatar Dec 11 '25 09:12 ZiheLiu

🧪 CI Insights

Here's what we observed from your CI run for 11a65315.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] avatar Dec 11 '25 09:12 mergify[bot]

@cursor review

alvin-celerdata avatar Dec 11 '25 15:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 12 '25 04:12 alvin-celerdata

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 12 '25 09:12 github-actions[bot]

[FE Incremental Coverage Report]

:white_check_mark: pass : 285 / 294 (96.94%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: com/starrocks/system/SystemInfoService.java 5 8 62.50% [1540, 1541, 1542]
:large_blue_circle: com/starrocks/sql/plan/PlanFragmentBuilder.java 9 10 90.00% [2543]
:large_blue_circle: com/starrocks/system/BackendResourceStat.java 146 151 96.69% [101, 102, 103, 104, 105]
:large_blue_circle: com/starrocks/qe/SessionVariable.java 4 4 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/slot/QueryQueueOptions.java 8 8 100.00% []
:large_blue_circle: com/starrocks/sql/DeletePlanner.java 4 4 100.00% []
:large_blue_circle: com/starrocks/sql/UpdatePlanner.java 5 5 100.00% []
:large_blue_circle: com/starrocks/catalog/ResourceGroupMgr.java 2 2 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/slot/PipelineDriverAllocator.java 18 18 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpSerializer.java 1 1 100.00% []
:large_blue_circle: com/starrocks/planner/PlanFragment.java 5 5 100.00% []
:large_blue_circle: com/starrocks/sql/InsertPlanner.java 5 5 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpDeserializer.java 22 22 100.00% []
:large_blue_circle: com/starrocks/qe/GlobalVariable.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/analyzer/ResourceGroupAnalyzer.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/cost/HashJoinCostModel.java 8 8 100.00% []
:large_blue_circle: com/starrocks/system/ComputeNode.java 4 4 100.00% []
:large_blue_circle: com/starrocks/sql/LoadPlanner.java 7 7 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpInfo.java 22 22 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/cost/feature/FeatureExtractor.java 7 7 100.00% []

github-actions[bot] avatar Dec 12 '25 09:12 github-actions[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 12 '25 09:12 github-actions[bot]

@Mergifyio backport branch-4.0

github-actions[bot] avatar Dec 15 '25 03:12 github-actions[bot]

backport branch-4.0

✅ Backports have been created

mergify[bot] avatar Dec 15 '25 03:12 mergify[bot]