[Enhancement] Support multi-warehouse for BackendResourceStat and pipeline_dop
Why I'm doing:
BackendResourceStat
- Change the structure from
beId → beCPUCoresto a two-level mapping:warehouseId → <beId → beCPUCores>. - Protect all methods that update fields with a lock. Currently, only
getAvgNumCoresOfBeacquires the lock forcachedAvgNumCores, whilesetNumCoresOfBeandremoveBedo not, which introduces the following race condition: a.getAvgNumCoresOfBereadsnumCoresPerBeand computesavg. b. BeforegetAvgNumCoresOfBeupdatescachedAvgNumCores,setNumCoresOfBeadds a new BE tonumCoresPerBeand setscachedAvgNumCoresto-1. c.getAvgNumCoresOfBethen continues and writesavgback tocachedAvgNumCores, causing the newly added BE in step b to be excluded from the calculation.
pipline_dop
- Force
BackendResourceStat’sgetDefaultDOPandgetSinkDefaultDOPto takewarehouseIdas a required parameter.
Query Dump
Previously, Query Dump stored the BackendResourceStat information in be_core_stat, as shown below:
{
"be_core_stat":{
"cachedAvgNumOfCores":-1,
"numOfHardwareCoresPerBe":"{\"13041\":16,\"14044\":8,\"14045\":8}"
}
}
Now we add a new field be_core_stat_v2. Both be_core_stat and be_core_stat_v2 will be emitted to preserve backward compatibility.
{
"be_core_stat":{
"cachedAvgNumOfHardwareCores":-1,
"numOfHardwareCoresPerBe":"{\"13041\":16,\"14044\":8,\"14045\":8}"
},
"be_core_stat_v2":{
"cachedAvgNumOfHardwareCores":-1,
"warehouses":[
{"warehouseId":0,"cachedAvgNumOfHardwareCores":16,"numOfCoresPerBe":"{\"13041\":16}"},
{"warehouseId":14039,"cachedAvgNumOfHardwareCores":8,"numOfCoresPerBe":"{\"14044\":8,\"14045\":8}"}
],
"currentWarehouseId":14039
}
}
When loading a query dump, be_core_stat_v2 will be used preferentially; if it is absent, be_core_stat will be used instead.
What I'm doing:
Fixes #issue
What type of PR is this:
- [ ] BugFix
- [ ] Feature
- [x] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [x] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
[!NOTE] Introduce per-warehouse BackendResourceStat and warehouse-scoped DOP, propagate API changes across planning/scheduling, and add be_core_stat_v2 to query dumps.
- System/Resource Stats:
- Replace global
BackendResourceStatwith warehouse-scoped stats (warehouseId -> {beId -> cores/mem}) and lock-protected mutations.- New APIs:
getAvgNumCoresOfBe(warehouseId),getDefaultDOP(warehouseId),getSinkDefaultDOP(warehouseId),getNumBes(warehouseId), per-warehouse setters/removal; add dump()/replay support.- Session/Variables & Planning:
SessionVariable.getDegreeOfParallelism(warehouseId)andgetSinkDegreeOfParallelism(warehouseId); callers updated (PlanFragment, PlanFragmentBuilder, HashJoinCostModel, FeatureExtractor, Delete/Insert/Update planners, LoadPlanner, StreamLoad/LoadLoadingTask).- Compute
pipeline_dopand thresholds using current warehouse.- Scheduling/Queueing:
PipelineDriverAllocatorcomputes DOP per warehouse default; batches requests by warehouse default DOP.QueryQueueOptionsderives metrics (BEs/cores/mem) via warehouse; simplify shared-data path.GlobalVariablewatermarks use new avg-core API.- Resource Groups:
- Creation/analysis use warehouse-aware BE counts/avg cores.
- Query Dump:
- Add
be_core_stat_v2with per-warehouse core stats and current warehouse; prefer on load, keep legacybe_core_stat.- Serializer now emits both; deserializer populates session warehouse name when available.
- Nodes/Cluster:
ComputeNode/SystemInfoServicepasswarehouseIdwhen updating/removing BE stats.- Tests:
- Update existing tests for new APIs; add UTs for allocator, query dump v2, warehouse queue options, and BackendResourceStat per-warehouse behavior.
Written by Cursor Bugbot for commit 11a65315fd967cb893681d99b28bb15fc8921456. This will update automatically on new commits. Configure here.
🧪 CI Insights
Here's what we observed from your CI run for 11a65315.
🟢 All jobs passed!
But CI Insights is watching 👀
@cursor review
Quality Gate passed
Issues
19 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
@cursor review
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:white_check_mark: pass : 285 / 294 (96.94%)
file detail
| path | covered_line | new_line | coverage | not_covered_line_detail | |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/system/SystemInfoService.java | 5 | 8 | 62.50% | [1540, 1541, 1542] |
| :large_blue_circle: | com/starrocks/sql/plan/PlanFragmentBuilder.java | 9 | 10 | 90.00% | [2543] |
| :large_blue_circle: | com/starrocks/system/BackendResourceStat.java | 146 | 151 | 96.69% | [101, 102, 103, 104, 105] |
| :large_blue_circle: | com/starrocks/qe/SessionVariable.java | 4 | 4 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/slot/QueryQueueOptions.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/DeletePlanner.java | 4 | 4 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/UpdatePlanner.java | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/ResourceGroupMgr.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/slot/PipelineDriverAllocator.java | 18 | 18 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/dump/QueryDumpSerializer.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/planner/PlanFragment.java | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/InsertPlanner.java | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/dump/QueryDumpDeserializer.java | 22 | 22 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/GlobalVariable.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/analyzer/ResourceGroupAnalyzer.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/cost/HashJoinCostModel.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/system/ComputeNode.java | 4 | 4 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/LoadPlanner.java | 7 | 7 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/dump/QueryDumpInfo.java | 22 | 22 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/cost/feature/FeatureExtractor.java | 7 | 7 | 100.00% | [] |
[BE Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
@Mergifyio backport branch-4.0
backport branch-4.0
✅ Backports have been created
- #66716 [Enhancement] Support multi-warehouse for BackendResourceStat and pipeline_dop (backport #66632) has been created for branch
branch-4.0but encountered conflicts