[Enhancement] Optimize and refine the tablet health check logic
Why I'm doing:
There is an issue with the current tablet health check logic. For example, a tablet may have three replicas located on different BEs, but all of these BEs share the same label, such as rack:rack1. In this case, the system incorrectly marks the tablet as being in a LOCATION_MISMATCH state.
This judgment is unreasonable, and it affects operations that rely on the tablet's health status, such as tablet scheduling and index creation. If all nodes in the cluster have the same label, that is effectively equivalent to having no label at all, yet StarRocks behaves inconsistently between these two cases.
Moreover, the current design explicitly allows multiple replicas of the same tablet to reside within the same rack. Therefore, it is not appropriate to consider a tablet unhealthy just because its replicas share the same label.
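For illustration, here is a minimal, self-contained sketch of the kind of strict check that misfires (all names are hypothetical and simplified; this is not the actual TabletChecker code):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class LocationMismatchSketch {
    record Replica(String beHost, String rack) {}

    // Strict rule: healthy only if every replica sits on a different rack.
    static boolean strictLocationMatch(List<Replica> replicas) {
        Set<String> racks =
                replicas.stream().map(Replica::rack).collect(Collectors.toSet());
        return racks.size() == replicas.size();
    }

    public static void main(String[] args) {
        // Three replicas on three different BEs, but every BE carries rack:rack1.
        List<Replica> replicas = List.of(
                new Replica("be1", "rack1"),
                new Replica("be2", "rack1"),
                new Replica("be3", "rack1"));
        // Prints false -> the tablet would be marked LOCATION_MISMATCH,
        // even though rack1 is the only rack these BEs belong to.
        System.out.println(strictLocationMatch(replicas));
    }
}
```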
What I'm doing:
Remove the redundant validation logic.
Fixes #issue
What type of PR is this:
- [x] BugFix
- [ ] Feature
- [ ] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [ ] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
Why not assign different racks to BEs? If all BEs are on the same rack, there's no need to use this feature. The original intent of this design was to distribute different replicas across different racks as much as possible. Due to limitations in the scheduling framework, tablets that don't meet the conditions can only be marked as unhealthy.
Hi @gengjun-git I've been describing a specific scenario. For example, suppose you have multiple businesses: business1, business2, and business3. You allocate different sets of machines to each business using labels — for instance, rack:rack1 is assigned to business1, and rack:rack2 to business2. This is a typical and valid use case.
Now, say rack:rack1 has 12 BE nodes, and the tables under business1 have multiple replicas — which is also normal, since the table creation process allows replica_num to be greater than count(distinct rack).
However, in this scenario, the tablets of business1's tables are marked as unhealthy, which seems problematic. To put it simply:
- Either we should not allow replica_num > count(distinct rack) during table creation, though this raises further questions and design considerations (see the sketch after this list);
- Or we should not treat this situation as unhealthy.
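If the first option were taken, the guard would amount to a creation-time validation along these lines (a hedged sketch with invented names, not StarRocks' actual create-table path):

```java
public class CreateTableGuard {
    // Hypothetical guard for the first option (not actual StarRocks code):
    // reject the request when it asks for more replicas than there are
    // distinct racks available to the table.
    static void validateReplicaPlacement(int replicaNum, int distinctRackCount) {
        if (distinctRackCount > 0 && replicaNum > distinctRackCount) {
            throw new IllegalArgumentException("replication_num (" + replicaNum
                    + ") exceeds distinct rack count (" + distinctRackCount + ")");
        }
    }

    public static void main(String[] args) {
        validateReplicaPlacement(3, 3); // ok: one replica per rack is possible
        try {
            validateReplicaPlacement(3, 1); // the business1 scenario above
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```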
Hello @hongkunxu, I understand your requirement; it seems similar to a warehouse need, but the community edition doesn’t have this feature yet. For your PR, you could add a toggle: whether replicas must be placed on different racks, default is true.
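Such a toggle might look like the following sketch (the flag name is invented for illustration; the real FE Config class has no such field):

```java
public class DistinctRackToggle {
    // Hypothetical FE config flag, default true as suggested.
    static boolean tabletSchedRequireDistinctRacks = true;

    static boolean locationHealthy(int distinctRacks, int replicaCount) {
        // When the toggle is off, replicas sharing a rack no longer
        // count as LOCATION_MISMATCH.
        return !tabletSchedRequireDistinctRacks || distinctRacks == replicaCount;
    }

    public static void main(String[] args) {
        System.out.println(locationHealthy(1, 3)); // false: strict mode
        tabletSchedRequireDistinctRacks = false;
        System.out.println(locationHealthy(1, 3)); // true: toggle off
    }
}
```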
Hi @gengjun-git Thank you for your feedback and suggestions. I'll revise this PR accordingly.
I think the tablet's health should be judged strictly by placing replicas on different racks only when the number of racks assigned to the table at creation time is greater than or equal to the replication number. Otherwise, the cluster should do its best to place the replicas on the limited racks, and the tablet's health should not be affected by whether the replicas land on different racks.
- Multiple racks are assigned when creating a table; this is a requirement for higher availability.
- Only one rack is assigned when creating a table, and different tables use different racks; this is a requirement for physical isolation.
I think the health of the tablet should be judged differently in these two cases, and the system should adapt to them automatically (see the sketch below).
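A sketch of this adaptive rule, under the same hypothetical-naming caveat as above (not the final TabletChecker implementation):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AdaptiveLocationCheck {
    record Replica(String beHost, String rack) {}

    // Strict distinct-rack placement is required only when the table was
    // assigned at least as many racks as it has replicas; otherwise the
    // scheduler places replicas best-effort and the tablet stays healthy.
    static boolean locationHealthy(Set<String> assignedRacks, List<Replica> replicas) {
        if (assignedRacks.size() < replicas.size()) {
            return true; // physical-isolation case: fewer racks than replicas
        }
        Set<String> usedRacks =
                replicas.stream().map(Replica::rack).collect(Collectors.toSet());
        return usedRacks.size() == replicas.size(); // high-availability case
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica("be1", "rack1"),
                new Replica("be2", "rack1"),
                new Replica("be3", "rack1"));
        System.out.println(locationHealthy(Set.of("rack1"), replicas));          // true
        System.out.println(locationHealthy(Set.of("r1", "r2", "r3"), replicas)); // false
    }
}
```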
This is the best solution. @hongkunxu
Hi @wyb @gengjun-git Thank you both for your expert advice. I also had a discussion with @wyb, and I will redesign the logic of the tablet check algorithm.
Hi @wyb @gengjun-git, I have resubmitted a PR according to your advice; please help review it again when you are free.
Quality Gate passed
- Issues: 1 new issue, 0 accepted issues
- Measures: 0 Security Hotspots, 0.0% Coverage on New Code, 0.0% Duplication on New Code
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:white_check_mark: pass : 26 / 27 (96.30%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/clone/TabletChecker.java | 26 | 27 | 96.30% | [910] |
[BE Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
@Mergifyio backport branch-3.5
✅ Backports have been created
- #63195 [Enhancement] Optimize and refine the tablet health check logic (backport #59824) has been created for branch branch-3.5