[Enhancement] Remove TabletInvertedIndex lock to improve metadata performance
Why I'm doing:
What I'm doing:
This pull request introduces several improvements and refactorings to how tablet and replica metadata are managed and accessed, as well as some robustness enhancements around tablet state handling. The main themes are API simplification for tablet/replica lookups, improved error handling when tablet metadata is missing, and the addition of a new concurrent map utility.
Tablet and Replica Metadata API Simplification:
- Refactored `TabletInvertedIndex` usage across the codebase to replace `Map<Long, Replica>` return types with more direct methods such as `getReplicasByTabletId`, `getTabletIdsByBackendId`, and `getReplica`, simplifying how replicas and tablets are accessed and iterated. This reduces unnecessary map wrapping and improves clarity. [1] [2] [3] [4] [5]
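To illustrate the shape of the list-based lookups named above, here is a minimal single-threaded sketch. `ListReplicaLookup` and its `Replica` record are hypothetical stand-ins, not the StarRocks classes, and only the three method names come from this PR; signatures and bodies are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: ListReplicaLookup and Replica are hypothetical,
// not the StarRocks classes changed in this PR.
public class ListReplicaLookup {
    // Hypothetical minimal replica: an id plus the backend it lives on.
    public record Replica(long replicaId, long backendId) {}

    private final Map<Long, List<Replica>> replicasByTablet = new HashMap<>();
    private final Map<Long, List<Long>> tabletsByBackend = new HashMap<>();

    public void addReplica(long tabletId, Replica replica) {
        replicasByTablet.computeIfAbsent(tabletId, k -> new ArrayList<>()).add(replica);
        tabletsByBackend.computeIfAbsent(replica.backendId(), k -> new ArrayList<>()).add(tabletId);
    }

    // Callers iterate a read-only list directly instead of unwrapping a Map<Long, Replica>.
    public List<Replica> getReplicasByTabletId(long tabletId) {
        return Collections.unmodifiableList(replicasByTablet.getOrDefault(tabletId, List.of()));
    }

    public List<Long> getTabletIdsByBackendId(long backendId) {
        return Collections.unmodifiableList(tabletsByBackend.getOrDefault(backendId, List.of()));
    }

    // Point lookup by (tablet, backend). A linear scan is assumed to be cheap
    // because a tablet normally has only a few replicas.
    public Replica getReplica(long tabletId, long backendId) {
        for (Replica r : getReplicasByTabletId(tabletId)) {
            if (r.backendId() == backendId) {
                return r;
            }
        }
        return null;
    }
}
```

For example, after `addReplica(1001L, new Replica(1L, 10L))`, `getReplica(1001L, 10L)` returns that replica and `getTabletIdsByBackendId(10L)` returns `[1001]`.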
Robustness and Error Handling:
- Added checks and logging for cases where tablet metadata is missing during replica creation and binlog scan range building, ensuring that such cases are handled gracefully and do not cause failures or inconsistencies. [1] [2]
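A minimal sketch of the "check, log, and skip" pattern described above, assuming the caller can safely ignore a tablet whose metadata has disappeared (for example, one dropped concurrently). `MissingMetaGuard`, `TabletMeta`, and `tabletsWithMeta` are hypothetical names, and `java.util.logging` stands in for whatever logger the FE actually uses; some call sites in the PR may raise an explicit error instead of skipping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch: MissingMetaGuard and TabletMeta are illustrative,
// not the StarRocks classes.
public class MissingMetaGuard {
    private static final Logger LOG = Logger.getLogger(MissingMetaGuard.class.getName());

    public record TabletMeta(long tableId, long partitionId) {}

    // Keeps only tablets whose metadata is still present. A tablet without
    // metadata is logged and skipped instead of causing a NullPointerException
    // further down the call chain.
    public static List<Long> tabletsWithMeta(List<Long> tabletIds, Map<Long, TabletMeta> metaByTabletId) {
        List<Long> kept = new ArrayList<>();
        for (Long tabletId : tabletIds) {
            TabletMeta meta = metaByTabletId.get(tabletId);
            if (meta == null) {
                LOG.warning("tablet meta not found, skipping tablet " + tabletId);
                continue;
            }
            kept.add(tabletId);
        }
        return kept;
    }
}
```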
Concurrency Utility Addition:
- Introduced a new utility class `ConcurrentLong2ObjectHashMap` that provides a segmented, thread-safe map for long keys, improving performance and concurrency for metadata storage. It is backed by `Long2ObjectOpenHashMap` to avoid auto-boxing and supports efficient reads, writes, and resizing.
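The segmentation idea can be sketched as follows. This is not the PR's `ConcurrentLong2ObjectHashMap`: `SegmentedLongMap` is a hypothetical name, each segment here is a `java.util.HashMap` (which boxes keys) instead of fastutil's `Long2ObjectOpenHashMap`, purely to keep the example dependency-free, and the per-segment `ReentrantReadWriteLock` is an assumption about the locking scheme.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongFunction;

// Simplified sketch of a segmented, thread-safe map keyed by long.
// Assumption: the real class uses Long2ObjectOpenHashMap per segment to avoid
// boxing; java.util.HashMap is used here only to stay dependency-free.
public class SegmentedLongMap<V> {
    private final Map<Long, V>[] segments;
    private final ReentrantReadWriteLock[] locks;
    private final int mask;

    @SuppressWarnings("unchecked")
    public SegmentedLongMap(int segmentCount) {
        if (segmentCount <= 0 || Integer.bitCount(segmentCount) != 1) {
            throw new IllegalArgumentException("segmentCount must be a power of two");
        }
        segments = new Map[segmentCount];
        locks = new ReentrantReadWriteLock[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new HashMap<>();
            locks[i] = new ReentrantReadWriteLock();
        }
        mask = segmentCount - 1;
    }

    // Fold high bits into low bits before masking, so keys that differ only
    // in their high bits can still spread across segments.
    private int segmentFor(long key) {
        long h = key ^ (key >>> 32);
        h ^= h >>> 16;
        return (int) h & mask;
    }

    public V get(long key) {
        int s = segmentFor(key);
        locks[s].readLock().lock();
        try {
            return segments[s].get(key);
        } finally {
            locks[s].readLock().unlock();
        }
    }

    public V put(long key, V value) {
        int s = segmentFor(key);
        locks[s].writeLock().lock();
        try {
            return segments[s].put(key, value);
        } finally {
            locks[s].writeLock().unlock();
        }
    }

    public V remove(long key) {
        int s = segmentFor(key);
        locks[s].writeLock().lock();
        try {
            return segments[s].remove(key);
        } finally {
            locks[s].writeLock().unlock();
        }
    }

    public V computeIfAbsent(long key, LongFunction<? extends V> factory) {
        int s = segmentFor(key);
        locks[s].writeLock().lock();
        try {
            return segments[s].computeIfAbsent(key, k -> factory.apply(k));
        } finally {
            locks[s].writeLock().unlock();
        }
    }

    // Sums segment sizes one segment at a time; this is not an atomic snapshot.
    public int size() {
        int total = 0;
        for (int i = 0; i < segments.length; i++) {
            locks[i].readLock().lock();
            try {
                total += segments[i].size();
            } finally {
                locks[i].readLock().unlock();
            }
        }
        return total;
    }
}
```

Because each segment grows independently, a resize only holds the write lock of one segment, so writers and readers of other segments are not blocked.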
Logging Improvements:
- Added a logger to `BinlogConsumeStateVO` and improved logging for missing tablet scenarios, aiding debugging and operational visibility. [1] [2]
Code Cleanup:
- Removed unused imports from files that previously referenced the old tablet/replica APIs. [1] [2] [3]
What type of PR is this:
- [ ] BugFix
- [ ] Feature
- [x] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [ ] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [ ] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
> [!NOTE]
> Replaces lock-based `TabletInvertedIndex` with concurrent data structures and new APIs, updates callers accordingly, adds a concurrent long-key map, and improves missing-metadata handling/logging.
- Metadata/Concurrency:
  - `TabletInvertedIndex`: Replaced the global RW lock and map-of-maps with concurrent structures (`ConcurrentLong2ObjectHashMap`, `CopyOnWriteArrayList`) and added a mutation lock used only for writes. Introduced APIs: `getReplicasByTabletId`, `getReplica`, `getTabletIdsByBackendId`, path-hash grouping, and replica counts; removed the replica/id cross maps and the read/write lock methods.
  - Utility: Added `ConcurrentLong2ObjectHashMap` (segmented, thread-safe long→object map) to back metadata indexes.
- Callers updated to new APIs:
  - `ReportHandler`: Reworked the tablet diff to iterate `getTabletIdsByBackendId`, use `getReplica`, handle missing `TabletMeta`, and retain migration/txn sync logic; minor logging tweaks.
  - `SystemHandler`: Uses `getReplicasByTabletId` in decommission-drop checks.
  - `TabletScheduler`: Validates `TabletMeta` before logging the add-replica edit.
  - `OlapDeleteJob`: Builds `TabletCommitInfo` directly from `TabletDeleteInfo` (no inverted-index reverse lookup).
  - `BinlogConsumeStateVO`: Added a logger and an explicit error when tablet meta is missing.
- Tests: Updated to new list-based replica APIs and path-hash grouping; adjusted FE test utilities to the new replica map type.
Written by Cursor Bugbot for commit 61b5112b9d594537a98f9137cd7eb8b1d4370e67.
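The "concurrent structures for reads, mutation lock only for writes" design summarized above can be sketched like this. It is a hypothetical illustration, not the PR's `TabletInvertedIndex`: `LockFreeReadIndex` and `Replica` are made-up names, and `ConcurrentHashMap<Long, ...>` stands in for `ConcurrentLong2ObjectHashMap`.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of "lock-free reads, single mutation lock for writes".
// ConcurrentHashMap stands in for ConcurrentLong2ObjectHashMap; none of this
// is the actual StarRocks code.
public class LockFreeReadIndex {
    public record Replica(long replicaId, long backendId) {}

    private final ConcurrentHashMap<Long, CopyOnWriteArrayList<Replica>> replicasByTablet = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Long, Set<Long>> tabletsByBackend = new ConcurrentHashMap<>();
    // Only writers take this lock, so writers never interleave their updates
    // to the two maps. Readers never block on it.
    private final ReentrantLock mutationLock = new ReentrantLock();

    public void addReplica(long tabletId, Replica replica) {
        mutationLock.lock();
        try {
            replicasByTablet.computeIfAbsent(tabletId, k -> new CopyOnWriteArrayList<>()).add(replica);
            tabletsByBackend.computeIfAbsent(replica.backendId(), k -> ConcurrentHashMap.<Long>newKeySet()).add(tabletId);
        } finally {
            mutationLock.unlock();
        }
    }

    public void deleteReplica(long tabletId, long backendId) {
        mutationLock.lock();
        try {
            CopyOnWriteArrayList<Replica> replicas = replicasByTablet.get(tabletId);
            if (replicas != null) {
                replicas.removeIf(r -> r.backendId() == backendId);
                if (replicas.isEmpty()) {
                    replicasByTablet.remove(tabletId);
                }
            }
            Set<Long> tablets = tabletsByBackend.get(backendId);
            if (tablets != null) {
                tablets.remove(tabletId);
                if (tablets.isEmpty()) {
                    tabletsByBackend.remove(backendId);
                }
            }
        } finally {
            mutationLock.unlock();
        }
    }

    // Lock-free: iterating a CopyOnWriteArrayList walks an immutable snapshot,
    // so a concurrent write never throws ConcurrentModificationException here.
    public List<Replica> getReplicasByTabletId(long tabletId) {
        List<Replica> replicas = replicasByTablet.get(tabletId);
        return replicas == null ? List.of() : Collections.unmodifiableList(replicas);
    }

    public List<Long> getTabletIdsByBackendId(long backendId) {
        Set<Long> tablets = tabletsByBackend.get(backendId);
        return tablets == null ? List.of() : List.copyOf(tablets);
    }
}
```

One trade-off of this sketch: a reader may briefly see one map updated before the other, since only writers are serialized. That is the usual cost of removing the read lock, and callers are assumed to tolerate it.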
@cursor review
🧪 CI Insights
Here's what we observed from your CI run for 61b5112b.
🟢 All jobs passed!
[FE Incremental Coverage Report]
:x: fail : 107 / 180 (59.44%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/scheduler/mv/BinlogConsumeStateVO.java | 0 | 4 | 00.00% | [51, 52, 53, 54] |
| :large_blue_circle: | com/starrocks/catalog/TabletInvertedIndex.java | 40 | 81 | 49.38% | [171, 195, 198, 199, 200, 201, 217, 219, 223, 238, 239, 241, 242, 244, 245, 256, 260, 262, 263, 264, 266, 267, 268, 269, 270, 271, 272, 275, 285, 309, 340, 344, 345, 346, 353, 354, 355, 368, 369, 370, 372] |
| :large_blue_circle: | com/starrocks/clone/TabletScheduler.java | 2 | 4 | 50.00% | [1907, 1910] |
| :large_blue_circle: | com/starrocks/leader/ReportHandler.java | 65 | 91 | 71.43% | [581, 587, 593, 598, 630, 644, 646, 649, 650, 651, 652, 654, 655, 656, 657, 658, 661, 662, 663, 673, 695, 709, 710, 711, 714, 715] |
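The overall FE figure at the top of the report appears to be the column totals of the per-file rows, which can be checked directly:

$$\frac{0 + 40 + 2 + 65}{4 + 81 + 4 + 91} = \frac{107}{180} \approx 59.44\%$$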
Quality Gate failed
Failed conditions
B Reliability Rating on New Code (required ≥ A)
See analysis details on SonarQube Cloud
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[BE Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)