[Feature] Implement tablet reshard job in FE for tablet splitting
Why I'm doing:
This PR implements the tablet reshard job functionality in StarRocks FE (Frontend) for tablet splitting and merging operations in shared-data mode.
What I'm doing:
Overview
This commit introduces the SplitTabletJob class and refactors the tablet resharding infrastructure to support tablet splitting and merging in StarRocks' shared-data (lake) mode. The implementation follows a state machine pattern with clear state transitions: PENDING → PREPARING → RUNNING → CLEANING → FINISHED.
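The happy-path transitions above can be sketched as a small enum. This is an illustration only: `ReshardJobState`, `next()`, and `isTerminal()` are hypothetical names, not the actual `SplitTabletJob` API. The conditions under which a job enters ABORTING are not described in this PR text, so the sketch models only the ABORTING → ABORTED step.

```java
// Hypothetical enum for illustration; the real job class may model its states differently.
enum ReshardJobState {
    PENDING, PREPARING, RUNNING, CLEANING, FINISHED, ABORTING, ABORTED;

    // Returns the successor state, or null when the state is terminal.
    ReshardJobState next() {
        switch (this) {
            case PENDING:
                return PREPARING;
            case PREPARING:
                return RUNNING;
            case RUNNING:
                return CLEANING;
            case CLEANING:
                return FINISHED;
            case ABORTING:
                return ABORTED;
            default:
                return null; // FINISHED and ABORTED have no successor
        }
    }

    // A job in a terminal state is never scheduled again.
    boolean isTerminal() {
        return this == FINISHED || this == ABORTED;
    }
}
```

Keeping the successor logic in one place makes it easy to check that no transition skips a state.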
Key Changes
New Classes:
- SplitTabletJob: Core job class implementing the tablet split workflow as a state machine with five main-path states (PENDING, PREPARING, RUNNING, CLEANING, FINISHED) plus ABORTING and ABORTED for the abort path
- ReshardingPhysicalPartition: Tracks resharding context for a physical partition
- ReshardingMaterializedIndex: Tracks resharding context for a materialized index
- TabletRange: Represents the key range of a tablet after splitting
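To make the idea of a per-tablet key range concrete, here is a minimal sketch of how splitting one range yields two adjacent ranges. It rests on assumptions: the class `KeyRange`, the `long` keys, the half-open `[lower, upper)` convention, and `splitAt()` are all hypothetical. The real TabletRange likely works on tuple-typed keys and may use different bound semantics.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical half-open key range [lower, upper); a null bound means unbounded.
final class KeyRange {
    final Long lower; // inclusive, null = no lower bound
    final Long upper; // exclusive, null = no upper bound

    KeyRange(Long lower, Long upper) {
        if (lower != null && upper != null && lower >= upper) {
            throw new IllegalArgumentException("empty range");
        }
        this.lower = lower;
        this.upper = upper;
    }

    boolean contains(long key) {
        return (lower == null || key >= lower) && (upper == null || key < upper);
    }

    // Splits this range into [lower, splitKey) and [splitKey, upper).
    List<KeyRange> splitAt(long splitKey) {
        boolean atLowerBound = lower != null && splitKey == lower;
        if (!contains(splitKey) || atLowerBound) {
            throw new IllegalArgumentException("split key must fall strictly inside the range");
        }
        return Arrays.asList(new KeyRange(lower, splitKey), new KeyRange(splitKey, upper));
    }
}
```

With half-open ranges, every key belongs to exactly one of the two resulting ranges, so no key is lost or duplicated at the split point.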
Refactored Classes:
- TabletReshardJob: Converted to abstract base class defining the job lifecycle
- SplitTabletJobFactory: Updated to create ReshardingPhysicalPartition and ReshardingMaterializedIndex structures
- PublishTabletsInfo: Simplified to use List<ReshardingTabletInfoPB> instead of separate lists
- ReshardingTablet interface: Added getOldTabletIds(), getNewTabletIds(), and toProto() methods
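The sketch below illustrates why a common interface with `getOldTabletIds()` and `getNewTabletIds()` lets callers work with a single list instead of separate per-kind lists. All type names here (`ReshardingTabletSketch`, `SplitSketch`, `MergeSketch`, `IdenticalSketch`, `TabletIdCollector`) are hypothetical stand-ins. The one-to-many, many-to-one, and one-to-one mappings are assumptions about how splitting, merging, and identical tablets behave.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the ReshardingTablet interface; toProto() is omitted
// because the generated proto classes are not shown in this PR text.
interface ReshardingTabletSketch {
    List<Long> getOldTabletIds();
    List<Long> getNewTabletIds();
}

// Assumed mapping: one old tablet is replaced by several new tablets.
final class SplitSketch implements ReshardingTabletSketch {
    private final long oldTabletId;
    private final List<Long> newTabletIds;

    SplitSketch(long oldTabletId, List<Long> newTabletIds) {
        this.oldTabletId = oldTabletId;
        this.newTabletIds = new ArrayList<>(newTabletIds);
    }

    @Override
    public List<Long> getOldTabletIds() {
        return Collections.singletonList(oldTabletId);
    }

    @Override
    public List<Long> getNewTabletIds() {
        return Collections.unmodifiableList(newTabletIds);
    }
}

// Assumed mapping: several old tablets are replaced by one new tablet.
final class MergeSketch implements ReshardingTabletSketch {
    private final List<Long> oldTabletIds;
    private final long newTabletId;

    MergeSketch(List<Long> oldTabletIds, long newTabletId) {
        this.oldTabletIds = new ArrayList<>(oldTabletIds);
        this.newTabletId = newTabletId;
    }

    @Override
    public List<Long> getOldTabletIds() {
        return Collections.unmodifiableList(oldTabletIds);
    }

    @Override
    public List<Long> getNewTabletIds() {
        return Collections.singletonList(newTabletId);
    }
}

// Assumed mapping: one old tablet maps to exactly one new tablet.
final class IdenticalSketch implements ReshardingTabletSketch {
    private final long oldTabletId;
    private final long newTabletId;

    IdenticalSketch(long oldTabletId, long newTabletId) {
        this.oldTabletId = oldTabletId;
        this.newTabletId = newTabletId;
    }

    @Override
    public List<Long> getOldTabletIds() {
        return Collections.singletonList(oldTabletId);
    }

    @Override
    public List<Long> getNewTabletIds() {
        return Collections.singletonList(newTabletId);
    }
}

// Callers iterate one list without caring which kind each entry is.
final class TabletIdCollector {
    static List<Long> allOldTabletIds(List<? extends ReshardingTabletSketch> tablets) {
        List<Long> ids = new ArrayList<>();
        for (ReshardingTabletSketch tablet : tablets) {
            ids.addAll(tablet.getOldTabletIds());
        }
        return ids;
    }

    static List<Long> allNewTabletIds(List<? extends ReshardingTabletSketch> tablets) {
        List<Long> ids = new ArrayList<>();
        for (ReshardingTabletSketch tablet : tablets) {
            ids.addAll(tablet.getNewTabletIds());
        }
        return ids;
    }
}
```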
Proto Changes:
- Replaced ReshardingTabletsInfoPB with ReshardingTabletInfoPB (union-style message)
- Added tablet_ranges field in PublishVersionResponse for returning new tablet ranges
- Removed find_split_point RPC (functionality moved elsewhere)
Deleted Classes:
- PhysicalPartitionContext: Replaced by ReshardingPhysicalPartition
- ReshardingTablets: Logic distributed to new classes
- ReshardingTabletContext: Renamed to ReshardingTabletInfo
Workflow
- PENDING → PREPARING: Set table state to TABLET_RESHARD, allocate transaction ID, update partition versions, add new tablets to inverted index, register resharding tablets
- PREPARING → RUNNING: Wait for previous versions to be published
- RUNNING → CLEANING: Publish split transaction to CN, update tablet ranges, add new materialized indexes to catalog
- CLEANING → FINISHED: Wait for in-flight transactions to complete, remove old materialized indexes, restore table state to NORMAL
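The four steps above can be sketched as a job that is polled repeatedly and advances at most one state per tick, staying put while a wait condition is unmet. This is a simplified illustration under assumptions: `SplitJobSketch`, `runOnce()`, and the two `BooleanSupplier` conditions are hypothetical, and the real work of each step is reduced to a recorded string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical polled job; the real SplitTabletJob touches the catalog, transactions, and CNs.
final class SplitJobSketch {
    enum State { PENDING, PREPARING, RUNNING, CLEANING, FINISHED }

    private State state = State.PENDING;
    private final BooleanSupplier previousVersionsPublished; // wait condition in PREPARING
    private final BooleanSupplier inFlightTxnsFinished;      // wait condition in CLEANING
    final List<String> actions = new ArrayList<>();          // stands in for the real side effects

    SplitJobSketch(BooleanSupplier previousVersionsPublished, BooleanSupplier inFlightTxnsFinished) {
        this.previousVersionsPublished = previousVersionsPublished;
        this.inFlightTxnsFinished = inFlightTxnsFinished;
    }

    State getState() {
        return state;
    }

    // One scheduler tick: performs at most one transition, or returns unchanged while waiting.
    void runOnce() {
        switch (state) {
            case PENDING:
                actions.add("set table state TABLET_RESHARD, allocate txn id, register new tablets");
                state = State.PREPARING;
                break;
            case PREPARING:
                if (!previousVersionsPublished.getAsBoolean()) {
                    return; // earlier versions not yet published, retry on the next tick
                }
                state = State.RUNNING;
                break;
            case RUNNING:
                actions.add("publish split txn, update tablet ranges, add new indexes");
                state = State.CLEANING;
                break;
            case CLEANING:
                if (!inFlightTxnsFinished.getAsBoolean()) {
                    return; // in-flight transactions still running, retry on the next tick
                }
                actions.add("remove old indexes, restore table state NORMAL");
                state = State.FINISHED;
                break;
            default:
                break; // FINISHED: nothing left to do
        }
    }
}
```

Returning early instead of blocking keeps each tick short, which is one plausible way to let a single scheduler thread drive many jobs.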
Testing
Added SplitTabletJobTest covering:
- Normal job execution flow
- Job replay for crash recovery
- Job abort scenarios
Fixes #64986
What type of PR is this:
- [ ] BugFix
- [x] Feature
- [ ] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [ ] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
[!NOTE] Implements the FE tablet-splitting reshard job with a refactored resharding model, updated lake-service protos, new config/property semantics, and publish-version support for returning new tablet ranges.
- FE/Resharding (Shared-data):
  - Introduces `SplitTabletJob` with full lifecycle (PENDING→FINISHED), registering/unregistering resharding tablets and updating ranges after publish.
  - Adds `ReshardingPhysicalPartition` and `ReshardingMaterializedIndex`; refactors `TabletReshardJob` to abstract base; renames context to `ReshardingTabletInfo`.
  - Simplifies `PublishTabletsInfo` to use `List<ReshardingTabletInfoPB>`; updates `ReshardingTablet` to expose `getOldTabletIds()`, `getNewTabletIds()`, and `toProto()`.
  - Updates `SplittingTablet`/`MergingTablet`/`IdenticalTablet` to emit new proto payloads.
  - Adds `TabletRange` plus `Tuple`/`Variant` converters. `TabletReshardJobMgr` now tracks `ReshardingTabletInfo`.
- Proto/BE integration:
  - Replaces `ReshardingTabletsInfoPB` with union-style `ReshardingTabletInfoPB`. `PublishVersionResponse` includes `tablet_ranges`; requests carry `resharding_tablet_infos`.
  - Removes `find_split_point` RPC; plumbs new fields through `Utils` aggregate/single publish flows.
- SQL/Config:
  - Replaces `tablet_reshard_split_size` with `tablet_reshard_target_size`; raises max split count; parser/analyzer and `SplitTabletClause` adjusted.
  - Adds lock helper ctor `AutoCloseableLock(dbId, tableId, ...)`.
- Tests:
  - Adds/updates UTs for publish info, split job lifecycle/replay/abort, job mgr, parser for new property.
- Removals:
  - Deletes `PhysicalPartitionContext` and `ReshardingTablets`; renames `ReshardingTabletContext` to `ReshardingTabletInfo`.
Written by Cursor Bugbot for commit 8ecf3301d09d9e50f2735d88c188ce008ce2b46d.
🧪 CI Insights
Here's what we observed from your CI run for 8ecf3301.
🟢 All jobs passed!
@cursor review
Quality Gate passed
Issues
40 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
1.1% Duplication on New Code
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:white_check_mark: pass : 432 / 540 (80.00%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/catalog/Tuple.java | 0 | 2 | 00.00% | [60, 64] |
| :large_blue_circle: | com/starrocks/alter/reshard/TabletReshardException.java | 0 | 4 | 00.00% | [23, 24, 27, 28] |
| :large_blue_circle: | com/starrocks/alter/reshard/TabletReshardUtils.java | 0 | 6 | 00.00% | [22, 23, 27, 28, 29, 31] |
| :large_blue_circle: | com/starrocks/sql/analyzer/AlterTableClauseAnalyzer.java | 0 | 2 | 00.00% | [1271, 1272] |
| :large_blue_circle: | com/starrocks/catalog/Variant.java | 0 | 2 | 00.00% | [81, 85] |
| :large_blue_circle: | com/starrocks/common/util/PropertyAnalyzer.java | 0 | 5 | 00.00% | [1615, 1617, 1618, 1621, 1628] |
| :large_blue_circle: | com/starrocks/alter/reshard/MergingTablet.java | 5 | 8 | 62.50% | [61, 65, 69] |
| :large_blue_circle: | com/starrocks/catalog/TabletRange.java | 9 | 14 | 64.29% | [27, 28, 29, 40, 41] |
| :large_blue_circle: | com/starrocks/sql/ast/SplitTabletClause.java | 2 | 3 | 66.67% | [35] |
| :large_blue_circle: | com/starrocks/alter/reshard/SplitTabletJobFactory.java | 46 | 64 | 71.88% | [124, 125, 131, 176, 177, 179, 180, 181, 183, 186, 192, 199, 200, 201, 202, 206, 210, 211] |
| :large_blue_circle: | com/starrocks/lake/Utils.java | 9 | 12 | 75.00% | [181, 182, 183] |
| :large_blue_circle: | com/starrocks/alter/reshard/IdenticalTablet.java | 7 | 9 | 77.78% | [60, 69] |
| :large_blue_circle: | com/starrocks/alter/reshard/ReshardingPhysicalPartition.java | 28 | 36 | 77.78% | [86, 92, 93, 94, 95, 96, 97, 98] |
| :large_blue_circle: | com/starrocks/alter/reshard/TabletReshardJob.java | 10 | 12 | 83.33% | [121, 199] |
| :large_blue_circle: | com/starrocks/alter/reshard/SplitTabletJob.java | 263 | 307 | 85.67% | [84, 88, 92, 96, 153, 160, 161, 163, 166, 192, 216, 237, 238, 271, 273, 274, 308, 309, 381, 382, 385, 386, 398, 403, 406, 409, 423, 424, 432, 433, 445, 459, 460, 473, 502, 503, 504, 515, 532, 547, 554, 588, 589, 590] |
| :large_blue_circle: | com/starrocks/alter/reshard/SplittingTablet.java | 7 | 8 | 87.50% | [55] |
| :large_blue_circle: | com/starrocks/alter/reshard/ReshardingTabletInfo.java | 6 | 6 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/common/Config.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/alter/reshard/PublishTabletsInfo.java | 9 | 9 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/alter/reshard/ReshardingMaterializedIndex.java | 13 | 13 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/common/util/concurrent/lock/AutoCloseableLock.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/alter/reshard/TabletReshardJobMgr.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/persist/gson/GsonUtils.java | 6 | 6 | 100.00% | [] |
[BE Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)