[Feature] Implement publish version in BE for tablet splitting

Open xiangguangyxg opened this issue 2 weeks ago • 6 comments

Why I'm doing:

This PR implements the BE-side publish version logic for the tablet splitting/resharding feature.

What I'm doing:

Key changes:

  • Add tablet_reshard.cpp/h: Core logic for handling tablet splitting, merging, and identical tablet scenarios during publish version
  • Modify transactions.cpp: Extend publish_version() to accept PublishTabletInfo, which carries cross-publish information
  • Modify lake_service.cpp: Handle resharding tablet info in the publish_version RPC; support both tablet reshard transactions and cross-publish for normal transactions
  • Modify txn_log_applier.cpp: Add tablet ID validation for cross-publish scenarios
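
The tablet ID validation for cross-publish can be pictured roughly as follows. This is a minimal sketch, not the code in txn_log_applier.cpp: `PublishPair`, `validate_log_tablet_id`, and the rule that a txn log must carry the source tablet's ID are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the information PublishTabletInfo carries:
// the tablet whose txn log is read and the tablet the log is applied to.
struct PublishPair {
    int64_t source_tablet_id;
    int64_t target_tablet_id;

    // A normal publish reads and applies on the same tablet.
    bool is_cross_publish() const { return source_tablet_id != target_tablet_id; }
};

// Returns an error message if the log does not belong to the expected
// source tablet, or std::nullopt if the log may be applied.
// Assumption: a txn log records the ID of the tablet that wrote it.
std::optional<std::string> validate_log_tablet_id(int64_t log_tablet_id, const PublishPair& pair) {
    if (log_tablet_id == pair.source_tablet_id) {
        return std::nullopt;
    }
    return "txn log tablet id " + std::to_string(log_tablet_id) +
           " does not match source tablet " + std::to_string(pair.source_tablet_id) +
           (pair.is_cross_publish() ? " (cross-publish)" : "");
}
```

Under these assumptions, a log written by tablet 10 passes for both `PublishPair{10, 10}` (normal publish) and `PublishPair{10, 20}` (cross-publish), while a log written by tablet 20 is rejected for `PublishPair{10, 20}`.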

Main features:

  • Support TXN_TABLET_RESHARD transaction type for tablet resharding
  • Support cross-publish: apply txn logs from source tablets to target tablets
  • Handle three resharding scenarios: splitting (1->N), merging (N->1), and identical (1->1)
  • Properly set shared flags for data files when tablets share the same underlying data
  • Cache optimization to avoid redundant metadata operations on retry
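
To make the three scenarios and the shared-flag idea concrete, here is a small sketch. All names (`ReshardKind`, `classify_reshard`, `find_shared_files`) are hypothetical, and the rule "a file referenced by more than one tablet is shared" is an assumption about what the flag means, not a description of tablet_reshard.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical names throughout; the PR's real types live in
// storage/lake/tablet_reshard.{h,cpp} and may differ.
enum class ReshardKind { kSplitting, kMerging, kIdentical };

// Classify one resharding step by how many tablets it reads and produces.
// Shapes the description does not list (0 on either side, N->M) yield nullopt.
std::optional<ReshardKind> classify_reshard(std::size_t num_sources, std::size_t num_targets) {
    if (num_sources == 1 && num_targets == 1) return ReshardKind::kIdentical;
    if (num_sources == 1 && num_targets > 1) return ReshardKind::kSplitting;
    if (num_sources > 1 && num_targets == 1) return ReshardKind::kMerging;
    return std::nullopt;
}

// Given the data files each tablet references, return the files referenced
// by more than one tablet. Assumption: these are the files that would need
// a "shared" flag so that one tablet does not delete data another still uses.
std::set<std::string> find_shared_files(
        const std::map<int64_t, std::vector<std::string>>& files_by_tablet) {
    // Collect, for every file, the set of tablets that reference it.
    std::map<std::string, std::set<int64_t>> owners;
    for (const auto& [tablet_id, files] : files_by_tablet) {
        for (const auto& file : files) {
            owners[file].insert(tablet_id);
        }
    }
    std::set<std::string> shared;
    for (const auto& [file, tablets] : owners) {
        if (tablets.size() > 1) shared.insert(file);
    }
    return shared;
}
```

For instance, after a 1->2 split in which both new tablets still point at the same file from the source tablet, `find_shared_files` would report that file, while a file written only by one of the new tablets would not be reported.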

Fixes #64986

What type of PR is this:

  • [ ] BugFix
  • [x] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [ ] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Implements BE/FE support for tablet resharding (split/merge/identical) with cross-publish, new RPC/proto fields, and a new FE SplitTablet job pipeline, plus config/property updates and extensive tests.

  • Backend (Lake/Service):
    • Add storage/lake/tablet_reshard.{h,cpp}: core resharding logic, txn-log conversion, shared-file handling, metadata/range outputs.
    • Extend transactions.{h,cpp} publish_version to accept PublishTabletInfo and apply cross-publish; validate tablet IDs in txn_log_applier.cpp.
    • Overhaul LakeServiceImpl::publish_version to process resharding_tablet_infos, batch merge cases, handle reshard-txn vs cross-publish, and populate tablet_ranges/tablet_metas in responses.
    • Update metrics/tracing and task batching; add helpers for failed tablets and logging.
    • CMake includes new source; numerous BE tests added.
  • RPC/Proto:
    • Replace ReshardingTabletsInfoPB with repeated ReshardingTabletInfoPB; add PublishVersionResponse.tablet_ranges.
    • Remove distribution_columns and find_split_point RPC; wire new fields through aggregate publish helpers.
  • Frontend (FE):
    • Introduce SplitTabletJob pipeline and supporting types (ReshardingPhysicalPartition, ReshardingMaterializedIndex, updated ReshardingTablet API), plus serialization adapters.
    • Simplify publish wiring: FE constructs per-tablet ReshardingTabletInfoPB and processes returned TabletRange.
    • Replace deprecated PhysicalPartitionContext/ReshardingTablets; refactor PublishTabletsInfo.
    • Add TabletRange, tuple/variant (thrift/proto) converters.
    • Config/property: rename tablet_reshard_split_size -> tablet_reshard_target_size; raise tablet_reshard_max_split_count.
  • Tests:
    • Add/expand BE and FE tests for splitting/merging/identical flows, RPC handling, and job manager behavior.
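
As an illustration of how the renamed `tablet_reshard_target_size` and the `tablet_reshard_max_split_count` limit might interact, the sketch below estimates a split count from a tablet's data size. The semantics are assumed (aim for about one target size per new tablet, never exceed the maximum); the summary does not state the actual formula, and `estimate_split_count` is a hypothetical helper written in C++ purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed semantics, not taken from the PR:
//   - aim for roughly `target_size` bytes per resulting tablet
//   - never produce more than `max_split_count` tablets
//   - a result of 1 means the tablet is not split
int64_t estimate_split_count(int64_t data_size, int64_t target_size, int64_t max_split_count) {
    if (data_size <= 0 || target_size <= 0 || max_split_count < 1) {
        return 1;  // nothing sensible to split
    }
    // Ceiling division written to avoid overflow on very large sizes.
    int64_t wanted = data_size / target_size + (data_size % target_size != 0 ? 1 : 0);
    return std::clamp<int64_t>(wanted, 1, max_split_count);
}
```

With these assumptions, a 1000-byte tablet and a 300-byte target size give 4 tablets, and the same tablet with a 10-byte target size is capped at the maximum split count.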

Written by Cursor Bugbot for commit 8df8e5b7668a27c16004de10977df93fb61a7776.

xiangguangyxg avatar Dec 12 '25 07:12 xiangguangyxg

🧪 CI Insights

Here's what we observed from your CI run for bb2e8d3c.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] avatar Dec 12 '25 07:12 mergify[bot]

@cursor review

alvin-celerdata avatar Dec 12 '25 07:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 12 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 13 '25 04:12 alvin-celerdata

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 13 '25 08:12 github-actions[bot]

[FE Incremental Coverage Report]

:white_check_mark: pass : 432 / 540 (80.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| :large_blue_circle: com/starrocks/catalog/Tuple.java | 0 | 2 | 00.00% | [60, 64] |
| :large_blue_circle: com/starrocks/alter/reshard/TabletReshardException.java | 0 | 4 | 00.00% | [23, 24, 27, 28] |
| :large_blue_circle: com/starrocks/alter/reshard/TabletReshardUtils.java | 0 | 6 | 00.00% | [22, 23, 27, 28, 29, 31] |
| :large_blue_circle: com/starrocks/sql/analyzer/AlterTableClauseAnalyzer.java | 0 | 2 | 00.00% | [1271, 1272] |
| :large_blue_circle: com/starrocks/catalog/Variant.java | 0 | 2 | 00.00% | [81, 85] |
| :large_blue_circle: com/starrocks/common/util/PropertyAnalyzer.java | 0 | 5 | 00.00% | [1615, 1617, 1618, 1621, 1628] |
| :large_blue_circle: com/starrocks/alter/reshard/MergingTablet.java | 5 | 8 | 62.50% | [61, 65, 69] |
| :large_blue_circle: com/starrocks/catalog/TabletRange.java | 9 | 14 | 64.29% | [27, 28, 29, 40, 41] |
| :large_blue_circle: com/starrocks/sql/ast/SplitTabletClause.java | 2 | 3 | 66.67% | [35] |
| :large_blue_circle: com/starrocks/alter/reshard/SplitTabletJobFactory.java | 46 | 64 | 71.88% | [124, 125, 131, 176, 177, 179, 180, 181, 183, 186, 192, 199, 200, 201, 202, 206, 210, 211] |
| :large_blue_circle: com/starrocks/lake/Utils.java | 9 | 12 | 75.00% | [181, 182, 183] |
| :large_blue_circle: com/starrocks/alter/reshard/IdenticalTablet.java | 7 | 9 | 77.78% | [60, 69] |
| :large_blue_circle: com/starrocks/alter/reshard/ReshardingPhysicalPartition.java | 28 | 36 | 77.78% | [86, 92, 93, 94, 95, 96, 97, 98] |
| :large_blue_circle: com/starrocks/alter/reshard/TabletReshardJob.java | 10 | 12 | 83.33% | [121, 199] |
| :large_blue_circle: com/starrocks/alter/reshard/SplitTabletJob.java | 263 | 307 | 85.67% | [84, 88, 92, 96, 153, 160, 161, 163, 166, 192, 216, 237, 238, 271, 273, 274, 308, 309, 381, 382, 385, 386, 398, 403, 406, 409, 423, 424, 432, 433, 445, 459, 460, 473, 502, 503, 504, 515, 532, 547, 554, 588, 589, 590] |
| :large_blue_circle: com/starrocks/alter/reshard/SplittingTablet.java | 7 | 8 | 87.50% | [55] |
| :large_blue_circle: com/starrocks/alter/reshard/ReshardingTabletInfo.java | 6 | 6 | 100.00% | [] |
| :large_blue_circle: com/starrocks/common/Config.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: com/starrocks/alter/reshard/PublishTabletsInfo.java | 9 | 9 | 100.00% | [] |
| :large_blue_circle: com/starrocks/alter/reshard/ReshardingMaterializedIndex.java | 13 | 13 | 100.00% | [] |
| :large_blue_circle: com/starrocks/common/util/concurrent/lock/AutoCloseableLock.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: com/starrocks/alter/reshard/TabletReshardJobMgr.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: com/starrocks/persist/gson/GsonUtils.java | 6 | 6 | 100.00% | [] |

github-actions[bot] avatar Dec 13 '25 08:12 github-actions[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 430 / 501 (85.83%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| :large_blue_circle: be/src/storage/lake/txn_log_applier.cpp | 4 | 8 | 50.00% | [166, 207, 518, 558] |
| :large_blue_circle: be/src/storage/lake/transactions.cpp | 48 | 66 | 72.73% | [182, 183, 201, 214, 231, 232, 237, 238, 287, 309, 378, 394, 399, 400, 401, 402, 405, 407] |
| :large_blue_circle: be/src/storage/lake/tablet_reshard.cpp | 172 | 205 | 83.90% | [55, 56, 57, 59, 60, 61, 66, 67, 68, 70, 71, 72, 78, 79, 80, 138, 139, 140, 161, 171, 183, 184, 185, 199, 250, 251, 270, 279, 289, 290, 291, 305, 317] |
| :large_blue_circle: be/src/service/service_be/lake_service.cpp | 196 | 212 | 92.45% | [149, 164, 216, 217, 315, 316, 343, 344, 373, 374, 439, 440, 466, 467, 468, 565] |
| :large_blue_circle: be/src/storage/lake/tablet_reshard.h | 10 | 10 | 100.00% | [] |

github-actions[bot] avatar Dec 13 '25 09:12 github-actions[bot]