
[Enhancement] Add range-split, parallel compaction and size-tiered compaction strategy for cloud native pk index (part1)

Open luohaha opened this issue 2 weeks ago • 8 comments

Why I'm doing:

This PR enhances primary key (PK) index compaction and parallelism in shared-data mode for StarRocks. It adds three major components: a size-tiered compaction strategy for selecting compaction candidates, a parallel compaction manager for executing compaction tasks concurrently, and a fileset abstraction for managing non-overlapping SSTable ranges. Supporting infrastructure includes new thread pools and configuration parameters. More details: #66457

  • Size-tiered compaction strategy (lake/lake_persistent_index_size_tiered_compaction_strategy.cpp)
  • Range-split (lake/persistent_index_sstable_fileset.cpp)
  • Parallel PK index compaction (lake/lake_persistent_index_parallel_compact_mgr.cpp)
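
As a rough illustration of the size-tiered idea, the sketch below groups SSTables of similar size into tiers and picks the first tier that has enough files to be worth compacting. It is a minimal sketch under stated assumptions, not the logic in lake_persistent_index_size_tiered_compaction_strategy.cpp: the `SstInfo` struct, the `pick_size_tiered` function, and the `ratio` and `min_files` parameters are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one SSTable; not the StarRocks metadata type.
struct SstInfo {
    int id;
    uint64_t bytes;
};

// Sorts files by size, groups neighbours of similar size into tiers, and
// returns the first tier (scanning from the smallest files) that holds at
// least `min_files` members. A file opens a new tier when it is more than
// `ratio` times the average size of the current tier.
// Returns an empty vector when no tier qualifies.
std::vector<SstInfo> pick_size_tiered(std::vector<SstInfo> files, double ratio,
                                      size_t min_files) {
    std::sort(files.begin(), files.end(),
              [](const SstInfo& a, const SstInfo& b) { return a.bytes < b.bytes; });

    std::vector<SstInfo> tier;
    uint64_t tier_bytes = 0;
    for (const SstInfo& f : files) {
        if (!tier.empty()) {
            double avg = static_cast<double>(tier_bytes) / tier.size();
            if (static_cast<double>(f.bytes) > avg * ratio) {
                // `f` is too large for the current tier: close the tier.
                if (tier.size() >= min_files) return tier;
                tier.clear();
                tier_bytes = 0;
            }
        }
        tier.push_back(f);
        tier_bytes += f.bytes;
    }
    if (tier.size() >= min_files) return tier;
    return {};
}
```

Compacting files of similar size together keeps small files from being rewritten repeatedly alongside much larger ones, which is the usual motivation for a size-tiered policy.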

What I'm doing:

This pull request changes PK index compaction and parallelism in shared-data mode in four ways: it adds configuration options for PK index parallel compaction, adds thread pools for parallel operations, refactors the PK index compaction logic to support multiple output files, and integrates a manager for parallel compaction tasks.

Configuration and Parallelism Enhancements:

  • Added multiple new configuration options in config.h for fine-tuning PK index parallel compaction, file sizes, thread pool limits, and size-tiered compaction strategies. This enables more granular control over PK index performance and resource usage.

  • Introduced new thread pools and a parallel compaction manager in ExecEnv (exec_env.h/exec_env.cpp) for PK index get, memtable flush, and parallel compaction tasks, with proper initialization, shutdown, and resource management.
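
To make the notion of parallel compaction tasks more concrete, the sketch below shows one possible way to cut a key space into segments that independent tasks could process. It is an assumption-laden illustration, not the manager's actual algorithm: `choose_split_keys` and `max_tasks` are hypothetical names, and using the start keys of the input SSTables as candidate boundaries is an assumption made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Picks at most `max_tasks - 1` split keys from the start keys of the input
// files. The returned keys k1 < k2 < ... define the segments
// (-inf, k1), [k1, k2), ..., [kn, +inf), one per task.
// Hypothetical helper; the real manager may choose boundaries differently.
std::vector<std::string> choose_split_keys(std::vector<std::string> start_keys,
                                           size_t max_tasks) {
    std::sort(start_keys.begin(), start_keys.end());
    start_keys.erase(std::unique(start_keys.begin(), start_keys.end()),
                     start_keys.end());

    std::vector<std::string> splits;
    if (max_tasks <= 1 || start_keys.size() <= 1) return splits;

    // The smallest key opens the first segment, so it is not a candidate.
    size_t candidates = start_keys.size() - 1;
    size_t n_splits = std::min(max_tasks - 1, candidates);
    for (size_t i = 1; i <= n_splits; ++i) {
        // Spread the chosen boundaries evenly over the candidate keys.
        size_t idx = i * candidates / n_splits;
        splits.push_back(start_keys[idx]);
    }
    return splits;
}
```

Each resulting segment could then be submitted to a thread pool as a separate task, since segments with disjoint key ranges do not need to coordinate while merging.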

PK Index Compaction Refactoring:

  • Refactored the PK index compaction logic in lake_persistent_index.cpp/lake_persistent_index.h to support generating multiple output files during compaction, based on configurable file size thresholds. The KeyValueMerger now manages multiple builders and outputs, improving scalability for large datasets.
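
The multi-output behaviour described above can be pictured with a small sketch: a writer that receives sorted key/value pairs and starts a new output once the current one reaches a size threshold. This is a simplified illustration, not the KeyValueMerger implementation; `RollingWriter`, `OutputFile`, and `max_file_bytes` are hypothetical names, and counting key plus value bytes as the file size is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one output SSTable and its key range.
struct OutputFile {
    std::string first_key;
    std::string last_key;
    size_t bytes = 0;
    size_t entries = 0;
};

// Collects sorted key/value pairs and rolls over to a new output whenever
// the current output has reached `max_file_bytes`.
class RollingWriter {
public:
    explicit RollingWriter(size_t max_file_bytes)
            : _max_file_bytes(max_file_bytes) {}

    // Keys must arrive in ascending order, as a merge iterator would emit them.
    void add(const std::string& key, const std::string& value) {
        if (_outputs.empty() || _outputs.back().bytes >= _max_file_bytes) {
            _outputs.emplace_back();  // open a new output file
            _outputs.back().first_key = key;
        }
        OutputFile& out = _outputs.back();
        out.last_key = key;
        out.bytes += key.size() + value.size();
        out.entries += 1;
    }

    const std::vector<OutputFile>& outputs() const { return _outputs; }

private:
    size_t _max_file_bytes;
    std::vector<OutputFile> _outputs;
};
```

Because the input is sorted, the outputs produced this way have non-overlapping key ranges, so each one can record its own first and last key.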

Build System Updates:

  • Updated CMakeLists.txt to include new source files for parallel compaction and size-tiered strategies, ensuring the build system recognizes the new components.

Integration and Maintainability:

  • Integrated the new parallel compaction manager and thread pools into the environment lifecycle, with clean shutdown and destruction logic to prevent resource leaks.

These changes collectively improve the performance, configurability, and maintainability of PK index management in shared-data mode, paving the way for more efficient large-scale compaction and query operations.

Fixes #66457

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [ ] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Add range-split, parallel compaction, and size-tiered strategy for cloud-native PK index with new configs, thread pools, multi-output compaction, and docs/tests.

  • Storage (PK Index, Lake):
    • Implement range-split compaction with fileset abstraction (persistent_index_sstable_fileset.*), track sstable key ranges and fileset IDs; table builder now exposes KeyRange().
    • Add parallel compaction manager and tasks (lake_persistent_index_parallel_compact_mgr.*) with segmenting by key ranges and concurrent execution.
    • Introduce size-tiered compaction candidate picker (lake_persistent_index_size_tiered_compaction_strategy.*).
    • Refactor compaction to support multiple output SSTables via KeyValueMerger and update major compaction/apply logic (lake_persistent_index.*).
    • Extend memtable flush/build to emit key ranges; add sstable methods for fileset/range.
  • Runtime/ExecEnv:
    • Initialize/shutdown new PK index thread pools (get, memtable flush) and parallel compaction manager; expose accessors.
  • Config & Docs:
    • Lower pk_parallel_execution_threshold_bytes to 100MB.
    • Add numerous PK index parallel compaction/get/memtable and size-tiered tuning knobs; document them (EN/JA/ZH).
  • Protocol/Metadata:
    • Add PersistentIndexSstableRangePB, fileset IDs, and support multiple output sstables/ranges in TxnLogPB.
  • Build & Tests:
    • Wire new sources in CMake.
    • Add extensive unit tests for filesets, parallel compaction manager, size-tiered strategy, and sstable ranges.
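
The fileset abstraction mentioned above groups SSTables whose key ranges do not overlap. A minimal sketch of that invariant follows, assuming inclusive string key ranges appended in ascending order; `KeyRange`, `Fileset`, `try_append`, and `find` are hypothetical names for this example and do not mirror the interfaces in persistent_index_sstable_fileset.*.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Inclusive key range covered by one SSTable (hypothetical type).
struct KeyRange {
    std::string start;
    std::string end;
};

// Keeps ranges sorted and non-overlapping.
class Fileset {
public:
    // Accepts `r` only if it is well formed and starts strictly after the
    // end of the last range already in the set.
    bool try_append(const KeyRange& r) {
        if (r.start > r.end) return false;
        if (!_ranges.empty() && r.start <= _ranges.back().end) return false;
        _ranges.push_back(r);
        return true;
    }

    // Returns the index of the range containing `key`, or -1 if none does.
    // Binary search works because the ranges are sorted and disjoint.
    int find(const std::string& key) const {
        auto it = std::lower_bound(
                _ranges.begin(), _ranges.end(), key,
                [](const KeyRange& r, const std::string& k) { return r.end < k; });
        if (it == _ranges.end() || key < it->start) return -1;
        return static_cast<int>(it - _ranges.begin());
    }

private:
    std::vector<KeyRange> _ranges;
};
```

With this invariant, a point lookup inside one fileset needs to consult at most one SSTable, which is one plausible reason to track key ranges per file.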

Written by Cursor Bugbot for commit 09d67575dbe681275daf454428d1223048220563.

luohaha • Dec 09 '25 11:12

🧪 CI Insights

Here's what we observed from your CI run for e4c428e9.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] • Dec 09 '25 11:12

@cursor review

alvin-celerdata • Dec 09 '25 17:12

@cursor review

alvin-celerdata • Dec 10 '25 17:12

@cursor review

alvin-celerdata • Dec 11 '25 15:12

@cursor review

alvin-celerdata • Dec 12 '25 17:12

@cursor review

alvin-celerdata • Dec 13 '25 16:12

@cursor review

alvin-celerdata • Dec 14 '25 03:12

@cursor review

alvin-celerdata • Dec 15 '25 02:12

[Java-Extensions Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions[bot] • Dec 15 '25 11:12

[FE Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions[bot] • Dec 15 '25 12:12

[BE Incremental Coverage Report]

✅ pass : 597 / 650 (91.85%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 be/src/storage/lake/lake_persistent_index_parallel_compact_mgr.h | 6 | 13 | 46.15% | [80, 84, 85, 86, 87, 88, 89] |
| 🔵 be/src/storage/lake/persistent_index_sstable.cpp | 4 | 7 | 57.14% | [205, 206, 207] |
| 🔵 be/src/storage/lake/persistent_index_sstable_fileset.cpp | 81 | 98 | 82.65% | [26, 61, 66, 67, 93, 109, 110, 145, 155, 160, 161, 162, 163, 164, 166, 167, 169] |
| 🔵 be/src/storage/lake/lake_persistent_index_size_tiered_compaction_strategy.h | 6 | 7 | 85.71% | [47] |
| 🔵 be/src/storage/lake/lake_persistent_index_parallel_compact_mgr.cpp | 241 | 261 | 92.34% | [117, 129, 130, 131, 158, 191, 195, 243, 249, 251, 296, 297, 298, 300, 318, 319, 320, 321, 396, 444] |
| 🔵 be/src/storage/lake/lake_persistent_index_key_value_merger.cpp | 91 | 94 | 96.81% | [47, 50, 56] |
| 🔵 be/src/storage/lake/lake_persistent_index_size_tiered_compaction_strategy.cpp | 88 | 90 | 97.78% | [160, 165] |
| 🔵 be/src/storage/lake/lake_persistent_index.cpp | 37 | 37 | 100.00% | [] |
| 🔵 be/src/runtime/exec_env.cpp | 22 | 22 | 100.00% | [] |
| 🔵 be/src/storage/sstable/table_builder.cpp | 4 | 4 | 100.00% | [] |
| 🔵 be/src/storage/lake/persistent_index_sstable_fileset.h | 5 | 5 | 100.00% | [] |
| 🔵 be/src/storage/lake/persistent_index_memtable.cpp | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/lake/lake_persistent_index_key_value_merger.h | 8 | 8 | 100.00% | [] |
| 🔵 be/src/storage/lake/persistent_index_sstable.h | 3 | 3 | 100.00% | [] |

github-actions[bot] • Dec 15 '25 12:12