[Enhancement] Add range-split, parallel compaction and size-tiered compaction strategy for cloud-native PK index (part 1)
Why I'm doing:
This PR introduces significant enhancements to the primary key (PK) index compaction and parallelism mechanisms in StarRocks shared-data mode. It implements a size-tiered compaction strategy, range splitting for parallel compaction, and the supporting infrastructure, including new thread pools and configuration parameters. The implementation adds three major components: a size-tiered compaction strategy that selects compaction candidates, a parallel compaction manager that executes compaction tasks concurrently, and a fileset abstraction that manages non-overlapping SSTable ranges. More details: #66457
- Size-tiered compaction strategy (`lake/lake_persistent_index_size_tiered_compaction_strategy.cpp`)
- Range split (`lake/persistent_index_sstable_fileset.cpp`)
- Parallel PK index compaction (`lake/lake_persistent_index_parallel_compact_mgr.cpp`)
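To make the size-tiered idea concrete, here is a minimal C++ sketch of how a picker might group SSTables into tiers of similar size and choose one tier to compact. It is not the code in `lake_persistent_index_size_tiered_compaction_strategy.cpp`: `SstableInfo`, `SizeTieredOptions`, `PickSizeTieredCandidates`, and the option names and defaults are hypothetical stand-ins for the PR's actual types and config knobs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical metadata for one SSTable (id and on-disk size in bytes).
struct SstableInfo {
    uint64_t id;
    uint64_t file_size;
};

// Hypothetical knobs standing in for the PR's size-tiered config options.
struct SizeTieredOptions {
    double level_multiple = 10.0;      // size ratio that separates adjacent tiers
    size_t min_compaction_files = 4;   // a tier needs at least this many files
    size_t max_compaction_files = 16;  // cap on files merged in one round
};

// Sorts SSTables by size, groups them into tiers of similar size, and returns
// the ids of the smallest tier that has enough files to be worth compacting.
std::vector<uint64_t> PickSizeTieredCandidates(std::vector<SstableInfo> ssts,
                                               const SizeTieredOptions& opts) {
    std::sort(ssts.begin(), ssts.end(), [](const SstableInfo& a, const SstableInfo& b) {
        return a.file_size < b.file_size;
    });
    size_t tier_begin = 0;
    while (tier_begin < ssts.size()) {
        // The smallest file in the tier sets the tier's size limit; clamp to
        // 1 byte so empty files do not produce a zero limit.
        const double limit =
                static_cast<double>(std::max<uint64_t>(ssts[tier_begin].file_size, 1)) *
                opts.level_multiple;
        size_t tier_end = tier_begin + 1;  // a tier always holds at least one file
        while (tier_end < ssts.size() && static_cast<double>(ssts[tier_end].file_size) < limit) {
            ++tier_end;
        }
        if (tier_end - tier_begin >= opts.min_compaction_files) {
            const size_t count = std::min(tier_end - tier_begin, opts.max_compaction_files);
            std::vector<uint64_t> picked;
            for (size_t i = tier_begin; i < tier_begin + count; ++i) picked.push_back(ssts[i].id);
            return picked;
        }
        tier_begin = tier_end;
    }
    return {};  // no tier has enough files yet
}
```

Merging several small, similarly sized files first means a large file is only rewritten once enough peers of comparable size have accumulated, which is the usual motivation for size-tiered compaction.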
What I'm doing:
This pull request introduces significant enhancements to the primary key (PK) index compaction and parallelism mechanisms in shared-data mode, focusing on configurability, scalability, and maintainability. The changes include new configuration options for PK index parallel compaction, the addition of thread pools for parallel operations, refactoring of the PK index compaction logic to support multiple output files, and integration of a manager for parallel compaction tasks.
Configuration and Parallelism Enhancements:
- Added multiple new configuration options in `config.h` for fine-tuning PK index parallel compaction, file sizes, thread pool limits, and size-tiered compaction strategies. This enables more granular control over PK index performance and resource usage.
- Introduced new thread pools and a parallel compaction manager in `ExecEnv` (`exec_env.h`/`exec_env.cpp`) for PK index get, memtable flush, and parallel compaction tasks, with proper initialization, shutdown, and resource management.
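The execution model behind a parallel compaction manager can be sketched with a small fixed-size thread pool: each key-range segment becomes one task, the caller waits for all results, and shutdown drains the queue before joining the workers. This is a simplified illustration using only the C++ standard library. StarRocks has its own thread pool class, and `SimpleThreadPool`, `KeyRangeSegment`, and `RunParallelCompaction` are hypothetical names, not the API added to `ExecEnv`.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal fixed-size pool; not the StarRocks ThreadPool API.
class SimpleThreadPool {
public:
    explicit SimpleThreadPool(size_t num_threads) {
        for (size_t i = 0; i < num_threads; ++i) _workers.emplace_back([this] { WorkerLoop(); });
    }
    ~SimpleThreadPool() { Shutdown(); }

    // Queues a task; returns false once Shutdown() has started.
    bool Submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(_mu);
            if (_shutdown) return false;
            _tasks.push(std::move(task));
        }
        _cv.notify_one();
        return true;
    }

    // Rejects new work, lets workers drain the queued tasks, then joins them.
    void Shutdown() {
        {
            std::lock_guard<std::mutex> lock(_mu);
            if (_shutdown) return;
            _shutdown = true;
        }
        _cv.notify_all();
        for (auto& worker : _workers) worker.join();
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(_mu);
                _cv.wait(lock, [this] { return _shutdown || !_tasks.empty(); });
                if (_tasks.empty()) return;  // shutting down and nothing left to run
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            task();
        }
    }

    std::mutex _mu;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _tasks;
    std::vector<std::thread> _workers;
    bool _shutdown = false;
};

// Hypothetical segment of the key space covering keys in [begin, end).
struct KeyRangeSegment {
    std::string begin;
    std::string end;
};

// Runs compact_fn once per segment on the pool and waits for every result.
// compact_fn stands in for the real per-segment merge work.
std::vector<std::string> RunParallelCompaction(
        SimpleThreadPool& pool, const std::vector<KeyRangeSegment>& segments,
        const std::function<std::string(const KeyRangeSegment&)>& compact_fn) {
    std::vector<std::future<std::string>> futures;
    for (const auto& seg : segments) {
        auto task = std::make_shared<std::packaged_task<std::string()>>(
                [seg, &compact_fn] { return compact_fn(seg); });
        futures.push_back(task->get_future());
        if (!pool.Submit([task] { (*task)(); })) {
            (*task)();  // pool already shut down: run the task inline instead
        }
    }
    std::vector<std::string> results;
    for (auto& f : futures) results.push_back(f.get());  // same order as segments
    return results;
}
```

Because the segments cover disjoint key ranges, the tasks need no coordination with each other, and results come back in segment order because the futures are collected in submission order.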
PK Index Compaction Refactoring:
- Refactored the PK index compaction logic in `lake_persistent_index.cpp`/`lake_persistent_index.h` to support generating multiple output files during compaction, based on configurable file size thresholds. The `KeyValueMerger` now manages multiple builders and outputs, improving scalability for large datasets.
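A rough sketch of the multi-output idea: merge several sorted runs, keep the newest value for each key, and roll over to a new output file whenever the current one reaches a size threshold, recording each output's key range. For brevity it deduplicates through an in-memory `std::map` instead of a streaming k-way merge over SSTable iterators. `KvRun`, `OutputFile`, `MultiOutputMerger`, and the byte accounting are assumptions for illustration, not the PR's `KeyValueMerger`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A run of key/value pairs (hypothetical stand-in for one input SSTable).
using KvRun = std::vector<std::pair<std::string, std::string>>;

// Hypothetical in-memory stand-in for one output SSTable and its key range.
struct OutputFile {
    KvRun kvs;
    size_t bytes = 0;
    std::string min_key;
    std::string max_key;
};

// Merges runs (runs[0] is the newest) into one or more outputs, starting a new
// output whenever the current one has reached max_file_bytes.
class MultiOutputMerger {
public:
    explicit MultiOutputMerger(size_t max_file_bytes) : _max_file_bytes(max_file_bytes) {}

    std::vector<OutputFile> Merge(const std::vector<KvRun>& runs) const {
        // Simplification: dedup through std::map. emplace() keeps the first
        // value inserted for a key, so the newest run wins.
        std::map<std::string, std::string> merged;
        for (const auto& run : runs) {
            for (const auto& kv : run) merged.emplace(kv.first, kv.second);
        }
        std::vector<OutputFile> outputs;
        for (const auto& [key, value] : merged) {
            // Roll over to a new output file once the size threshold is reached.
            if (outputs.empty() || outputs.back().bytes >= _max_file_bytes) {
                outputs.emplace_back();
                outputs.back().min_key = key;
            }
            OutputFile& out = outputs.back();
            out.kvs.emplace_back(key, value);
            out.bytes += key.size() + value.size();
            out.max_key = key;  // keys arrive in sorted order from the map
        }
        return outputs;
    }

private:
    size_t _max_file_bytes;
};
```

Because each output records its own key range and keys are emitted in sorted order, the outputs are ordered and non-overlapping, which is the property a range-split fileset relies on.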
Build System Updates:
- Updated `CMakeLists.txt` to include the new source files for parallel compaction and size-tiered strategies, ensuring the build system recognizes the new components.
Integration and Maintainability:
- Integrated the new parallel compaction manager and thread pools into the environment lifecycle, with clean shutdown and destruction logic to prevent resource leaks.
These changes collectively improve the performance, configurability, and maintainability of PK index management in shared-data mode, paving the way for more efficient large-scale compaction and query operations.
Fixes #66457
What type of PR is this:
- [ ] BugFix
- [ ] Feature
- [x] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [ ] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
[!NOTE] Add range-split, parallel compaction, and size-tiered strategy for cloud-native PK index with new configs, thread pools, multi-output compaction, and docs/tests.
- Storage (PK Index, Lake):
  - Implement range-split compaction with a fileset abstraction (`persistent_index_sstable_fileset.*`); track SSTable key ranges and fileset IDs; the table builder now exposes `KeyRange()`.
  - Add a parallel compaction manager and tasks (`lake_persistent_index_parallel_compact_mgr.*`) that segment work by key range and execute it concurrently.
  - Introduce a size-tiered compaction candidate picker (`lake_persistent_index_size_tiered_compaction_strategy.*`).
  - Refactor compaction to support multiple output SSTables via `KeyValueMerger`, and update the major compaction/apply logic (`lake_persistent_index.*`).
  - Extend memtable flush/build to emit key ranges; add SSTable methods for fileset/range.
- Runtime/ExecEnv:
  - Initialize/shut down the new PK index thread pools (get, memtable flush) and the parallel compaction manager; expose accessors.
- Config & Docs:
  - Lower `pk_parallel_execution_threshold_bytes` to 100MB.
  - Add numerous PK index parallel compaction/get/memtable and size-tiered tuning knobs; document them (EN/JA/ZH).
- Protocol/Metadata:
  - Add `PersistentIndexSstableRangePB`, fileset IDs, and support for multiple output SSTables/ranges in `TxnLogPB`.
- Build & Tests:
  - Wire new sources in CMake.
  - Add extensive unit tests for filesets, the parallel compaction manager, the size-tiered strategy, and SSTable ranges.
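The fileset abstraction can be pictured as a sorted set of SSTables whose key ranges never overlap, so a point lookup has at most one candidate file. The sketch below is an assumption-based illustration, not the contents of `persistent_index_sstable_fileset.*`: `SstRange` and `SstableFileset` are hypothetical, and ranges are treated as inclusive `[start_key, end_key]` string intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical key range of one SSTable: [start_key, end_key], inclusive.
struct SstRange {
    uint64_t sst_id;
    std::string start_key;
    std::string end_key;
};

// Sketch of a fileset: SSTables with non-overlapping ranges, sorted by start key.
class SstableFileset {
public:
    // Adds an SSTable; rejects it if its range overlaps an existing member.
    bool Add(const SstRange& r) {
        // First member whose start key is >= r.start_key.
        auto it = std::lower_bound(_ranges.begin(), _ranges.end(), r.start_key,
                                   [](const SstRange& a, const std::string& key) {
                                       return a.start_key < key;
                                   });
        // Overlaps the next member if that member starts inside r.
        if (it != _ranges.end() && it->start_key <= r.end_key) return false;
        // Overlaps the previous member if that member ends inside r.
        if (it != _ranges.begin() && std::prev(it)->end_key >= r.start_key) return false;
        _ranges.insert(it, r);
        return true;
    }

    // Returns the id of the only SSTable that may contain key, if any.
    std::optional<uint64_t> Find(const std::string& key) const {
        // First member whose start key is > key; the candidate is the one before it.
        auto it = std::upper_bound(_ranges.begin(), _ranges.end(), key,
                                   [](const std::string& k, const SstRange& a) {
                                       return k < a.start_key;
                                   });
        if (it == _ranges.begin()) return std::nullopt;
        --it;
        if (key <= it->end_key) return it->sst_id;
        return std::nullopt;
    }

    size_t size() const { return _ranges.size(); }

private:
    std::vector<SstRange> _ranges;
};
```

Rejecting overlapping inserts is what lets `Find` use a single binary search instead of probing every SSTable in the set.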
Written by Cursor Bugbot for commit 09d67575dbe681275daf454428d1223048220563.
🧪 CI Insights
Here's what we observed from your CI run for e4c428e9.
🟢 All jobs passed!
@cursor review
Quality Gate passed
- Issues: 0 New issues, 0 Accepted issues
- Measures: 0 Security Hotspots, 0.0% Coverage on New Code, 0.0% Duplication on New Code
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[BE Incremental Coverage Report]
:white_check_mark: pass : 597 / 650 (91.85%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | be/src/storage/lake/lake_persistent_index_parallel_compact_mgr.h | 6 | 13 | 46.15% | [80, 84, 85, 86, 87, 88, 89] |
| :large_blue_circle: | be/src/storage/lake/persistent_index_sstable.cpp | 4 | 7 | 57.14% | [205, 206, 207] |
| :large_blue_circle: | be/src/storage/lake/persistent_index_sstable_fileset.cpp | 81 | 98 | 82.65% | [26, 61, 66, 67, 93, 109, 110, 145, 155, 160, 161, 162, 163, 164, 166, 167, 169] |
| :large_blue_circle: | be/src/storage/lake/lake_persistent_index_size_tiered_compaction_strategy.h | 6 | 7 | 85.71% | [47] |
| :large_blue_circle: | be/src/storage/lake/lake_persistent_index_parallel_compact_mgr.cpp | 241 | 261 | 92.34% | [117, 129, 130, 131, 158, 191, 195, 243, 249, 251, 296, 297, 298, 300, 318, 319, 320, 321, 396, 444] |
| :large_blue_circle: | be/src/storage/lake/lake_persistent_index_key_value_merger.cpp | 91 | 94 | 96.81% | [47, 50, 56] |
| :large_blue_circle: | be/src/storage/lake/lake_persistent_index_size_tiered_compaction_strategy.cpp | 88 | 90 | 97.78% | [160, 165] |
| :large_blue_circle: | be/src/storage/lake/lake_persistent_index.cpp | 37 | 37 | 100.00% | [] |
| :large_blue_circle: | be/src/runtime/exec_env.cpp | 22 | 22 | 100.00% | [] |
| :large_blue_circle: | be/src/storage/sstable/table_builder.cpp | 4 | 4 | 100.00% | [] |
| :large_blue_circle: | be/src/storage/lake/persistent_index_sstable_fileset.h | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | be/src/storage/lake/persistent_index_memtable.cpp | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | be/src/storage/lake/lake_persistent_index_key_value_merger.h | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | be/src/storage/lake/persistent_index_sstable.h | 3 | 3 | 100.00% | [] |