[Feature] Support for builtin inverted index
What I'm doing:
This PR introduces a builtin inverted index, including the entire read and write logic. It is built on the bitmap index infrastructure.
Fixes #issue
What type of PR is this:
- [ ] BugFix
- [x] Feature
- [ ] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [ ] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
[!NOTE] Adds a builtin inverted index built on bitmap dictionary, integrates GIN filtering/pruning in scan path, and updates FE/BE protocols and planner to support it.
- Storage/Index (BE):
  - Add builtin inverted index: `BuiltinInvertedWriter/Reader/Iterator`, simple analyzer, plugin, and factory; register in build.
  - Enhance bitmap index: predicate-based dictionary seek, rowid union reads, batch-add optimizations, new writer APIs.
  - Extend `InvertedReader/Iterator` APIs (use `IndexReadOptions`, `close`) and support new builtin index meta (`BuiltinInvertedIndexPB`).
- Scan/Execution (BE):
  - Pass `IndexReadOptions` through `Segment/ColumnReader`; initialize inverted iterators per column; close after use.
  - Add GIN filtering and optional post-index column pruning; wire flags from params; new profile counters/timers (`GinFilterRows`, `GinFilter`).
  - Lake rowset/tablet_reader propagate `enable_gin_filter` and `prune_column_after_index_filter`.
- Predicates:
  - `ColumnExprPredicate`: LIKE wildcard only `%`; implement inverted-index seek for LIKE/MATCH.
- FE/Planner:
  - Support builtin inverted index impl; disallow CLucene in shared-data; set `enable_gin_filter` and `enable_prune_column_after_index_filter` in scan node.
  - `SchemaChangeHandler`: adjust index id assignment for cloud-native tables (exclude VECTOR).
- Proto/Thrift:
  - Add `BUILTIN_INVERTED_INDEX` and `BuiltinInvertedIndexPB`; extend `TLakeScanNode`/`TOlapScanNode` with GIN/prune flags.

Written by Cursor Bugbot for commit e3bf2202fc3467e560768df16ccf1da5d4fcc6ee.
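For readers unfamiliar with the approach, here is a minimal, self-contained sketch of the general idea of an inverted index layered on a term dictionary with per-term row-id sets. It is not the code in this PR: `ToyInvertedIndex`, its methods, and the whitespace/punctuation tokenizer are hypothetical names chosen for illustration, and `std::set` stands in for a compressed bitmap. It shows an exact-term lookup (roughly what MATCH needs) and a prefix lookup that seeks into the sorted dictionary and unions row ids (roughly what LIKE 'prefix%' needs).

```cpp
#include <cctype>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy index, for illustration only.
// A sorted dictionary maps each term to the set of row ids that contain it.
class ToyInvertedIndex {
public:
    // Split text on non-alphanumeric characters, lowercase each token,
    // and record the row id under every token.
    void add(uint32_t rowid, const std::string& text) {
        std::string token;
        for (char c : text) {
            unsigned char uc = static_cast<unsigned char>(c);
            if (std::isalnum(uc)) {
                token.push_back(static_cast<char>(std::tolower(uc)));
            } else if (!token.empty()) {
                _dict[token].insert(rowid);
                token.clear();
            }
        }
        if (!token.empty()) {
            _dict[token].insert(rowid);
        }
    }

    // Exact-term lookup: return the sorted row ids for one term.
    std::vector<uint32_t> match(const std::string& term) const {
        auto it = _dict.find(term);
        if (it == _dict.end()) {
            return std::vector<uint32_t>();
        }
        return std::vector<uint32_t>(it->second.begin(), it->second.end());
    }

    // Prefix lookup: seek to the first term >= prefix, then union the
    // row ids of every following term that still starts with the prefix.
    std::vector<uint32_t> prefix(const std::string& p) const {
        std::set<uint32_t> rows;
        for (auto it = _dict.lower_bound(p);
             it != _dict.end() && it->first.compare(0, p.size(), p) == 0; ++it) {
            rows.insert(it->second.begin(), it->second.end());
        }
        return std::vector<uint32_t>(rows.begin(), rows.end());
    }

private:
    std::map<std::string, std::set<uint32_t>> _dict;
};
```

With rows 0 = "Star rocks", 1 = "starry night", and 2 = "rock star", `match("star")` yields rows 0 and 2, while `prefix("star")` also picks up row 1 through the term "starry". A real implementation would presumably use compressed bitmaps and on-disk dictionary pages rather than in-memory containers.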
🧪 CI Insights
Here's what we observed from your CI run for e3bf2202.
🟢 All jobs passed!
@cursor review
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:white_check_mark: pass : 4 / 4 (100.00%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/planner/OlapScanNode.java | 4 | 4 | 100.00% | [] |
[BE Incremental Coverage Report]
:x: fail : 58 / 79 (73.42%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | src/storage/lake/tablet_reader.cpp | 0 | 1 | 00.00% | [346] |
| :large_blue_circle: | src/connector/lake_connector.cpp | 0 | 8 | 00.00% | [292, 293, 294, 295, 297, 298, 629, 630] |
| :large_blue_circle: | src/storage/rowset/bitmap_index_reader.h | 1 | 4 | 25.00% | [133, 134, 136] |
| :large_blue_circle: | src/storage/rowset/bitmap_index_writer.cpp | 30 | 36 | 83.33% | [89, 90, 91, 92, 244, 253] |
| :large_blue_circle: | src/storage/rowset/bitmap_index_reader.cpp | 17 | 20 | 85.00% | [134, 135, 136] |
| :large_blue_circle: | src/storage/rowset/segment.cpp | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | src/storage/rowset/column_reader.cpp | 7 | 7 | 100.00% | [] |
| :large_blue_circle: | src/storage/rowset/segment.h | 1 | 1 | 100.00% | [] |
@cursor review