[Feature] Support for builtin inverted index

Open srlch opened this issue 2 weeks ago • 6 comments

What I'm doing:

This PR introduces a builtin inverted index, including the entire read and write logic. It is built on the bitmap index infrastructure.
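For readers unfamiliar with the idea, here is a minimal sketch of an inverted index laid over a term dictionary. It is not the code in this PR: `tokenize` and `ToyInvertedIndex` are hypothetical names, the analyzer rule (lowercase, split on non-alphanumerics) is an assumption, and a Python `set` stands in for the per-term bitmap of row ids.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical simple analyzer: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class ToyInvertedIndex:
    """Illustration only: maps each term to the set of row ids containing it."""

    def __init__(self):
        # term -> set of row ids (a set stands in for a bitmap here)
        self.postings = defaultdict(set)

    def add_row(self, rowid, text):
        # Write path: every token of the row records the row id.
        for term in tokenize(text):
            self.postings[term].add(rowid)

    def match(self, term):
        # Read path: a single dictionary lookup yields the matching row ids.
        return sorted(self.postings.get(term.lower(), set()))


index = ToyInvertedIndex()
rows = ["StarRocks inverted index", "bitmap index reader", "scan path"]
for rowid, text in enumerate(rows):
    index.add_row(rowid, text)

print(index.match("index"))  # [0, 1]
```

The write path touches one posting set per token, and a term lookup is one dictionary probe followed by reading that term's row ids, which is the same shape a bitmap index already provides for whole column values.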

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [x] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [ ] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Adds a builtin inverted index built on bitmap dictionary, integrates GIN filtering/pruning in scan path, and updates FE/BE protocols and planner to support it.

  • Storage/Index (BE):
    • Add builtin inverted index: BuiltinInvertedWriter/Reader/Iterator, simple analyzer, plugin, and factory; register in build.
    • Enhance bitmap index: predicate-based dictionary seek, rowid union reads, batch-add optimizations, new writer APIs.
    • Extend InvertedReader/Iterator APIs (use IndexReadOptions, close) and support new builtin index meta (BuiltinInvertedIndexPB).
  • Scan/Execution (BE):
    • Pass IndexReadOptions through Segment/ColumnReader; initialize inverted iterators per column; close after use.
    • Add GIN filtering and optional post-index column pruning; wire flags from params; new profile counters/timers (GinFilterRows, GinFilter).
    • Lake rowset/tablet_reader propagate enable_gin_filter and prune_column_after_index_filter.
  • Predicates:
    • ColumnExprPredicate: LIKE supports only the % wildcard; implement inverted-index seek for LIKE/MATCH.
  • FE/Planner:
    • Support builtin inverted index impl; disallow CLucene in shared-data; set enable_gin_filter and enable_prune_column_after_index_filter in scan node.
    • SchemaChangeHandler: adjust index id assignment for cloud-native tables (exclude VECTOR).
  • Proto/Thrift:
    • Add BUILTIN_INVERTED_INDEX and BuiltinInvertedIndexPB; extend TLakeScanNode/TOlapScanNode with GIN/prune flags.
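The "predicate-based dictionary seek" and "rowid union reads" items above can be pictured with a toy model. This is a sketch under assumptions, not the BE implementation: `like_to_regex`, `seek_terms`, and `like_rowids` are hypothetical helpers, the LIKE pattern is assumed to be evaluated against individual dictionary terms, and only `%` is treated as a wildcard.

```python
import bisect
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern whose only wildcard is % into a regex."""
    parts = pattern.split("%")
    return re.compile(".*".join(re.escape(p) for p in parts), re.DOTALL)


def seek_terms(sorted_terms, pattern):
    """Seek into the sorted dictionary and return the terms matching pattern."""
    prefix = pattern.split("%", 1)[0]  # literal text before the first %
    regex = like_to_regex(pattern)
    # Terms sharing a prefix are contiguous in sorted order, so start there.
    start = bisect.bisect_left(sorted_terms, prefix)
    matched = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # left the prefix range, nothing further can match
        if regex.fullmatch(term):
            matched.append(term)
    return matched


def like_rowids(postings, pattern):
    """Union the row ids of every dictionary term that matches pattern."""
    rowids = set()
    for term in seek_terms(sorted(postings), pattern):
        rowids |= postings[term]
    return sorted(rowids)


# term -> row ids; sets stand in for bitmaps in this illustration
postings = {"bitmap": {1}, "index": {0, 1}, "inverted": {0}, "scan": {2}}

print(like_rowids(postings, "in%"))  # [0, 1]
print(like_rowids(postings, "%a%"))  # [1, 2]
```

A pattern with a literal prefix, such as `in%`, only visits the dictionary range that shares the prefix, while a leading `%` falls back to scanning every term. In both cases the scan would then read only the rows in the resulting union.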

Written by Cursor Bugbot for commit e3bf2202fc3467e560768df16ccf1da5d4fcc6ee. This will update automatically on new commits.

srlch avatar Dec 10 '25 05:12 srlch

🧪 CI Insights

Here's what we observed from your CI run for e3bf2202.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] avatar Dec 10 '25 05:12 mergify[bot]

@cursor review

alvin-celerdata avatar Dec 10 '25 06:12 alvin-celerdata

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 11 '25 01:12 github-actions[bot]

[FE Incremental Coverage Report]

:white_check_mark: pass : 4 / 4 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| :large_blue_circle: com/starrocks/planner/OlapScanNode.java | 4 | 4 | 100.00% | [] |

github-actions[bot] avatar Dec 11 '25 01:12 github-actions[bot]

[BE Incremental Coverage Report]

:x: fail : 58 / 79 (73.42%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| :large_blue_circle: src/storage/lake/tablet_reader.cpp | 0 | 1 | 00.00% | [346] |
| :large_blue_circle: src/connector/lake_connector.cpp | 0 | 8 | 00.00% | [292, 293, 294, 295, 297, 298, 629, 630] |
| :large_blue_circle: src/storage/rowset/bitmap_index_reader.h | 1 | 4 | 25.00% | [133, 134, 136] |
| :large_blue_circle: src/storage/rowset/bitmap_index_writer.cpp | 30 | 36 | 83.33% | [89, 90, 91, 92, 244, 253] |
| :large_blue_circle: src/storage/rowset/bitmap_index_reader.cpp | 17 | 20 | 85.00% | [134, 135, 136] |
| :large_blue_circle: src/storage/rowset/segment.cpp | 2 | 2 | 100.00% | [] |
| :large_blue_circle: src/storage/rowset/column_reader.cpp | 7 | 7 | 100.00% | [] |
| :large_blue_circle: src/storage/rowset/segment.h | 1 | 1 | 100.00% | [] |
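
As a quick sanity check, the headline figure can be recomputed from the per-file rows above. The snippet only re-adds the numbers reported in this comment. How the coverage bot itself aggregates them, and what threshold it applies, is not stated here and is not assumed.

```python
# (path, covered_line, new_line) copied from the report above
rows = [
    ("src/storage/lake/tablet_reader.cpp", 0, 1),
    ("src/connector/lake_connector.cpp", 0, 8),
    ("src/storage/rowset/bitmap_index_reader.h", 1, 4),
    ("src/storage/rowset/bitmap_index_writer.cpp", 30, 36),
    ("src/storage/rowset/bitmap_index_reader.cpp", 17, 20),
    ("src/storage/rowset/segment.cpp", 2, 2),
    ("src/storage/rowset/column_reader.cpp", 7, 7),
    ("src/storage/rowset/segment.h", 1, 1),
]

covered = sum(c for _, c, _ in rows)
new = sum(n for _, _, n in rows)
summary = f"{covered} / {new} ({covered / new:.2%})"
print(summary)  # 58 / 79 (73.42%)
```

Most of the 21 uncovered lines sit in `lake_connector.cpp` (8) and `bitmap_index_writer.cpp` (6).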

github-actions[bot] avatar Dec 11 '25 01:12 github-actions[bot]

@cursor review

alvin-celerdata avatar Dec 11 '25 02:12 alvin-celerdata