
[Enhancement] sort dicts directly to speed up BitmapIndexWriter.

Open • wuxueyang96 opened this issue 2 weeks ago • 1 comment

Why I'm doing:

Currently, BitmapIndexWriterImpl::finish uses a std::map to produce the sorted dictionary and the sorted ngram dictionary. As the benchmark results below show, this wastes a lot of CPU time and memory:

[Image: benchmark results]

The results show that deduplicating with a std::unordered_map, then storing the distinct keys in a std::vector and sorting it, gives the best performance.
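For reference, the std::map-based approach described above can be sketched as follows. This is a minimal sketch, not StarRocks code: the function name `build_dict_with_map` is hypothetical, and a plain `std::vector<uint32_t>` of row ids stands in for the per-value bitmap the real writer maintains. Each row costs an O(log u) tree lookup (u being the number of distinct values), and each distinct key costs a separate node allocation, which plausibly accounts for the CPU and memory overhead reported above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-value bitmap: the ids of the rows holding that value.
using RowIds = std::vector<uint32_t>;

// Baseline: std::map deduplicates and keeps keys ordered on every insert,
// paying a tree lookup per row and a node allocation per distinct key.
std::vector<std::pair<std::string, RowIds>> build_dict_with_map(
        const std::vector<std::string>& column) {
    std::map<std::string, RowIds> dict;
    for (uint32_t row = 0; row < column.size(); ++row) {
        dict[column[row]].push_back(row);
    }
    // std::map iteration order is already sorted by key.
    std::vector<std::pair<std::string, RowIds>> out;
    out.reserve(dict.size());
    for (auto& [value, rows] : dict) {
        out.emplace_back(value, std::move(rows));
    }
    return out;
}
```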

What I'm doing:

Deduplicate dictionary entries with a std::unordered_map, then collect the distinct keys into a std::vector and sort it to produce the sorted dictionaries.
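The new approach can be sketched in the same hedged way. Again, the function name `build_dict_with_hash_then_sort` is hypothetical and `std::vector<uint32_t>` stands in for the bitmap. The PR only states that the keys are stored in a std::vector; keeping pointers to the hash-map entries instead of copying the key strings is an assumption of this sketch. It is safe because references to std::unordered_map elements stay valid across rehashing, and no insertions happen after the pointers are taken.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-value bitmap: the ids of the rows holding that value.
using RowIds = std::vector<uint32_t>;

std::vector<std::pair<std::string, RowIds>> build_dict_with_hash_then_sort(
        const std::vector<std::string>& column) {
    // Step 1: deduplicate with a hash map (average O(1) per row, no ordering kept).
    std::unordered_map<std::string, RowIds> dict;
    for (uint32_t row = 0; row < column.size(); ++row) {
        dict[column[row]].push_back(row);
    }
    // Step 2: gather pointers to the distinct entries into a contiguous vector.
    std::vector<std::pair<const std::string, RowIds>*> entries;
    entries.reserve(dict.size());
    for (auto& entry : dict) {
        entries.push_back(&entry);
    }
    // Step 3: sort once, over distinct keys only.
    std::sort(entries.begin(), entries.end(),
              [](const auto* a, const auto* b) { return a->first < b->first; });
    // Step 4: emit entries in key order.
    std::vector<std::pair<std::string, RowIds>> out;
    out.reserve(entries.size());
    for (auto* entry : entries) {
        out.emplace_back(entry->first, std::move(entry->second));
    }
    return out;
}
```

With n rows and u distinct values, this replaces n tree lookups with n average-O(1) hash lookups plus a single O(u log u) sort over a contiguous array, which is presumably where the speedup comes from. The output order matches iterating a std::map, so the written index content should not change.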

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [ ] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

wuxueyang96 • Dec 12 '25 07:12

🧪 CI Insights

Here's what we observed from your CI run for 49e479f8.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] • Dec 12 '25 07:12

@cursor review

alvin-celerdata • Dec 12 '25 17:12