[Enhancement] sort dicts directly to speed up BitmapIndexWriter.
Why I'm doing:
Currently, BitmapIndexWriterImpl::finish uses a std::map to produce the sorted dictionaries and sorted ngram dictionaries. This wastes a lot of CPU and memory, according to the test whose results are shown below:
The results show that deduplicating with a std::unordered_map and then storing all keys in a std::vector to produce the sorted dictionaries gives the best performance.
What I'm doing:
Use a std::unordered_map to deduplicate dictionary entries, then collect the keys into a std::vector and sort it directly to get the sorted dictionaries.
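The idea can be sketched roughly as follows. This is a minimal illustration, not the actual BitmapIndexWriterImpl code: the `Dict` and `RowIds` aliases and the `add_value` and `sorted_keys` helpers are hypothetical names, and the real writer's value and bitmap types are assumed to differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the writer's per-value state:
// the row ids in which each distinct value appears.
using RowIds = std::vector<uint32_t>;
using Dict = std::unordered_map<std::string, RowIds>;

// Ingest path: the hash map only deduplicates values,
// so no ordering is maintained while rows are added.
inline void add_value(Dict& dict, const std::string& value, uint32_t row_id) {
    dict[value].push_back(row_id);
}

// Finish path: gather pointers to the distinct keys and sort them once.
// Pointers to unordered_map elements stay valid as long as the element
// is not erased, so the keys themselves are not copied.
inline std::vector<const std::string*> sorted_keys(const Dict& dict) {
    std::vector<const std::string*> keys;
    keys.reserve(dict.size());
    for (const auto& entry : dict) {
        keys.push_back(&entry.first);
    }
    std::sort(keys.begin(), keys.end(),
              [](const std::string* a, const std::string* b) { return *a < *b; });
    return keys;
}
```

Compared with inserting every value into a std::map, this defers all ordering work to a single sort over the distinct keys at finish time, which is the trade-off this PR relies on.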
What type of PR is this:
- [ ] BugFix
- [ ] Feature
- [x] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [ ] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
🧪 CI Insights
Here's what we observed from your CI run for 49e479f8.
🟢 All jobs passed!