fix: where comparing old and new bitmap versions occurred when directly comparing bytes in join and group by operations.
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Following the pull request (https://github.com/databendlabs/databend/pull/19041), the data structure used by Bitmap was changed from RoaringBitmap to HybridBitmap. Therefore, the serialization method changed. Even though deserialization is compatible with the old data, there are still instances where comparisons are made directly using bytes during group by or join operations. This can lead to inconsistent matching results for semantically identical data when comparing old and new bitmap data.
This PR attempts to refactor BitmapType's Scalar\Column, making it directly represent Bitmap instead of Binary, thereby improving compatibility and performance.
Tests
- [x] Unit Test
- [ ] Logic Test
- [ ] Benchmark Test
- [ ] No Test - Explain why
Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. Repo admins can enable using credits for code reviews in their settings.
@codex review
I have a feeling that it's not overly complicated here? Compatibility issues are something that will naturally diminish over time. I think it only needs to be dealt with where the old data is ingested, not all places.
It is recommended to refer to those locations where DataBlock::consume_convert_to_full appears.
Before commit adc9695f7a97ff76cf86eb47bb8f156b7d273c57, it was implemented as you described. However, following @sundy-li suggestion, we attempted to pre-serialize the Bitmap in this PR to address a series of previous issues. Please refer to the updated PR description for details.
Before commit adc9695, it was implemented as you described. However, following @sundy-li suggestion, we attempted to pre-serialize the Bitmap in this PR to address a series of previous issues. Please refer to the updated PR description for details.
@codex address this feedback
Codex couldn't complete this request. Try again later.
@codex review