databend icon indicating copy to clipboard operation
databend copied to clipboard

fix: where comparing old and new bitmap versions occurred when directly comparing bytes in join and group by operations.

Open KKould opened this issue 1 week ago โ€ข 6 comments

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Following the pull request (https://github.com/databendlabs/databend/pull/19041), the data structure used by Bitmap was changed from RoaringBitmap to HybridBitmap. Therefore, the serialization method changed. Even though deserialization is compatible with the old data, there are still instances where comparisons are made directly using bytes during group by or join operations. This can lead to inconsistent matching results for semantically identical data when comparing old and new bitmap data.

This PR attempts to refactor BitmapType's Scalar\Column, making it directly represent Bitmap instead of Binary, thereby improving compatibility and performance.

Tests

  • [x] Unit Test
  • [ ] Logic Test
  • [ ] Benchmark Test
  • [ ] No Test - Explain why

Type of change

  • [x] Bug Fix (non-breaking change which fixes an issue)
  • [ ] New Feature (non-breaking change which adds functionality)
  • [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • [ ] Documentation Update
  • [ ] Refactoring
  • [ ] Performance Improvement
  • [ ] Other (please describe):

This change isโ€‚Reviewable

KKould avatar Dec 09 '25 06:12 KKould

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. Repo admins can enable using credits for code reviews in their settings.

@codex review

KKould avatar Dec 09 '25 11:12 KKould

I have a feeling that it's not overly complicated here? Compatibility issues are something that will naturally diminish over time. I think it only needs to be dealt with where the old data is ingested, not all places.

It is recommended to refer to those locations where DataBlock::consume_convert_to_full appears.

forsaken628 avatar Dec 10 '25 13:12 forsaken628

Before commit adc9695f7a97ff76cf86eb47bb8f156b7d273c57, it was implemented as you described. However, following @sundy-li suggestion, we attempted to pre-serialize the Bitmap in this PR to address a series of previous issues. Please refer to the updated PR description for details.

KKould avatar Dec 11 '25 03:12 KKould

Before commit adc9695, it was implemented as you described. However, following @sundy-li suggestion, we attempted to pre-serialize the Bitmap in this PR to address a series of previous issues. Please refer to the updated PR description for details.

@codex address this feedback

BohuTANG avatar Dec 11 '25 04:12 BohuTANG

Codex couldn't complete this request. Try again later.

@codex review

KKould avatar Dec 15 '25 02:12 KKould