CRoaring icon indicating copy to clipboard operation
CRoaring copied to clipboard

Add a new method `roaring_bitmap_range_bool_array`.

Open RinChanNOWWW opened this issue 1 month ago • 15 comments

Currently, we can only convert the roaring bitmap to a sparse uint32 array or a dense bitset. In some case, we may need to get a dense boolean/uint8 array. For example, some database systems like ClickHouse, will use a uint8 array as a mask column to filter data.

This PR introduces roaring_bitmap_range_bool_array to convert a roaring bitmap to a bool array to meet the requirement. The implementation is quite simple, and optimizations (like utilize SIMD) could be introduced in later PRs if this PR is accepted by the community.

RinChanNOWWW avatar Nov 24 '25 08:11 RinChanNOWWW

@RinChanNOWWW

I am not 100% convinced about your motivation. We already support conversion to a bitset:

roaring_bitmap_t *r1 = roaring_bitmap_create();
for (uint32_t i = 100; i < 100000; i+= 1 + (i%5)) {
     roaring_bitmap_add(r1, i);
}
for (uint32_t i = 100000; i < 500000; i+= 100) {
     roaring_bitmap_add(r1, i);
}
roaring_bitmap_add_range(r1, 500000, 600000);
bitset_t * bitset = bitset_create();
bool success = roaring_bitmap_to_bitset(r1, bitset);
assert(success); // could fail due to memory allocation.
assert(bitset_count(bitset) == roaring_bitmap_get_cardinality(r1));
// You can then query the bitset:
for (uint32_t i = 100; i < 100000; i+= 1 + (i%5)) {
    assert(bitset_get(bitset,i));
}
for (uint32_t i = 100000; i < 500000; i+= 100) {
    assert(bitset_get(bitset,i));
}
// you must free the memory:
bitset_free(bitset);
roaring_bitmap_free(r1);

A bitset instance is quite simple:

struct bitset_s {
    uint64_t *CROARING_CBITSET_RESTRICT array;
    /* For simplicity and performance, we prefer to have a size and a capacity
     * that is a multiple of 64 bits. Thus we only track the size and the
     * capacity in terms of 64-bit words allocated */
    size_t arraysize;
    size_t capacity;
};

typedef struct bitset_s bitset_t;

I guess we can always argue that it is good to have more ways to get the same work done, but expanding our API is not free.

If this is related to something ClickHouse needs, then sure... but do you have a related ClickHouse issue ? Or discussion thread ?

lemire avatar Nov 24 '25 20:11 lemire

Hi @lemire, thanks for the response.

I also found that we can convert to bitset already. However, there are two main reasons the bitset API cannot meet my requirement:

  1. As described above, database system like ClickHouse uses a uint8 vector to store boolean values to do filtering. If I want to convert a roaring bitmap to a uint8 vector by using bitset, I need to convert a roaring bitmap to bitset, and then convert the bitset to a uint8 vector.
  2. There aren't a range API for bitset like roaring_bitmap_range_uint32_array. Sometimes I only need a subset of it.

There are already conversion from a roaring bitmap to a uint8 vector in ClickHouse, but the conversion is not efficient yet:

https://github.com/ClickHouse/ClickHouse/blob/e891253ffd874c91c6bb23398554f31c7a90450b/src/Storages/MergeTree/MergeTreeIndexReadResultPool.cpp#L264-L279

Current implementation is using a uint32 iterator to find all values in the roaring bitmap within a specific range, and fill the coresponding position to true in the uint8 vector. If we support roaring_bitmap_to_bool_array, we can utilize SIMD for optimzation.

Apache Doris also implement a similar way to iterate the roaring bitmap to fill a uint8 vector:

https://github.com/apache/doris/blob/a13241b76c7f09594ec4f9c1d1433af51bc80548/be/src/olap/rowset/segment_v2/segment_iterator.cpp#L2833-L2843

RinChanNOWWW avatar Nov 25 '25 03:11 RinChanNOWWW

@RinChanNOWWW That's an excellent answer, but I am not sure what you have implemented would work for ClickHouse.

Look at the use case:

        roaring::api::roaring_uint32_iterator_t it;
        roaring_iterator_init(data.bitmap32, &it);
        if (!roaring_uint32_iterator_move_equalorlarger(&it, starting_row))
            return false;

        bool has_value = false;
        while (it.current_value < ending_row)
        {
            has_value = true;
            pos[it.current_value - starting_row] = 1;
            if (!roaring_uint32_iterator_advance(&it))
                break;
        }
        return has_value;

So it seems that it needs to operate over a range, doesn't it ?

Of course, you can always dump the whole thing to a temporary buffer and copy it over, but that's not efficient.

lemire avatar Nov 25 '25 04:11 lemire

@lemire

So it seems that it needs to operate over a range, doesn't it ?

Yes.

Of course, you can always dump the whole thing to a temporary buffer and copy it over, but that's not efficient.

I would like to implement a range API roaring_bitmap_range_bool_array later if these APIs are welcome by the community. And in my opinion, these two APIs (roaring_bitmap_to_bool_array and roaring_bitmap_range_bool_array) should exist togehter just like the ones for uint32 array, so we can have a heuristic strategy to decide to use which one (the full one or the range one) in different scenarios.

RinChanNOWWW avatar Nov 25 '25 05:11 RinChanNOWWW

@RinChanNOWWW I understand, but I am concerned about adding functions for which I see no obvious use. Your analysis is excellent, but it suggests that the range functions are what we want to have. If we had the ranged function, then your version would become unnecessary.

lemire avatar Nov 25 '25 05:11 lemire

@lemire You are right. So let me implement the range function in this PR first.

RinChanNOWWW avatar Nov 25 '25 06:11 RinChanNOWWW

Hi @lemire. This PR is ready for review now.

It mainly adds three APIs:

  • roaring_bitmap_range_bool_array for converting a range of the bitmap to a bool array.
  • roaring_uint32_iterator_read_into_bool for iterating the iterator until a max_value (which is excluded) and fill the bool array.
  • container_iterator_read_into_bool for iterating the iterator until a max_value (which is excluded) and fill the bool array or drain the whole container from the intitial iterator.

As the PR is quite big, SIMD optimization and the same API for roaring64 is not included.

RinChanNOWWW avatar Dec 03 '25 09:12 RinChanNOWWW

I wrote a benchmark and found that the performance is already better than iterating by uint32 iterator.

Running microbenchmarks/mybench
Run on (48 X 2593.91 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x48)
  L1 Instruction 32 KiB (x48)
  L2 Unified 1024 KiB (x24)
  L3 Unified 36608 KiB (x1)
Load Average: 1.67, 4.81, 14.00
--------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------
BM_ToBoolArray_FilterRate_0_1                 4337 ns         4337 ns       161101 items_per_second=188.897G/s filter_rate=0.1%,iterations=100
BM_ToBoolArrayByIterator_FilterRate_0_1       7568 ns         7568 ns        91865 items_per_second=108.243G/s filter_rate=0.1%,iterations=100
BM_ToBoolArray_FilterRate_1                  12941 ns        12940 ns        53960 items_per_second=63.3062G/s filter_rate=1%,iterations=100
BM_ToBoolArrayByIterator_FilterRate_1        42449 ns        42447 ns        16445 items_per_second=19.2992G/s filter_rate=1%,iterations=100
BM_ToBoolArray_FilterRate_10                180551 ns       180536 ns         4058 items_per_second=4.53759G/s filter_rate=10%,iterations=100
BM_ToBoolArrayByIterator_FilterRate_10      732994 ns       732979 ns          982 items_per_second=1.11763G/s filter_rate=10%,iterations=100
BM_ToBoolArray_FilterRate_50                521957 ns       521932 ns         1344 items_per_second=1.56955G/s filter_rate=50%,iterations=100
BM_ToBoolArrayByIterator_FilterRate_50     3050021 ns      3049814 ns          229 items_per_second=268.607M/s filter_rate=50%,iterations=100
BM_ToBoolArray_FilterRate_90                834446 ns       834433 ns          837 items_per_second=981.745M/s filter_rate=90%,iterations=100
BM_ToBoolArrayByIterator_FilterRate_90     5388504 ns      5388220 ns          130 items_per_second=152.035M/s filter_rate=90%,iterations=100
BM_ToBoolArray_FilterRate_99                860430 ns       860417 ns          814 items_per_second=952.097M/s filter_rate=99%,iterations=100
BM_ToBoolArrayByIterator_FilterRate_99     5908431 ns      5908341 ns          119 items_per_second=138.651M/s filter_rate=99%,iterations=100

The benchmark codes: https://pastebin.com/wb72djVn

RinChanNOWWW avatar Dec 04 '25 04:12 RinChanNOWWW

My concern at this point is the user of the iterator in the function signatures. It seems to me that it makes the code unnecessarily complicated.

lemire avatar Dec 04 '25 05:12 lemire

I write a benchmark and find that the performance is already better than iterating by uint32 iterator.

Yes. I am not concerned with the efficiency of your implementation. Your code will be fast, that's fine.

lemire avatar Dec 04 '25 05:12 lemire

My concern at this point is the user of the iterator in the function signatures. It seems to me that it makes the code unnecessarily complicated.

@lemire The iterater function (roaring_uint32_iterator_read_into_bool) is mainly used for two cases:

  • To impl roaring_bitmap_range_bool_array.
  • Make user can manipulate the iterator by themselves. For example, if a user want to call roaring_bitmap_range_bool_array(start=100, end=200) and then call roaring_bitmap_range_bool_array(start=200, end=300), he can use roaring_uint32_iterator_read_into_bool like this:
bool *ans = new bool[SIZE];
roaring_uint32_iterator_read_into_bool(it, ans + it->current_value, max_value=200);
// Some other operations...
// Continue to iterator `it`
roaring_uint32_iterator_read_into_bool(it, ans + it->current_value, max_value=300);

to skip unnecessary seeks for their "range start"s.

I will fix some of your code reviews and give better comments for the function.

RinChanNOWWW avatar Dec 04 '25 06:12 RinChanNOWWW

@RinChanNOWWW My recommendation at this stage is that we tune the change in CRoaring to what is useful to ClickHouse's design. It would be a shame to add a specialized function in CRoaring that then goes unused. We ended having to support and maintain an unused function.

So let us make sure that there is a very good chance that ClickHouse will accept your PR, and then let us make CRoaring work according to this PR.

In other words, I am all for adopting a new function if ClickHouse will adopt it, but let us first make sure that it really meets the needs of ClickHouse.

My concern is that it is a highly specialized and not generally useful idea. It is a bit silly to extract to a bool array. There are very few cases where this is a good idea.

lemire avatar Dec 04 '25 15:12 lemire

I am going to run tests. When you are ready to issue a ClickHouse pull request, we can prepare a tentative release, and see how it works out.

lemire avatar Dec 06 '25 00:12 lemire

Fixed sanitizer fail in unit test.

RinChanNOWWW avatar Dec 06 '25 12:12 RinChanNOWWW

I am going to run tests. When you are ready to issue a ClickHouse pull request, we can prepare a tentative release, and see how it works out.

@lemire Sure. I'm going to use the new API in https://github.com/ClickHouse/ClickHouse/pull/90266

RinChanNOWWW avatar Dec 06 '25 12:12 RinChanNOWWW