roaring-rs
roaring-rs copied to clipboard
Unions on multiple bitmaps at a time
I would like to introduce methods to do operations on multiple bitmaps at a time, this can greatly improve performances. This PR only introduce the union operation.
In this PR I introduce the RoaringBitmap::union_of method which take a Iterator of Bitmaps and modify returns a new bitmap. Internally it uses a BinaryHeap to always find the lowest containers of the operand bitmaps.
~I am not sure of the function signature, the other operations are named union_with and take a mutable self ref. The function signature I use here would be invalid for difference and symmetric difference, as the operations must be done on a base bitmap.~ ~I changed it to become an in-place operation.~ And I change it again to be kind of a constructor.
Related to #57.
I just added some benchmarks and the results are not so great, probably because bitmaps lengths are not that big.
union_with_multi time: [2.1961 us 2.2020 us 2.2085 us]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
union_with_multi by hand
time: [1.6332 us 1.6413 us 1.6547 us]
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
Even with more bitmaps the banchmarks results are not good at all.
union_with_multi time: [206.96 us 207.68 us 208.40 us]
change: [-0.4573% -0.0425% +0.3423%] (p = 0.84 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
union_with_multi by hand
time: [60.290 us 60.506 us 60.778 us]
change: [-8.8042% -6.5906% -4.5629%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
In my understanding the advantage we have with the union_with_multi method is that it will only do unions on containers that are involved, skipping containers that are alone and do not interfere with others. Do I need to design tests for this behavior?
EDIT I read again what you said in your comment and yeah, I need to extract the stores from the containers and don't do the operation directly on the containers (because it calls ensure_correct_store after each operation, I believe?).
I think a signature something like this would make sense:
impl RoaringBitmap {
pub fn union_of(bitmaps: impl IntoIterator<Item = &RoaringBitmap>) -> Self
}
there's no reuse of existing storage so having a self instance isn't really necessary.
I think there's likely a way to make Muple more efficient, I'll have to take a look at it in a bit more detail.
I updated the algorithm to do the operations directly on the Stores and not the containers, benchmarks results didn't really changed so I decided to check performances in detail with fleamegraph.
Obviously performances are impacted by the Vec returned by the Muple Iterator implementation. If we use the standard iterator we can't return a view into an internal buffer, because the Item is not bounded to the appropriate lifetime. So one solution could be to use a custom next method that can return a view into an internal buffer.

Ok, so I achieve better performances by using a custom next method and reusing the Vec at every iteration.
union_of time: [103.60 us 104.40 us 105.30 us]
change: [-3.7481% -2.6579% -1.5749%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
union_of by hand time: [84.114 us 84.403 us 84.660 us]
change: [+0.4501% +0.8097% +1.1816%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

I recommend referring back to the Java approach which has been emulated in C and Go...
https://github.com/RoaringBitmap/RoaringBitmap/blob/master/RoaringBitmap/src/main/java/org/roaringbitmap/FastAggregation.java#L433
The code is like so...
RoaringBitmap answer = new RoaringBitmap();
for (int k = 0; k < bitmaps.length; ++k) {
answer.naivelazyor(bitmaps[k]);
}
answer.repairAfterLazy();
return answer;
In turn naivelazyor calls lazyIOR on the containers when doing intersections.
What this does, roughly, is that...
-
When computing the union of two arrays, it will eagerly switch to creating a bitmap container, even if that would not be normally warranted.
-
When doing computations over bitmap containers, it will not track the cardinality.
Then "repairAfterLazy" goes through the code and checks the bitmap containers and possibly convert them back to arrays.
The intuition is as follows... the union between two bitmap containers, or between an array container and a bitmap container, or even between two array container when the output is a bitmap container, can be done efficiently (CPU wise). And if you have many Roaring bitmaps to begin with, you are likely to end up with bitmap containers in the end, so let us jump straight ahead and create them early in the process.
Empirically, this works well quite often.