roaring-rs Unions on multiple bitmaps at a time

trafficstars

I would like to introduce methods to do operations on multiple bitmaps at a time, this can greatly improve performances. This PR only introduce the union operation.

In this PR I introduce the RoaringBitmap::union_of method which take a Iterator of Bitmaps and modify returns a new bitmap. Internally it uses a BinaryHeap to always find the lowest containers of the operand bitmaps.

~I am not sure of the function signature, the other operations are named union_with and take a mutable self ref. The function signature I use here would be invalid for difference and symmetric difference, as the operations must be done on a base bitmap.~ ~I changed it to become an in-place operation.~ And I change it again to be kind of a constructor.

Related to #57.

Jun 03 '20 09:06 Kerollmops

I just added some benchmarks and the results are not so great, probably because bitmaps lengths are not that big.

union_with_multi        time:   [2.1961 us 2.2020 us 2.2085 us]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

union_with_multi by hand
                        time:   [1.6332 us 1.6413 us 1.6547 us]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

Jun 03 '20 12:06 Kerollmops

Even with more bitmaps the banchmarks results are not good at all.

union_with_multi        time:   [206.96 us 207.68 us 208.40 us]
                        change: [-0.4573% -0.0425% +0.3423%] (p = 0.84 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

union_with_multi by hand
                        time:   [60.290 us 60.506 us 60.778 us]
                        change: [-8.8042% -6.5906% -4.5629%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

In my understanding the advantage we have with the union_with_multi method is that it will only do unions on containers that are involved, skipping containers that are alone and do not interfere with others. Do I need to design tests for this behavior?

EDIT I read again what you said in your comment and yeah, I need to extract the stores from the containers and don't do the operation directly on the containers (because it calls ensure_correct_store after each operation, I believe?).

Jun 03 '20 12:06 Kerollmops

I think a signature something like this would make sense:

impl RoaringBitmap {
  pub fn union_of(bitmaps: impl IntoIterator<Item = &RoaringBitmap>) -> Self
}

there's no reuse of existing storage so having a self instance isn't really necessary.

I think there's likely a way to make Muple more efficient, I'll have to take a look at it in a bit more detail.

Jun 03 '20 13:06 Nemo157

I updated the algorithm to do the operations directly on the Stores and not the containers, benchmarks results didn't really changed so I decided to check performances in detail with fleamegraph.

Obviously performances are impacted by the Vec returned by the Muple Iterator implementation. If we use the standard iterator we can't return a view into an internal buffer, because the Item is not bounded to the appropriate lifetime. So one solution could be to use a custom next method that can return a view into an internal buffer.

Capture d’écran 2020-06-03 à 15 34 42

Jun 03 '20 13:06 Kerollmops

Ok, so I achieve better performances by using a custom next method and reusing the Vec at every iteration.

union_of                time:   [103.60 us 104.40 us 105.30 us]
                        change: [-3.7481% -2.6579% -1.5749%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe

union_of by hand        time:   [84.114 us 84.403 us 84.660 us]
                        change: [+0.4501% +0.8097% +1.1816%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Capture d’écran 2020-06-03 à 15 59 36

Jun 03 '20 13:06 Kerollmops

I recommend referring back to the Java approach which has been emulated in C and Go...

https://github.com/RoaringBitmap/RoaringBitmap/blob/master/RoaringBitmap/src/main/java/org/roaringbitmap/FastAggregation.java#L433

The code is like so...

    RoaringBitmap answer = new RoaringBitmap();
    for (int k = 0; k < bitmaps.length; ++k) {
      answer.naivelazyor(bitmaps[k]);
    }
    answer.repairAfterLazy();
    return answer;

In turn naivelazyor calls lazyIOR on the containers when doing intersections.

What this does, roughly, is that...

When computing the union of two arrays, it will eagerly switch to creating a bitmap container, even if that would not be normally warranted.
When doing computations over bitmap containers, it will not track the cardinality.

Then "repairAfterLazy" goes through the code and checks the bitmap containers and possibly convert them back to arrays.

The intuition is as follows... the union between two bitmap containers, or between an array container and a bitmap container, or even between two array container when the output is a bitmap container, can be done efficiently (CPU wise). And if you have many Roaring bitmaps to begin with, you are likely to end up with bitmap containers in the end, so let us jump straight ahead and create them early in the process.

Empirically, this works well quite often.

Sep 10 '20 14:09 lemire

roaring-rs roaring-rs copied to clipboard

Unions on multiple bitmaps at a time

roaring-rs
roaring-rs copied to clipboard