CRoaring icon indicating copy to clipboard operation
CRoaring copied to clipboard

fastunion vs |= in a loop

Open maxdemarzi opened this issue 2 years ago • 1 comments

I have a std::vector of Roaring64Maps

std::vector<roaring::Roaring64Map> results = // some process

When I use fastunion (probably incorrectly):

    auto sharded_results = new const roaring::Roaring64Map*[results.size()];
    for (size_t i = 0; i < results.size(); ++i) {
        sharded_results[i] = &results.data()[i];
    }

    roaring::Roaring64Map combined = roaring::Roaring64Map::fastunion(results.size(), sharded_results);

It takes almost 5 seconds to combine them. However when I just do:

    roaring::Roaring64Map combined;
      for(const auto& sharded : results) {
        combined |= sharded;
    } 

It takes 2.8 seconds. In this case I only have 4 maps, and the combined carnality ends up just above 3 million. Am I doing something wrong or is this expected?

maxdemarzi avatar Dec 08 '22 05:12 maxdemarzi

It is not the case that the approach you take (two-by-two ORs) is necessarily inferior. More importantly, we don't currently have solid benchmarks for fastunion. That is something that we should add.

https://github.com/RoaringBitmap/CRoaring/blob/a644f44898d3c0f27af4406827bbde5f42a06b53/cpp/roaring64map.hh#LL1381

lemire avatar Dec 08 '22 15:12 lemire