CRoaring
CRoaring copied to clipboard
fastunion vs |= in a loop
I have a std::vector of Roaring64Maps
std::vector<roaring::Roaring64Map> results = // some process
When I use fastunion (probably incorrectly):
auto sharded_results = new const roaring::Roaring64Map*[results.size()];
for (size_t i = 0; i < results.size(); ++i) {
sharded_results[i] = &results.data()[i];
}
roaring::Roaring64Map combined = roaring::Roaring64Map::fastunion(results.size(), sharded_results);
It takes almost 5 seconds to combine them. However when I just do:
roaring::Roaring64Map combined;
for(const auto& sharded : results) {
combined |= sharded;
}
It takes 2.8 seconds. In this case I only have 4 maps, and the combined carnality ends up just above 3 million. Am I doing something wrong or is this expected?
It is not the case that the approach you take (two-by-two ORs) is necessarily inferior. More importantly, we don't currently have solid benchmarks for fastunion. That is something that we should add.
https://github.com/RoaringBitmap/CRoaring/blob/a644f44898d3c0f27af4406827bbde5f42a06b53/cpp/roaring64map.hh#LL1381