roaring-rs
Set operations for multiple sets at a time
Hey,
I discovered this pure Rust version of the Roaring library by Daniel Lemire et al. Its performance is pretty good, at least way better than my sdset library.
I was wondering if you had any rough idea of how to implement a set operation, more specifically a union, on multiple sets at a time.
I dived a little into the C implementation and found that the main performance gain seems to come from the fact that the library doesn't change the internal representation (it keeps the bitmap one) until all set operations have been applied; only then is the internal type changed. Am I right?
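To illustrate the idea in the question, here is a minimal sketch of a lazy union for a single key. It assumes a simplified, hypothetical `Store` enum (not roaring-rs's private type): every input is ORed into one dense 65,536-bit bitset, and the array-versus-bitmap decision (using the conventional 4,096-value threshold) is made only once, after all inputs have been merged. It loosely mirrors the approach of the linked CRoaring code but is not a port of it.

```rust
/// Hypothetical, simplified container store for one 16-bit key range.
#[derive(Debug, Clone, PartialEq)]
pub enum Store {
    /// Sorted, deduplicated low 16 bits; used when the container is sparse.
    Array(Vec<u16>),
    /// 65,536 bits packed into 1,024 words; used when the container is dense.
    Bitmap(Box<[u64; 1024]>),
}

/// Conventional Roaring threshold between array and bitmap containers.
const ARRAY_LIMIT: u64 = 4096;

/// Unions any number of stores by accumulating into one dense bitset and
/// choosing the final representation only once, at the end.
pub fn union_many(stores: &[&Store]) -> Store {
    let mut acc = Box::new([0u64; 1024]);
    for store in stores {
        match store {
            Store::Array(values) => {
                // Set one bit per value: word = v / 64, bit = v % 64.
                for &v in values {
                    acc[(v >> 6) as usize] |= 1u64 << (v & 63);
                }
            }
            Store::Bitmap(words) => {
                for (a, w) in acc.iter_mut().zip(words.iter()) {
                    *a |= *w;
                }
            }
        }
    }
    // Deferred representation choice: count bits once, convert if sparse.
    let len: u64 = acc.iter().map(|w| w.count_ones() as u64).sum();
    if len <= ARRAY_LIMIT {
        let mut values = Vec::with_capacity(len as usize);
        for (i, &word) in acc.iter().enumerate() {
            let mut w = word;
            while w != 0 {
                let bit = w.trailing_zeros() as usize;
                values.push((i * 64 + bit) as u16);
                w &= w - 1; // clear the lowest set bit
            }
        }
        Store::Array(values)
    } else {
        Store::Bitmap(acc)
    }
}
```

Converting back to the array representation after every pairwise union would typically reallocate the container once per input; deferring that choice does the conversion once per key.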
https://github.com/RoaringBitmap/CRoaring/blob/59d70d010da5f606f1339fb4c4f200be11f590c6/src/roaring.c#L611-L629
Thank you for your great job!
This seems pretty easy to do by having an iterator that walks across multiple bitmaps at once, returning all the containers at each key (similar to Pairs but extended to more than two bitmaps). Then take the stores from those containers and apply the operation directly to them before wrapping the result back into a container. Finally, collect all the (non-empty) resulting containers into a new bitmap.
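A minimal sketch of that suggestion, under simplifying assumptions: a bitmap is modeled as a hypothetical `Vec<(u16, Vec<u16>)>` of containers sorted by key, with every container stored as a sorted array (bitmap stores are ignored). `MultiContainers` (a hypothetical name) generalizes a pairwise walk to N bitmaps by linearly scanning the heads for the smallest key, and `union_all` merges each key's containers and keeps only the non-empty results.

```rust
/// Hypothetical simplified container: (high 16-bit key, sorted low 16 bits).
/// A bitmap is a slice of these, sorted by key.
pub type Container = (u16, Vec<u16>);

/// Walks several bitmaps at once, yielding each key with every container
/// stored under it (a pairwise walk generalized to N bitmaps).
pub struct MultiContainers<'a> {
    bitmaps: Vec<&'a [Container]>,
    cursors: Vec<usize>,
}

impl<'a> MultiContainers<'a> {
    pub fn new(bitmaps: Vec<&'a [Container]>) -> Self {
        let cursors = vec![0; bitmaps.len()];
        MultiContainers { bitmaps, cursors }
    }
}

impl<'a> Iterator for MultiContainers<'a> {
    type Item = (u16, Vec<&'a [u16]>);

    fn next(&mut self) -> Option<Self::Item> {
        // Smallest key among the current heads (linear scan over N bitmaps).
        let key = self
            .bitmaps
            .iter()
            .zip(&self.cursors)
            .filter_map(|(b, &i)| b.get(i).map(|c| c.0))
            .min()?;
        // Collect every container with that key and advance its cursor.
        let mut group = Vec::new();
        for (bitmap, cursor) in self.bitmaps.iter().copied().zip(self.cursors.iter_mut()) {
            if let Some((k, values)) = bitmap.get(*cursor) {
                if *k == key {
                    group.push(values.as_slice());
                    *cursor += 1;
                }
            }
        }
        Some((key, group))
    }
}

/// Union of all bitmaps: merge each key's containers, keep non-empty ones.
pub fn union_all(bitmaps: &[Vec<Container>]) -> Vec<Container> {
    let slices = bitmaps.iter().map(|b| b.as_slice()).collect();
    MultiContainers::new(slices)
        .map(|(key, group)| {
            let mut values: Vec<u16> = group.concat();
            values.sort_unstable();
            values.dedup();
            (key, values)
        })
        .filter(|(_, values)| !values.is_empty())
        .collect()
}
```

The linear scan costs O(N) per key, which is fine for a handful of bitmaps; with many inputs, a priority queue keyed on each bitmap's next key scales better.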
Hey @Nemo157,
I totally understand what you proposed! I already started working on the "hardest" part of the problem and found that this "multi-Pairs" type is not that easy to implement. I would like you to take a quick look at what I achieved: there are many allocations, but I believe most of them can't be removed.
https://github.com/Kerollmops/roaring-rs/blob/0d806a8dbcabab7f8f53d5dc3de8c321b2c94a69/src/bitmap/cmp.rs#L142-L196
EDIT: I rewrote it with a heap and interior mutability. There is a bit more boilerplate (the PartialEq/PartialOrd/Ord/Eq implementations), but overall it is faster, makes fewer allocations, and the main algorithm is easier to read.
https://github.com/Kerollmops/roaring-rs/blob/0a03dfb/src/bitmap/cmp.rs#L142-L236
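A heap-based version of the same multi-bitmap walk can be sketched with `std::collections::BinaryHeap` and `std::cmp::Reverse`, assuming a hypothetical `(u16, Vec<u16>)` container model with strictly increasing keys within each bitmap. Storing `Reverse((key, bitmap_index, position))` tuples turns the max-heap into a min-heap and reuses the tuple's built-in ordering, so no hand-written Ord impls or interior mutability are needed. This is an alternative sketch, not a reproduction of the linked implementation.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical simplified container: (high 16-bit key, sorted low 16 bits).
/// Keys are assumed strictly increasing within each bitmap.
pub type Container = (u16, Vec<u16>);

/// N-way walk over several bitmaps, driven by a min-heap of
/// (next key, bitmap index, position within that bitmap).
pub struct HeapMultiContainers<'a> {
    bitmaps: Vec<&'a [Container]>,
    heap: BinaryHeap<Reverse<(u16, usize, usize)>>,
}

impl<'a> HeapMultiContainers<'a> {
    pub fn new(bitmaps: Vec<&'a [Container]>) -> Self {
        // Seed the heap with the first container of every non-empty bitmap.
        let heap = bitmaps
            .iter()
            .enumerate()
            .filter_map(|(idx, b)| b.first().map(|c| Reverse((c.0, idx, 0))))
            .collect();
        HeapMultiContainers { bitmaps, heap }
    }

    /// Pushes the container after `pos` in bitmap `idx` (if any) onto the heap.
    fn advance(&mut self, idx: usize, pos: usize) {
        if let Some(c) = self.bitmaps[idx].get(pos + 1) {
            self.heap.push(Reverse((c.0, idx, pos + 1)));
        }
    }
}

impl<'a> Iterator for HeapMultiContainers<'a> {
    /// A key together with every container stored under it.
    type Item = (u16, Vec<&'a [u16]>);

    fn next(&mut self) -> Option<Self::Item> {
        let Reverse((key, idx, pos)) = self.heap.pop()?;
        let first: &'a [Container] = self.bitmaps[idx];
        let mut group = vec![first[pos].1.as_slice()];
        self.advance(idx, pos);
        // Pop every other head with the same key; the entry just pushed by
        // `advance` has a larger key, so it is never picked up here.
        while let Some(&Reverse((k, i, p))) = self.heap.peek() {
            if k != key {
                break;
            }
            self.heap.pop();
            let bitmap: &'a [Container] = self.bitmaps[i];
            group.push(bitmap[p].1.as_slice());
            self.advance(i, p);
        }
        Some((key, group))
    }
}
```

Each key costs O(k log N) heap operations, where N is the number of bitmaps and k is the number of bitmaps holding that key, instead of the O(N) head scan of a linear walk.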