roaring-rs icon indicating copy to clipboard operation
roaring-rs copied to clipboard

Introduce multi operations helper methods

Open Kerollmops opened this issue 4 years ago • 7 comments

Closes #58 and Fixes #57.

image image

Kerollmops avatar May 28 '21 19:05 Kerollmops

I tried the "naive lazy" alg as suggested by @lemire in #58 and it is indeed faster (and simpler) for OR. It also does better than sequential for XOR.

Unsurprisingly it's worse for SUB and AND.

My tests were all using WIKILEAKS_NOQUOTES_SRT

My experiments are here https://github.com/saik0/roaring-rs/blob/naive-lazy-ops/src/bitmap/multi_ops.rs#L305.

saik0 avatar Jan 07 '22 20:01 saik0

Curiosity got the better of me, so I tied some other strategies.

  1. Inside the multi-ops there the container access pattern is effectively random. BTreeMap and HashMap keyed by the container key were both improvements. HashMap was especially fast when using the identity hash.
  2. Copying can be avoided with COW

saik0 avatar Jan 08 '22 02:01 saik0

Unfortunately, you broke the link, next time could you provide a permanent link, please 😄

I think you can open a draft PR on my own branch maybe, this way I can see the changes you made. It seems like you have already done a lot of work there and you PR outsize mine and opening a PR on main/master is maybe better 😂

Note that I am impressed by the work you do, we both clearly want to make it as fast if not faster than the C roaring library! 🚀

Kerollmops avatar Jan 09 '22 10:01 Kerollmops

Unfortunately, you broke the link, next time could you provide a permanent link, please 😄

Yes. Next time. 😄

I think you can open a draft PR on my own branch maybe, this way I can see the changes you made. It seems like you have already done a lot of work there and you PR outsize mine and opening a PR on main/master is maybe better 😂

The branch includes several slightly different variations of the same alg. It also checks in the sample datasets into the repo for benchmarking. It's hardly merge ready, but it's helpful for evaluating which strategy to choose.

Here's a PR on my own repo: saik0/roaring-rs#1

Note that I am impressed by the work you do

Thank you! I've had COVID for the last week and have been feeling awful and isolated. Hearing that felt pretty good! 😄

make it as fast if not faster than the C roaring library!

Thats a tall order!

saik0 avatar Jan 10 '22 02:01 saik0

The branch includes several slightly different variations of the same alg. It also checks in the sample datasets into the repo for benchmarking. It's hardly merge ready, but it's helpful for evaluating which strategy to choose.

Here's a PR on my own repo: saik0#1

Thank you! I've had COVID for the last week and have been feeling awful and isolated. Hearing that felt pretty good! 😄

I hope you're fine now! I caught it in 2019 too, lost taste and that was hard too. I understand! Just do what you love most and wait until you recover taste 😃

Thats a tall order!

Indeed, let's try to do our best, you already have done a lot with those small PRs!

Maybe you can find the time to publish the benchmarks of your experiments from the https://github.com/saik0/roaring-rs/pull/1 and open a PR on this repo? But maybe you should wait until we merge most of them, would be better to avoid merge conflicts, as you wish!

Kerollmops avatar Jan 10 '22 15:01 Kerollmops

Here are the benchmarks from my experiments on saik0#1

Screen Shot 2022-01-12 at 11 19 53 AM Screen Shot 2022-01-12 at 11 20 17 AM

saik0 avatar Jan 12 '22 19:01 saik0

Wow, the diff with the multi-or is quite impressive, I don't have a comparison with the multi-xor, though! 😃

Kerollmops avatar Jan 12 '22 20:01 Kerollmops