roaring-rs
Optimization: Reduce container search time complexity
let B1, B2 = some roaring bitmaps
let N1 = number of containers in B1
let N2 = number of containers in B2
Currently, binary operations such as B1 | B2 perform N1 * log(N2) container lookups.
- For commutative & (!assignment) ops: We can flip B1 for B2 when N1 > N2.
- In other words: whenever it is possible to flip the order, let the log search run over the larger collection
- Given that containers are sorted, while iterating over the containers of B1 and searching for each one in B2 (see the sketch after this list):
- If a container is found at iteration I: The idx at iteration I+1 will be strictly greater than idx at I
- If a container is not found at iteration I: The idx at iteration I+1 will be greater than or equal to the insertion point at I
- In either case, we can logarithmically reduce the search space on every subsequent iteration, so each lookup becomes log(log(N))
- For commutative & !assignment ops time complexity becomes: min(N1, N2) * log(log(max(N1, N2)))
- For ops that are (!commutative) | assignment: time complexity becomes: N1 * log(log(N2))
- Space complexity remains O(1)
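A minimal sketch of this scheme (the operand swap plus the shrinking search window), assuming each bitmap's containers are represented only by their sorted 16-bit keys; the slices and names below are illustrative stand-ins, not roaring-rs's actual container types:

```rust
// Intersect the container keys of two bitmaps, narrowing the search window
// in the larger key list after every lookup.
fn intersect_keys(b1: &[u16], b2: &[u16]) -> Vec<u16> {
    // Iterate over the smaller list so the binary search runs over the larger one.
    let (small, large) = if b1.len() <= b2.len() { (b1, b2) } else { (b2, b1) };

    let mut out = Vec::new();
    let mut lo = 0; // lower bound of the remaining search window in `large`

    for &key in small {
        // Only search the suffix that can still contain `key`;
        // both lists are sorted, so earlier positions are already ruled out.
        match large[lo..].binary_search(&key) {
            Ok(i) => {
                out.push(key);
                lo += i + 1; // next key must land strictly after this match
            }
            Err(i) => {
                lo += i; // next key can be at or after this insertion point
            }
        }
        if lo == large.len() {
            break; // nothing left to match against
        }
    }
    out
}

fn main() {
    let b1 = [1u16, 3, 7, 9];
    let b2 = [0u16, 1, 2, 3, 4, 9, 10];
    assert_eq!(intersect_keys(&b1, &b2), vec![1u16, 3, 9]);
    println!("{:?}", intersect_keys(&b1, &b2));
}
```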
Please check my math. 🙂
I think it also works for commutative & assignment ops, but only when both are owned.
> - For commutative & (!assignment) ops: We can flip B1 for B2 when N1 > N2.
> - In other words: whenever it is possible to flip the order, let the log search run over the larger collection
Isn't that what we are already doing? Swapping when `lhs` is greater than `rhs`.
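For the owned-assignment case mentioned at the top of this reply, here is a minimal sketch of how the same swap could be applied before a commutative assignment op; the `len()` comparison is only a stand-in size heuristic, not how roaring-rs actually decides:

```rust
use roaring::RoaringBitmap;

// Hedged sketch: when both bitmaps are owned and the op is commutative,
// they can be swapped before assigning so the smaller side is merged into
// the larger one. `len()` (number of values) stands in for a real
// container-count comparison.
fn union_assign_owned(mut lhs: RoaringBitmap, mut rhs: RoaringBitmap) -> RoaringBitmap {
    if rhs.len() > lhs.len() {
        std::mem::swap(&mut lhs, &mut rhs); // cheap: both are owned
    }
    lhs |= rhs; // `|` is commutative, so the result is unchanged by the swap
    lhs
}
```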
> - Given that containers are sorted, while iterating over the containers of B1 and searching for each one in B2:
> - If a container is found at iteration I: The idx at iteration I+1 will be strictly greater than idx at I
> - If a container is not found at iteration I: The idx at iteration I+1 will be greater than or equal to the insertion point at I
> - In either case, we can logarithmically reduce the search space on every subsequent iteration, so each lookup becomes log(log(N))
But you are right that we are not reducing the scope to the subset of interesting containers to search in. That's something I thought about but didn't change because the benchmarks were satisfactory, IIRC.
> - For commutative & !assignment ops time complexity becomes: min(N1, N2) * log(log(max(N1, N2)))
> - For ops that are (!commutative) | assignment: time complexity becomes: N1 * log(log(N2))
> - Space complexity remains O(1)
Your math looks right indeed 💯
Once I finish galloping and SIMD I'll look at the call stack and evaluate whether we spend enough CPU time searching for it to even matter.
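For reference, a minimal sketch of a galloping (exponential) search over a sorted slice of container keys; this is a generic illustration, not the roaring-rs implementation being worked on:

```rust
// Returns Ok(index) if found, or Err(insertion point), like `binary_search`.
fn gallop(haystack: &[u16], needle: u16) -> Result<usize, usize> {
    // Double the probe index until it reaches a key >= needle (or the end).
    let mut hi = 1;
    while hi < haystack.len() && haystack[hi] < needle {
        hi *= 2;
    }
    // The needle, if present, lies in [hi / 2, hi]; binary search that window.
    let lo = hi / 2;
    let end = (hi + 1).min(haystack.len());
    match haystack[lo..end].binary_search(&needle) {
        Ok(i) => Ok(lo + i),
        Err(i) => Err(lo + i),
    }
}

fn main() {
    let keys = [1u16, 2, 4, 8, 16, 32, 64];
    assert_eq!(gallop(&keys, 16), Ok(4));
    assert_eq!(gallop(&keys, 5), Err(3));
    assert_eq!(gallop(&keys, 100), Err(7));
}
```

The cost of each probe is logarithmic in the distance advanced, which is why it pairs well with the narrowing-window iteration discussed above.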