roaring-rs icon indicating copy to clipboard operation
roaring-rs copied to clipboard

Optimization: Reduce container search time complexity

Open saik0 opened this issue 3 years ago • 3 comments

let B1, B2 = some roaring bitmaps let N1 = number of containers in B1 let N2 = number of containers in B2

Currently binary operations such as B1 | B2 have an N1 * log(N2) container lookups.

  1. For commutative & (!assignment) ops: We can flip B1 for B2 when N1 > N2.
    • In other words: Whenever it is possible to flip the order: let the log search be over the larger collection
  2. Given that containers are sorted. While iterating over containers in B1, searching for containers in B2:
    • If a container is found at iteration I: The idx at iteration I+1 will be strictly greater than idx at I
    • If a container is not found at iteration I: The idx at iteration I+1 will be greater than or equal to the insertion point at I
    • In either case, we can logarithmically reduce the search space for every subsequent iteration, so it becomes log(log(N))
  • For commutative & !assignment ops time complexity becomes: min(N1, N2) * log(log(max(N1, N2)))
  • For ops that are (!commutative) | assignment: time complexity becomes: N1 * log(log(N2))
  • Space complexity remains O(1)

Please check my math. 🙂

saik0 avatar Jan 23 '22 03:01 saik0

I think it 1 works for commutative & assignment ops, but only when both are owned

saik0 avatar Jan 23 '22 04:01 saik0

  1. For commutative & (!assignment) ops: We can flip B1 for B2 when N1 > N2.
  • In other words: Whenever it is possible to flip the order: let the log search be over the larger collection

Isn't it what we are already doing? Swapping when lhs is greater than rhs.

  1. Given that containers are sorted. While iterating over containers in B1, searching for containers in B2:
  • If a container is found at iteration I: The idx at iteration I+1 will be strictly greater than idx at I
  • If a container is not found at iteration I: The idx at iteration I+1 will be greater than or equal to the insertion point at I
  • In either case, we can logarithmically reduce the search space for every subsequent iteration, so it becomes log(log(N))

But you are right we are not reducing the scope to the subset of interesting containers to search in. That's something I thought about but didn't change because benchmarks were satisfying IIRC.

  • For commutative & !assignment ops time complexity becomes: min(N1, N2) * log(log(max(N1, N2)))
  • For ops that are (!commutative) | assignment: time complexity becomes: N1 * log(log(N2))
  • Space complexity remains O(1)

Your math look right indeed 💯

Kerollmops avatar Jan 23 '22 15:01 Kerollmops

Once I finish galloping and simd I'll look at the call stack and evaluate if we spend enough cpu time searching for it to even matter.

saik0 avatar Jan 23 '22 16:01 saik0