roaring-rs icon indicating copy to clipboard operation
roaring-rs copied to clipboard

len should take constant-time

Open Nemo157 opened this issue 7 years ago • 4 comments

https://github.com/rust-lang-nursery/api-guidelines/issues/149

Nemo157 avatar Jan 17 '18 20:01 Nemo157

Contant time len() will probably require additional storage. Atleast for my usecase that would be quite unfortunate as my program has hundreds of thousands of bitmaps at any time.

Perhaps instead the name of len() could be changed to cardinality() or another name so it doesn't conflict with the API guidelines and doesn't bloat RoaringBitmap with extra bytes?

josephglanville avatar Mar 23 '20 19:03 josephglanville

#127 removes 8 bytes from each array container, so maybe we can afford a few extra bytes at the root now ;)

saik0 avatar Jan 10 '22 03:01 saik0

This is relevant to #167.

Cardinality of all ops can all be trivially implemented in terms of the cardinality of the intersection |A ∩ B|.

|A \ B| = |A| − |A ∩ B|
|A ∪ B| = |A| + |B| − |A ∩ B|
|A △ B| = |A| + |B| − 2|A ∩ B|

Intersection cardinality is cheaper to compute, so it's both simpler to implement and has better runtime performance if we have the len cached.

saik0 avatar Feb 06 '22 23:02 saik0

Crazy, terrible, stupid idea to avoid increasing the size of RoaringBitmap but still store the cardinality:

The max cardinality of the bitmap is 0x1_0000_0000 (33 bits). but the len and capacity of the root vec should only ever reach 0x1_0000 (17 bits). On 64 bit systems, that gives us plenty of space to store the cardinality in the top bits. On 32 bit systems, that gives us 30 bits (15 bits for capacity/len each). Additionally, because BitmapStorecontains a u64, the alignment ofContainer` is at least 8, so we could store the remaining 3 bits of cardinality in the low bits of the vec data pointer.

Dr-Emann avatar Mar 21 '22 16:03 Dr-Emann