roaring-rs icon indicating copy to clipboard operation
roaring-rs copied to clipboard

Implement Hash

Open michaelmior opened this issue 3 years ago • 6 comments

Would it be possible to implement the Hash trait for RoaringBitmap? It looks like this can be derived automatically (via ArrayStore/BitmapStore, Store, and Container). Alternatively, perhaps something like the hash function in pyroaring would be more suitable?

michaelmior avatar Aug 03 '22 14:08 michaelmior

I tried benchmarking this vs BTreeHash which is the closest thing I'm aware of in the standard library. Hashing roaring bitmaps is orders of magnitude faster when simply using the derived implementation. My branch is here.

michaelmior avatar Aug 03 '22 15:08 michaelmior

I don't think it can be derived automatically - identical RoaringBitmaps with different representations (array vs bitmap) should have the same hash.

RaduBerinde avatar Aug 13 '22 15:08 RaduBerinde

@RaduBerinde Good point! I somehow hadn't thought of that. It happens to be working as far as I can tell, although that could be just by chance.

michaelmior avatar Aug 13 '22 21:08 michaelmior

It's possible there is currently only one possible choice, depending on the cardinality (we call "ensure correct store" after each op). But maybe that will change in the future (I actually found this surprising, as it invites some annoying cornercases, like repeatedly adding and removing an element when cardinality is at the cutoff).

RaduBerinde avatar Aug 14 '22 21:08 RaduBerinde

One simple way for a good enough impl Hash for RoaringBitmap could be to iter on the numbers. The impl Hash for Treemap could simply be a basic derive as the contained types implement Hash (a BTreeMap of RoaringBitmaps).

I understand the performance concern but I just wanted to notice that. Thank you very much for the time you take to help on this project!

we call "ensure correct store" after each op

Not sure about the performance, also it can only be called, as you said, after an op, as the bitmap must be mutable.

Kerollmops avatar Aug 15 '22 11:08 Kerollmops

My $0.02: Java, C and Go allow two equivalent bitmaps to have mismatched hashes if one contains run containers and the other doesn't, so from a consistency and speed (now you can operate on bytes plain and simple!) point of view it's still an option here.

Oppen avatar Nov 10 '22 22:11 Oppen