roaring-rs
                                
                                 roaring-rs copied to clipboard
                                
                                    roaring-rs copied to clipboard
                            
                            
                            
                        Implement Hash
Would it be possible to implement the Hash trait for RoaringBitmap? It looks like this can be derived automatically (via ArrayStore/BitmapStore, Store, and Container). Alternatively, perhaps something like the hash function in pyroaring would be more suitable?
I tried benchmarking this vs BTreeHash which is the closest thing I'm aware of in the standard library. Hashing roaring bitmaps is orders of magnitude faster when simply using the derived implementation. My branch is here.
I don't think it can be derived automatically - identical RoaringBitmaps with different representations (array vs bitmap) should have the same hash.
@RaduBerinde Good point! I somehow hadn't thought of that. It happens to be working as far as I can tell, although that could be just by chance.
It's possible there is currently only one possible choice, depending on the cardinality (we call "ensure correct store" after each op). But maybe that will change in the future (I actually found this surprising, as it invites some annoying cornercases, like repeatedly adding and removing an element when cardinality is at the cutoff).
One simple way for a good enough impl Hash for RoaringBitmap could be to iter on the numbers. The impl Hash for Treemap could simply be a basic derive as the contained types implement Hash (a BTreeMap of RoaringBitmaps).
I understand the performance concern but I just wanted to notice that. Thank you very much for the time you take to help on this project!
we call "ensure correct store" after each op
Not sure about the performance, also it can only be called, as you said, after an op, as the bitmap must be mutable.
My $0.02: Java, C and Go allow two equivalent bitmaps to have mismatched hashes if one contains run containers and the other doesn't, so from a consistency and speed (now you can operate on bytes plain and simple!) point of view it's still an option here.