CRoaring icon indicating copy to clipboard operation
CRoaring copied to clipboard

C++: Make Roaring a hashable class

Open sor16 opened this issue 5 years ago • 5 comments

Hi,

thank you for this great piece of software. It is speeding up my c++ program a lot! One thing I stumbled upon while using the Roaring class in the context of a hash table, using the class as a key in a std::unordered_map, is that it does not have a defined hash function.

In my case I use Roaring to store at which indices a certain condition is met. The reason I think it is useful to use the class directly as key is that it prevents me from having to extract the indices from a Roaring object and store them as a comma-separated string to use as a key.

Any plans on implementing a hash function for the Roaring class?

Best wishes, Sölvi

sor16 avatar Feb 17 '20 17:02 sor16

This is reasonable and the Java version supports hashing.

You have to decide a few things. For example, do you want the hash value to be based on the logical content of the bitmap? If so, you have to limit the hash function so that it is the same regardless of the physical layout (e.g., with/without run contains, with/without copy-on-write).

I think that the running time of the hash function in Java depends on the cardinality of the bitmap. Basically, you need to iterate through all of the content to compute the hash value. This may or may not make it a good candidate for your application.

The Go version has an outstanding request for this feature: https://github.com/RoaringBitmap/roaring/issues/198

So, I would say that it is quite reasonable and should be implemented.

Pull request invited. I do not think that it is very hard to implement and I am available to provide guidance if someone wants to implement it.

lemire avatar Feb 17 '20 19:02 lemire

What happens if I modify a roaring bitmap that's used as a key for a map? It that UB? Should we document to always make them COW or some kind of read-only flag if that use is expected? Or does it copy the whole thing?

Oppen avatar Jun 16 '21 13:06 Oppen

What happens if I modify a roaring bitmap that's used as a key for a map? It that UB?

Note that in something like std::unordered_map, the key is always a constant and consider immutable. Of course, you can implement your how hash table where keys are mutable... but then you should define the logic.

About hashing...

The usual convention goes as follows. You have that if x is equal to y, meaning that it contains the same values, then they hash to the same hash value. If x differs from y, then the hash value may differ. If you modify a key in a hash table without doing a reinsertion, you get a broken hash table... which is why std:: unordered_map won't let you modify keys.

Of course, there are other ways to skin this cat. Maybe you do not care about the content, but only on the identity. In this case, the key might be a pointer address (and not the bitmap at that address). If so then you do not need any help from the library. You can take the pointer address and hash that. After all, a pointer address is just an integer type (ultimately).

lemire avatar Jun 16 '21 13:06 lemire

My question comes from having aliasing. Let's say we use a RoaringBitmap instance (I think that's the name in C++?) as key. Will RoaringBitmap be deep-copied? In that case, should we make it COW so we don't have to recreate all the containers in-place?

Oppen avatar Jun 16 '21 13:06 Oppen

@Oppen It will definitively be copied if you use values as keys (of course, you can use pointers too... that would be a different story). Whether we have COW or not depends on the user. You can set your bitmaps to have COW or not.

lemire avatar Jun 16 '21 13:06 lemire