Daniel Lemire
Daniel Lemire
@sarahgerweck I would argue that a better measure is the number of bytes processed per CPU cycles (or number of bytes processed per CPU cycles)... E.g., see https://arxiv.org/pdf/1609.09840 https://arxiv.org/pdf/1503.03465 ......
When MurmurHash is used as a deterministic function (without randomization), then the answer is that you can find two keys that always collide. With 100% probability. How do I know...
I don't know what the question was... so I am speculating... hoping that it will help clear up the question... * Given two text messages, they either always collide, or...
What is wrong with just using the 64 bits you need?
> What mathematical proof do I have that these 64-bits are as much collision-proof as a specifically 64-bits MurmurHash3-type algorithm could provide? If you can find a 64-bit hash function...
For context, this remark in the README was written 6 years ago.
Do you get good results with gzip? I have no experience compressing roaring bitmaps with generic codecs... assuredly, the impact will be data specific... Related: Compressing JSON: gzip vs zstd...
Interestingly, it looks like lz4 makes things worse in your test!
I recommend referring back to the Java approach which has been emulated in C and Go... https://github.com/RoaringBitmap/RoaringBitmap/blob/master/RoaringBitmap/src/main/java/org/roaringbitmap/FastAggregation.java#L433 The code is like so... ```Java RoaringBitmap answer = new RoaringBitmap(); for (int...
There is a 'frozen format' supported by the C and Go versions. It is like the normal serialized format, but it adds some alignment so that you can just reinterpret...