Daniel Lemire

Results 1864 comments of Daniel Lemire

@sarahgerweck I would argue that a better measure is the number of bytes processed per CPU cycles (or number of bytes processed per CPU cycles)... E.g., see https://arxiv.org/pdf/1609.09840 https://arxiv.org/pdf/1503.03465 ......

When MurmurHash is used as a deterministic function (without randomization), then the answer is that you can find two keys that always collide. With 100% probability. How do I know...

I don't know what the question was... so I am speculating... hoping that it will help clear up the question... * Given two text messages, they either always collide, or...

What is wrong with just using the 64 bits you need?

> What mathematical proof do I have that these 64-bits are as much collision-proof as a specifically 64-bits MurmurHash3-type algorithm could provide? If you can find a 64-bit hash function...

For context, this remark in the README was written 6 years ago.

Do you get good results with gzip? I have no experience compressing roaring bitmaps with generic codecs... assuredly, the impact will be data specific... Related: Compressing JSON: gzip vs zstd...

Interestingly, it looks like lz4 makes things worse in your test!

I recommend referring back to the Java approach which has been emulated in C and Go... https://github.com/RoaringBitmap/RoaringBitmap/blob/master/RoaringBitmap/src/main/java/org/roaringbitmap/FastAggregation.java#L433 The code is like so... ```Java RoaringBitmap answer = new RoaringBitmap(); for (int...

There is a 'frozen format' supported by the C and Go versions. It is like the normal serialized format, but it adds some alignment so that you can just reinterpret...