roaring
roaring copied to clipboard
(code might give the impression of possible) integer overflows in binary search
Both the C and the java binary search implementations are subject to a classic integer overflow bug in binary search; e.g. at https://github.com/RoaringBitmap/CRoaring/blob/master/include/roaring/containers/run.h#L91 instead of int32_t middleIndex = (low + high) >> 1; prefer: int32_t middleIndex = low + ((high-low) >> 1);
@glycerine
I think that's only an issue if low + high
can overflow. I don't think that's possible in Roaring in general. Indeed, they are always values in [0,1<<16]
so they are nearly 16-bit integers. They can't even come close to overflowing.
It might be worthwhile, however, to document the fact that they can't overflow.
... that would be true if the computation on uint32 was was done in 64-bit space, but it's done in uint32 land here https://github.com/RoaringBitmap/CRoaring/blob/master/include/roaring/containers/run.h#L91, so this code is definitely incorrect/vulnerable to overflow.
An easy small series of one line changes to make.
ah I see, because even on the key search, it's only every a 16-bit int. Okay I'm less concerned. But still worth fixing for the educational value that it brings to readers of the code.
@glycerine
An easy small series of one line changes to make. int32_t middleIndex = (low + high) >> 1; int32_t middleIndex = low + ((high-low) >> 1);
Though I expect it does not impact real-world performance, one is trading 2 operations for 3 operations: the "safer" code might very well add one cycle of latency (a 50% increase).
That's why I favor documenting the safety of the existing code.
That's reasonable. Seeing as this is a subtle point about binary search (which is notoriously subtle; Knuth says it took 30 years to get a close-to-correct published version of binary search) that impacted both the Go stdlib and the Java stdlib implementations originally (and perhaps still in Java; I don't know if they ever fixed it), I think it worth providing a detailed comment; something like
// optimization note -- this *looks* like a classic binary search implementation bug, but it's not.
// middleIndex = low + ((high-low)/2) would typically be preferred to avoid overflow.
// However, in this context, high and low are always <= 2^16-1, so there can
// be no overflow and we save one integer subtraction.
We are in agreement.
looks like it was fixed in the jdk:
// https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
@glycerine Yeah. I expected as much.