roaring icon indicating copy to clipboard operation
roaring copied to clipboard

(code might give the impression of possible) integer overflows in binary search

Open glycerine opened this issue 8 years ago • 8 comments

Both the C and the java binary search implementations are subject to a classic integer overflow bug in binary search; e.g. at https://github.com/RoaringBitmap/CRoaring/blob/master/include/roaring/containers/run.h#L91 instead of int32_t middleIndex = (low + high) >> 1; prefer: int32_t middleIndex = low + ((high-low) >> 1);

glycerine avatar Nov 26 '16 15:11 glycerine

@glycerine

I think that's only an issue if low + high can overflow. I don't think that's possible in Roaring in general. Indeed, they are always values in [0,1<<16] so they are nearly 16-bit integers. They can't even come close to overflowing.

It might be worthwhile, however, to document the fact that they can't overflow.

lemire avatar Nov 28 '16 23:11 lemire

... that would be true if the computation on uint32 was was done in 64-bit space, but it's done in uint32 land here https://github.com/RoaringBitmap/CRoaring/blob/master/include/roaring/containers/run.h#L91, so this code is definitely incorrect/vulnerable to overflow.

An easy small series of one line changes to make.

glycerine avatar Dec 17 '16 03:12 glycerine

ah I see, because even on the key search, it's only every a 16-bit int. Okay I'm less concerned. But still worth fixing for the educational value that it brings to readers of the code.

glycerine avatar Dec 17 '16 03:12 glycerine

@glycerine

An easy small series of one line changes to make. int32_t middleIndex = (low + high) >> 1; int32_t middleIndex = low + ((high-low) >> 1);

Though I expect it does not impact real-world performance, one is trading 2 operations for 3 operations: the "safer" code might very well add one cycle of latency (a 50% increase).

That's why I favor documenting the safety of the existing code.

lemire avatar Dec 19 '16 17:12 lemire

That's reasonable. Seeing as this is a subtle point about binary search (which is notoriously subtle; Knuth says it took 30 years to get a close-to-correct published version of binary search) that impacted both the Go stdlib and the Java stdlib implementations originally (and perhaps still in Java; I don't know if they ever fixed it), I think it worth providing a detailed comment; something like

// optimization note -- this *looks* like a classic binary search implementation bug, but it's not.
// middleIndex = low + ((high-low)/2) would typically be preferred to avoid overflow.
// However, in this context, high and low are always <= 2^16-1, so there can 
// be no overflow and we save one integer subtraction.

glycerine avatar Dec 19 '16 19:12 glycerine

We are in agreement.

lemire avatar Dec 19 '16 19:12 lemire

looks like it was fixed in the jdk:

// https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html

glycerine avatar Dec 19 '16 19:12 glycerine

@glycerine Yeah. I expected as much.

lemire avatar Dec 19 '16 20:12 lemire