CRoaring Improve efficiency of outer map operations for Roaring64Map

Hello,

The purpose of this PR is to slightly improve the efficiency of Roaring64Map in C++.

As you know Roaring64Map contains an "outer" std::map, which maps the key's high bits to individual (32-bit) Roaring maps, which in turn handle the key's low bits. The problem is that many of the methods in Roaring64Map probe the outer map more times than necessary. This includes high-traffic items like add() and contains(), which probe twice when they only need to probe once.

I've changed a variety of methods to do fewer probes. While I was here I also took the liberty to make a few changes:

In flip(), I added what I believe to be a missing call to setCopyOnWrite()
In operator &=, operator -=, and operator ^=, if the 32-bit Roaring entry becomes empty, I remove it from the outer map altogether, rather than leaving an empty Roaring in that slot. I believe this is preferable (?)
In shrinkToFit() I removed an instance of what I believe is undefined behavior. In the line roarings.erase(iter++) I believe it is theoretically possible for the increment to happen after the call to erase has finished (and therefore iter would be an invalid iterator at that point and you ought not try to increment it). EDIT: no, this can't happen with an overloaded operator, so, forget I mentioned it. (I would still prefer to keep my change because I feel it reads better).
In maximum() I changed the code to use forward iterators, but traversing in the reverse direction, rather than reverse iterators. The rationale is that map's reverse_iterator is significantly slower than its forward_iterator, because in reverse_iterator, the operator * and operator -> methods are not constant time because they both do a decrement operation on a temporary internal tree iterator. I can provide more detail if you're interested.

The code passes the current test suite. I'm not sure if that's sufficient or if additional tests need to be written.

Sep 22 '22 00:09 kosak

@SLieve Would you review?

Sep 22 '22 00:09 lemire

Sure!

Sep 22 '22 07:09 SLieve

Looking over the code again, I need to retract what I said about undefined behavior and iter++. The issue I mentioned isn't relevant with an overloaded operator. Nevertheless, I still like my change to the code, as I think it reads better.

Sep 22 '22 07:09 kosak

Thanks for the replies! I'll get back to you in a little while with updates and unit tests.

Sep 22 '22 20:09 kosak

Hi,

I've done a pretty deep dive on this code. The more I looked at it, the more stuff I felt I wanted to change.

I realize this may be more than you want, and you may decide you want none, some, or all of these changes. Or you might want this large pull request broken up into smaller pieces, or you may want more unit tests.

In any case I thought I would show you what I have, and you can decide how you would like to proceed.

Here is a list of what I changed:

various documentation and grammar nits
Renaming variables/arguments as well as adding temporary variables for readability
Unit tests: fixed some compiler warnings, changed some assert() to assert_true()
Additional per-function behavioral changes since last time

The last item involved a lot more radical changes so I will go into more detail here to explain my rationale on a method-by-method basis:

bitmapOf(std::initializer_list<uint64_t> list): a safe, more idiomatic way of passing a list of items that is less error-prone than C-style varargs

More modifications to: operator&=, |=, ^=, -= : (simple) logic to do the right thing when the left and right hand sides are the same object. as in mymap -= mymap

Added cardinality_nothrow(): The existing cardinality() has a caveat that warns that it might throw an exception if the map is completely full, and that users who care should call isFull() first. I'm not sure I really believe in this scenario (2^64 is a lot of bits), but the user who finds themselves in that scenario will first have to call a (very slow) isFull() followed by a (very slow) cardinality(), which seems a little user-unfriendly. I felt it would be more user-friendly to provide a cardinality_nothrow() so that users can get their information in one call rather than two. As a happy side effect, both cardinality() and isFull() can now simply delegate to cardinality_nothrow() so the net amount of code is smaller.

I also find myself wishing that cardinality() was constant time rather than (painful to calculate) but I assume there are reasons it is the way it is.

isStrictSubset(): The existing isStrictSubset() warns that it might throw if the bitset is completely full, and like the above, users who might be in this scenario need to call isFull() first. This felt a little user-unfriendly to me, because it leaks an implementation detail that the user shouldn't have to care about in the first place. What I did was make a helper method bool isSubset(const Roaring64Map &other, bool requireStrict), that both isSubset and isStrictSubset can delegate to. This new method scans the map and determines strictness along the way. This new helper method is public but doesn't need to be. And it doesn't throw an exception if the bitmap is totally full.

toUInt64Array: overuse of functional style obfuscates a simple method (in my opinion!)

flip: A little background here. The standard style for other methods (addRange vs addRangeClosed, removeRange vs removeRangeClosed) is to provide a "half-open interval" version and a "closed interval" version, and the former would simply adjust the coordinates and call the latter. It appears in this case that the underlying 32-bit bitmap only has a "half-open range" version of flip(). So the 64 bit version was written to call that, but still uses the same coding recipe that the other closed versions do. I suspect this might cause some bugs to creep in but I'm not sure this actually caused any bugs. However I can point to one place where it comes to a wrong conclusion. As a counterexample, if you pass it the half-open interval [0xFFFFFFFF, 0x100000000), you would expect it to conclude that that interval falls on a single underlying bitmap and invoke that clause in the code. However, it does not come to that conclusion, because highBytes(0xFFFFFFFF) != highBytes(0x100000000) and so it fails its (incorrect) same-underlying test. Again I don't think that breaks anything but it's a sign that the code is a little incoherent. I felt it was better to provide two versions of flip(): flip() and flipClosed(). This allows the code to follow the same tried-and-true recipe as already established for addRange()/addRangeClosed() and removeRange()/removeRangeClosed(). Now if we were strictly following the recipe, we would have the problem that flipClosed() needs to call the corresponding 32-bit flipClosed (which doesn't exist for some reason). But that's fine, we can work around it with another coordinate adjustment and be on our way. A perhaps better alternative would be for the 32-bit code to actually provide both flip() and flipClosed(), but that's your decision. In any case the code follows the standard recipe now and is therefore hopefully less likely to be buggy.

select: Forcibly casts a uint64 pointer to a uint32 pointer, which is likely undefined behavior, and comes with the warning // assuming little endian, which I fear means that big-endian customers get corrupt data. Anyway, there's no need for this. A small rewrite to this function eliminates both issues.

printf and toString: Another overuse of the functional style makes what should be a simple method virtually impossible to read (in my opinion!). Instead I provide a standard C++ operator for streaming to an ostream, which is more idiomatic C++. And now, printf and toString can simply delegate to that.

fastunion: So, operator|= has a comment See also the fastunion function to aggregate many bitmaps more quickly. but then fastunion has the disclaimer // not particularly fast and it turns out that it just iterates over the sets and unions them one-by-one. So I rewrote fastunion to basically do a "group by" operation, grouping the sets by their highbits key, and then calling out to roaring_bitmap_or_many to combine each group.

UNIT TESTS: I added unit tests for operator&=, |=, -=, and ^=. Basically I generate two large sets, populate them sort-of randomly and then perform the operations on them. I couldn't really decide whether this should be in cpp_unit.cpp or cpp_random_unit.cpp, so I put them in the former for now. Also, while it was tempting to reuse the make_random_bitset64 function, I felt I wanted more like a million values, not 100, and I wanted to distribute them over the two sets in an "interesting" way. Probably a little more work is needed here but I wanted your feedback first.

Also cleared out a few warnings. Also there are a bunch of assert calls in toplevel_unit.c which I'm pretty sure were meant to be assert_true(), since "assert" is a no-op in optimized builds and I assume you wanted these unit test assertions evaluated regardless of optimization setting.

This PR passes all existing unit tests.

Again, kindly let me know if you want less, more, split into multiple PRs, more unit tests, etc.

Sep 25 '22 02:09 kosak

To make the non-functional changes easier to review, I suggest moving them to a separate PR. Of course the line is a bit blurry here, but I'll leave it up to you to make the distinction.

Sep 25 '22 09:09 SLieve

Please do not add a dependency on the C++ stream library into CRoaring. If you want something like that at least move it to a separate/separable header/source so that it can be omitted on systems where stream is not available/wanted/present.

Sep 25 '22 12:09 madscientist

I have no problem with select being rewritten but note that it is not undefined behavior to cast an unsigned 64bit value to 32bits: the standard defines this (to truncate the value).

Sep 25 '22 12:09 madscientist

I have no problem with select being rewritten but note that it is not undefined behavior to cast an unsigned 64bit value to 32bits: the standard defines this (to truncate the value).

Of course. But the code In question wasn't casting a value, it was casting a pointer (from uint64_t* to uint32_t*):

// assuming little endian
return map_entry.second.select((uint32_t)rnk, ((uint32_t *)element));

Sep 25 '22 13:09 kosak

@SLieve OK. I could go a little more fine-grain if you want. The changes are pretty independent. I could do separate PR's for

&=, ^=, |=, -= (they all look the same)
improving the print routines (without involving the C++ stream library, per @madscientist )
comment-only changes
cleanups in existing unit tests (like the change assert -> assert_true and uninitialized variable warnings)
and then maybe one PR per major functional change, like ** a PR for fastunion ** a PR for cardinality_nothrow ** a PR for flip

Or is that too many PRs?

Just let me know. I'll get back to this in a couple of days.

Sep 25 '22 13:09 kosak

@kosak that works for me!

Sep 25 '22 13:09 SLieve

This is very good work and we want to merge this. Unfortunately, it seems we got off-track.

@kosak Comments?

Nov 04 '22 14:11 lemire

Hi, thanks for your kind words and I'm sorry for the delay. I said I would come back to this in "a few days" but then life got in the way :-). Anyway, as promised I have broken my change up into smaller pieces to make it easier to review. I have recycled this PR to focus on just this first step. (I could also move this to a new PR if you want).

Here is the first piece, which deals only with the overloaded set operations (&=,|=,-=,^=) In addition to the usual things, I would ask the reviewers to also consider the following issues:

I am adding an "auto-cleanup" behavior that wasn't there before. What this means is, if the code detects that it has resulted in an empty inner Roaring, it will delete it from the outer C++ map rather than leaving an empty "placeholder" in that outer slot. Personally I think it's preferable to do it this way.
I added tests for these operators but couldn't decide where they belong (cpp_unit vs cpp_random_unit). cpp_unit seems to be for the "main" tests and cpp_random_unit is for additional random challenges. So I think these tests should be in cpp_unit even though they do use a random strategy. Furthermore I used the C++ random number generator rather than rand() because I wanted a local decent-quality random number generator with deterministic behavior. Also note that these tests don't need to thoroughly test the underlying 32-bit Roaring (that is done elsewhere), but rather they need to focus on the outer 64-bit bucket management.
I also added lots of comments and used the terminology "XOR" rather than "symmetric union" which I think is more common.

Nov 04 '22 19:11 kosak

Great, thanks for splitting it up! I'll try to review it soon.

Nov 04 '22 20:11 SLieve

Running tests. If they successful, we might merge.

Nov 06 '22 19:11 lemire

Merged.

Nov 07 '22 19:11 lemire

CRoaring CRoaring copied to clipboard

Improve efficiency of outer map operations for Roaring64Map

CRoaring
CRoaring copied to clipboard