CRoaring
CRoaring copied to clipboard
Add roaring_bitmap_add_bulk
roaring_bitmap_add_bulk
is a generalization of roaring_bitmap_add_many
, caching the container for the last inserted item, and avoiding looking the container up if another item is inserted in the same container.
Use the new function in the implementation of roaring_bitmap_add_many
and roaring_bitmap_of
This looks good. Could you contribute a benchmark and report on your results?
Please also review the code done at a similar PR we are currently reviewing... https://github.com/RoaringBitmap/CRoaring/pull/362
Very nice work! I left a few comments, and as @lemire mentioned we would need some benchmarks to measure the effects.
I ran the benchmarks on an M1 Pro, averaged over 3 runs each.
add_many benchmark | Old | New | New (add_bulk |
---|---|---|---|
intvlen=1 density=0.200000 order=SHUFFLE | 43.2 | 42.9 | 42.0 |
intvlen=4 density=0.200000 order=SHUFFLE | 15.2 | 14.9 | 14.7 |
intvlen=16 density=0.200000 order=SHUFFLE | 8.0 | 8.0 | 8.8 |
intvlen=64 density=0.200000 order=SHUFFLE | 6.8 | 6.8 | 7.7 |
intvlen=1 density=0.200000 order=ASC | 6.7 | 6.8 | 4.2 |
intvlen=4 density=0.200000 order=ASC | 3.8 | 3.9 | 3.6 |
intvlen=16 density=0.200000 order=ASC | 2.9 | 2.9 | 3.8 |
intvlen=64 density=0.200000 order=ASC | 2.7 | 2.7 | 3.6 |
intvlen=1 density=0.200000 order=DESC | 26.3 | 25.3 | 21.2 |
intvlen=4 density=0.200000 order=DESC | 11.9 | 12.0 | 11.9 |
intvlen=16 density=0.200000 order=DESC | 9.0 | 8.9 | 10.2 |
intvlen=64 density=0.200000 order=DESC | 9.3 | 9.4 | 10.3 |
Ok. The benchmarks indicate some gains, some of the time. That's good.
@Dr-Emann Would you check out the draft PR https://github.com/RoaringBitmap/CRoaring/pull/371 ?
We would like to unify these features so that it looks more pleasant to the users, and also we would like to use similar conventions to improve maintenance.
Benchmarks for contains_bulk
vs contains_multi
(contains_bulk uses a loop over the more general roaring_bitmap_contains_bulk
, contains_multi is calling roaring_bitmap_contains_multi
from #371, which works only with arrays).
I think the performance difference is small enough that I would recommend keeping only roaring_bitmap_contains_bulk
, since it's more general.
roaring_bitmap_contains: 16.935610 17.271300 23.584906 17.378274 16.928903 16.930263 17.124931 17.134777 17.085276 17.110626
roaring_bitmap_contains_multi: 13.814899 13.872637 19.660377 13.902037 13.641540 13.806249 13.935047 13.803897 13.677905 13.276861
roaring_bitmap_contains_bulk: 13.902713 13.933411 16.509434 13.942774 15.148100 14.404645 14.579878 14.357751 13.852498 14.135060
roaring_bitmap_contains with sorted input: 4.197930 4.076912 69.188679 4.445199 4.136423 3.986609 3.973190 3.941677 3.991282 4.075313
roaring_bitmap_contains_multi with sorted input: 1.240232 1.163129 7.075472 1.414161 1.147784 1.153642 1.209232 1.138563 1.090571 1.092830
roaring_bitmap_contains_bulk with sorted input: 1.426006 1.453911 7.075472 1.697381 1.406970 1.420494 1.474295 1.400675 1.401628 1.398334
I think the performance difference is small enough that I would recommend keeping only roaring_bitmap_contains_bulk, since it's more general.
Sounds reasonable. We do not want to extend the API needlessly.
It looks like vs16-ci is building the tests with NDEBUG
set, which disables asserts, and my test currently checks for side effects from the argument passed to assert. It feels like setting NDEBUG for tests is counterproductive?
It looks like vs16-ci is building the tests with NDEBUG set, which disables asserts, and my test currently checks for side effects from the argument passed to assert. It feels like setting NDEBUG for tests is counterproductive?
We definitively want the tests to build and run successfully irrespective of whether NDEBUG is set. You can prescribe to the users that they build the library without NDEBUG.
The Linux builds, by default, are in release mode and release mode sets NDEBUG. We definitively want the tests to run in that case.
It looks like we re-assign CMAKE_C_FLAGS_RELEASE
for everything but MSVC to not include -DNDEBUG
.
I agree that the library should always work the same regardless of the NDEBUG
define, but I think we should consider forcibly undefine-ing it for tests: we use bare assert
quite a bit in the tests, and their purpose is to test things: it's unfortunate that the tests check quite a bit less on MSVC in release mode.
That said, that doesn't have to be in scope for this PR. I've changed the asserts in the bulk test to assert_true
which is unconditionally run.
As a matter of style, I don't think that tests should rely on assert
s. If we are currently, then it is a mistake.
Note that I do not mean that we should not use asserts in tests, only that we should not judge the result of the test solely on whether an assert fired or not.
There may be exceptions.
Please sync with our main branch (master).
Thanks for syncing.
@Dr-Emann Do you recommend we merge this PR?
Yeah, I think it's worthwhile, it seems to almost always be at least a small win over non-bulk/array operations, and just a tiny bit slower than array operations, with more flexibility. I'm excited to use it for e.g. the rust FromIterator
implementation in croaring-rs.
Merging.