CRoaring icon indicating copy to clipboard operation
CRoaring copied to clipboard

Add roaring_bitmap_add_bulk

Open Dr-Emann opened this issue 2 years ago • 4 comments

roaring_bitmap_add_bulk is a generalization of roaring_bitmap_add_many, caching the container for the last inserted item, and avoiding looking the container up if another item is inserted in the same container.

Use the new function in the implementation of roaring_bitmap_add_many and roaring_bitmap_of

Dr-Emann avatar May 08 '22 02:05 Dr-Emann

This looks good. Could you contribute a benchmark and report on your results?

Please also review the code done at a similar PR we are currently reviewing... https://github.com/RoaringBitmap/CRoaring/pull/362

lemire avatar May 13 '22 18:05 lemire

Very nice work! I left a few comments, and as @lemire mentioned we would need some benchmarks to measure the effects.

Oppen avatar May 15 '22 02:05 Oppen

I ran the benchmarks on an M1 Pro, averaged over 3 runs each.

add_many benchmark Old New New (add_bulk
intvlen=1 density=0.200000 order=SHUFFLE 43.2 42.9 42.0
intvlen=4 density=0.200000 order=SHUFFLE 15.2 14.9 14.7
intvlen=16 density=0.200000 order=SHUFFLE 8.0 8.0 8.8
intvlen=64 density=0.200000 order=SHUFFLE 6.8 6.8 7.7
intvlen=1 density=0.200000 order=ASC 6.7 6.8 4.2
intvlen=4 density=0.200000 order=ASC 3.8 3.9 3.6
intvlen=16 density=0.200000 order=ASC 2.9 2.9 3.8
intvlen=64 density=0.200000 order=ASC 2.7 2.7 3.6
intvlen=1 density=0.200000 order=DESC 26.3 25.3 21.2
intvlen=4 density=0.200000 order=DESC 11.9 12.0 11.9
intvlen=16 density=0.200000 order=DESC 9.0 8.9 10.2
intvlen=64 density=0.200000 order=DESC 9.3 9.4 10.3

Dr-Emann avatar May 22 '22 04:05 Dr-Emann

Ok. The benchmarks indicate some gains, some of the time. That's good.

lemire avatar May 23 '22 16:05 lemire

@Dr-Emann Would you check out the draft PR https://github.com/RoaringBitmap/CRoaring/pull/371 ?

We would like to unify these features so that it looks more pleasant to the users, and also we would like to use similar conventions to improve maintenance.

lemire avatar Aug 15 '22 14:08 lemire

Benchmarks for contains_bulk vs contains_multi (contains_bulk uses a loop over the more general roaring_bitmap_contains_bulk, contains_multi is calling roaring_bitmap_contains_multi from #371, which works only with arrays).

I think the performance difference is small enough that I would recommend keeping only roaring_bitmap_contains_bulk, since it's more general.

                          roaring_bitmap_contains:  16.935610  17.271300  23.584906  17.378274  16.928903  16.930263  17.124931  17.134777  17.085276  17.110626
                    roaring_bitmap_contains_multi:  13.814899  13.872637  19.660377  13.902037  13.641540  13.806249  13.935047  13.803897  13.677905  13.276861
                     roaring_bitmap_contains_bulk:  13.902713  13.933411  16.509434  13.942774  15.148100  14.404645  14.579878  14.357751  13.852498  14.135060
        roaring_bitmap_contains with sorted input:   4.197930   4.076912  69.188679   4.445199   4.136423   3.986609   3.973190   3.941677   3.991282   4.075313
  roaring_bitmap_contains_multi with sorted input:   1.240232   1.163129   7.075472   1.414161   1.147784   1.153642   1.209232   1.138563   1.090571   1.092830
   roaring_bitmap_contains_bulk with sorted input:   1.426006   1.453911   7.075472   1.697381   1.406970   1.420494   1.474295   1.400675   1.401628   1.398334

Dr-Emann avatar Aug 19 '22 21:08 Dr-Emann

I think the performance difference is small enough that I would recommend keeping only roaring_bitmap_contains_bulk, since it's more general.

Sounds reasonable. We do not want to extend the API needlessly.

lemire avatar Aug 19 '22 22:08 lemire

It looks like vs16-ci is building the tests with NDEBUG set, which disables asserts, and my test currently checks for side effects from the argument passed to assert. It feels like setting NDEBUG for tests is counterproductive?

Dr-Emann avatar Aug 20 '22 02:08 Dr-Emann

It looks like vs16-ci is building the tests with NDEBUG set, which disables asserts, and my test currently checks for side effects from the argument passed to assert. It feels like setting NDEBUG for tests is counterproductive?

We definitively want the tests to build and run successfully irrespective of whether NDEBUG is set. You can prescribe to the users that they build the library without NDEBUG.

lemire avatar Aug 20 '22 13:08 lemire

The Linux builds, by default, are in release mode and release mode sets NDEBUG. We definitively want the tests to run in that case.

lemire avatar Aug 20 '22 13:08 lemire

It looks like we re-assign CMAKE_C_FLAGS_RELEASE for everything but MSVC to not include -DNDEBUG.

I agree that the library should always work the same regardless of the NDEBUG define, but I think we should consider forcibly undefine-ing it for tests: we use bare assert quite a bit in the tests, and their purpose is to test things: it's unfortunate that the tests check quite a bit less on MSVC in release mode.

Dr-Emann avatar Aug 20 '22 13:08 Dr-Emann

That said, that doesn't have to be in scope for this PR. I've changed the asserts in the bulk test to assert_true which is unconditionally run.

Dr-Emann avatar Aug 20 '22 13:08 Dr-Emann

As a matter of style, I don't think that tests should rely on asserts. If we are currently, then it is a mistake.

Note that I do not mean that we should not use asserts in tests, only that we should not judge the result of the test solely on whether an assert fired or not.

There may be exceptions.

lemire avatar Aug 20 '22 20:08 lemire

Please sync with our main branch (master).

lemire avatar Aug 20 '22 22:08 lemire

Thanks for syncing.

lemire avatar Aug 22 '22 01:08 lemire

@Dr-Emann Do you recommend we merge this PR?

lemire avatar Aug 25 '22 21:08 lemire

Yeah, I think it's worthwhile, it seems to almost always be at least a small win over non-bulk/array operations, and just a tiny bit slower than array operations, with more flexibility. I'm excited to use it for e.g. the rust FromIterator implementation in croaring-rs.

Dr-Emann avatar Aug 26 '22 00:08 Dr-Emann

Merging.

lemire avatar Aug 26 '22 20:08 lemire