CRoaring icon indicating copy to clipboard operation
CRoaring copied to clipboard

Cleanup headers

Open fsaintjacques opened this issue 9 years ago • 25 comments

I'd give a little love to the headers:

  • separate concerns (basic operations, advanced operations, serializations) in multiple headers, such the user can only import what he needs.
  • hide definitions, does the user really need the full definition? We could probably hide in a forward declaration à la struct roaring_bitmap;
  • Remove unnecessary public that should really be private stuff.

fsaintjacques avatar Sep 02 '16 14:09 fsaintjacques

Sure, but please wait a bit since I am doing some reengineering and if my work is fruitful, we could end up with a massive conflict from hell.

lemire avatar Sep 02 '16 15:09 lemire

I am aware of the nasty conflicts it'll ensue and will wait for your work.

fsaintjacques avatar Sep 02 '16 15:09 fsaintjacques

I'll know soon whether my idea helps or not... I am doing as fast as I can but it involves lots of code, sadly.

lemire avatar Sep 02 '16 15:09 lemire

Some things to take into account...

  1. We'd like to keep the unity build in working order.
  2. The various wrappers, https://github.com/Ezibenroc/PyRoaringBitMap https://github.com/saulius/croaring-rs https://github.com/RoaringBitmap/gocroaring

should be preserved in working order as well.

lemire avatar Sep 02 '16 23:09 lemire

Ok. I am done. Go ahead!

lemire avatar Sep 03 '16 01:09 lemire

Do you have any idea when you'd like to work on this? We have this other issue (https://github.com/RoaringBitmap/CRoaring/issues/47) which would require touching quite a bit of code. I'd like to avoid conflicts... so if you want to clean the headers soon, then we will delay work on issue 47 till you are done, otherwise, we may proceed with issue 47 first.

lemire avatar Sep 05 '16 15:09 lemire

I'll try to tackle it today. I will do this in a separate branch. It is worth noting that it will probably break bindings?

fsaintjacques avatar Sep 06 '16 11:09 fsaintjacques

Should I take the opportunity to change the signature of some functions, e.g. const parameters usually preceed non-const void roaring_bitmap_andnot_inplace(roaring_bitmap_t *x1, const roaring_bitmap_t *x2); would become void roaring_bitmap_andnot_inplace(const roaring_bitmap_t *x1, roaring_bitmap_t *x2);.

fsaintjacques avatar Sep 06 '16 13:09 fsaintjacques

It is worth noting that it will probably break bindings?

Hopefully, it won't break the unity build.

It is probably best to aim to break as few things as possible, as a general rule, right?

separate concerns (basic operations, advanced operations, serializations) in multiple headers, such the user can only import what he needs.

That's a good idea but this does not prevent you from having a roaring.h file that offers the same API as it does now, just have it include all the subheaders so that users who do not need separation of concern can maintain the current convenience of having a single include.

hide definitions, does the user really need the full definition? We could probably hide in a forward declaration à la struct roaring_bitmap;

The inline definitions are a matter of performance. We had major performance issues due to lack of inlining. I did not move the definitions into header files for amusement.

 Remove unnecessary public that should really be private stuff.

Moving headers away from view is great. We should ensure that it does not come at the expense of performance. Being able to inline definitions is really important in some instances.

Should I take the opportunity to change the signature of some functions, e.g. const parameters usually preceed non-const

I have to disagree on this point. When we do in-place operations, the first parameter is usually the mutable one.

Here is the equivalent using operators:

x &= y

See how the x occurs before the y? I am sure that there are languages that read backward, but we are in C.

(C does not have a "andnot" operator unlike, say, Go, but it is not important.)

Here is how you'd implement it in C++:

type operator&(type & lhs, const type& rhs) {
   return lhs &= rhs;
}

See how the const parameter appears last?

lemire avatar Sep 06 '16 14:09 lemire

Another wrapper (C#): https://github.com/RogueException/CRoaring.Net

lemire avatar Sep 06 '16 14:09 lemire

Aren't inlining concerns fixed by proper LTO support? This small peculiarity forces some private header to be public.

I've come to understand that for a proper C library, parameter order should be: input, input-output, output. This is a matter of opinion.

I'm looking ahead, and this feels like wasted time. I'm just gonna drop this issue.

fsaintjacques avatar Sep 06 '16 14:09 fsaintjacques

Aren't inlining concerns fixed by proper LTO support?

Good point. I think that this assumes that library users will rely on LTO. Otherwise, they will simply get subpar performance.

I've come to understand that for a proper C library, parameter order should be: input, input-output, output. This is a matter of opinion.

When you copy something you go

`destination = source;``

So it makes sense that we have

copy(destination, source);

Say you have a hash table and you insert something in it. Intuitively, you'd write

set[key] = value

And thus, I think you'd write insert(hash_table *table, const int key, const int value )

I could be wrong, of course... that's just my intuition.

(You have just inspired a blog post.)

I'm looking ahead, and this feels like wasted time. I'm just gonna drop this issue.

AH!!!! I hope it is not because of the points I raised.

I will take note that we are delaying this issue.

lemire avatar Sep 06 '16 15:09 lemire

Function signature: how do you order parameters? http://lemire.me/blog/2016/09/06/function-signature-how-do-you-order-parameters/

lemire avatar Sep 06 '16 16:09 lemire

I did some minor cleaning... in commit 4fda6bae6d5449b7e6bddbefd7c7df5ce74305b4 but it lead to a significant performance degradation (yet I just renamed the defines). I think that my experience illustrates the need to be careful.

lemire avatar Sep 09 '16 22:09 lemire

I will painfully undo the commit.

lemire avatar Sep 09 '16 22:09 lemire

I have undone my cleaning work. I have read and re-read the cleaned code and I do not see where it could have changed the generated code, but obviously, it did degrade the binaries quite a bit.

lemire avatar Sep 09 '16 23:09 lemire

Which machine are you testing this on? I did upgrade the default compiler on Skylake (and maybe Haswell), and it's possible that might have effects on performance.

More generally, I think this is a reminder that performance in a high level language is fragile, and at the whim of the compiler. If we want robust performance, we probably need some way to lock things in once we have what we want.

--nate

nkurz avatar Sep 09 '16 23:09 nkurz

@nkurz Nah. It is not a compiler issue.

With the magic of git, I can move back and forth in time and see exactly at which point the performance went down (using the same compiler and machine).

More generally, I think this is a reminder that performance in a high level language is fragile, and at the whim of the compiler. If we want robust performance, we probably need some way to lock things in once we have what we want.

My intention was just to rename the enums and the defines so that they are all prefixed by "ROARING_". This should not impact the generated binaries in the least. But it did something bad.

lemire avatar Sep 09 '16 23:09 lemire

Here is the benchmark I use...

https://github.com/RoaringBitmap/CBitmapCompetition#results

(I published it so one could refer to it in the future.)

lemire avatar Sep 09 '16 23:09 lemire

@nkurz There is one ill-effect from the compiler upgrade however... The sanitize mode is now broken. I doubt it is your doing per se, but I think it is related to the new compiler.

$ mkdir debug
dlemire@skylake:~/CVS/github/CRoaring$ cmake -DCMAKE_BUILD_TYPE=Debug -DSANITIZE=ON ..
CMake Error: The source directory "/home/dlemire/CVS/github" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
dlemire@skylake:~/CVS/github/CRoaring$ cd debug/
dlemire@skylake:~/CVS/github/CRoaring/debug$ cmake -DCMAKE_BUILD_TYPE=Debug -DSANITIZE=ON ..
-- The C compiler identification is GNU 6.2.0
-- The CXX compiler identification is GNU 6.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- BENCHMARK_DATA_DIR: /home/dlemire/CVS/github/CRoaring/benchmarks/realdata/
-- TEST_DATA_DIR: /home/dlemire/CVS/github/CRoaring/tests/testdata/
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- CMAKE_BUILD_TYPE: Debug
-- DISABLE_AVX: OFF
-- BUILD_STATIC: OFF
-- BUILD_LTO: OFF
-- SANITIZE: ON
-- CMAKE_C_COMPILER: /usr/bin/cc
-- CMAKE_C_FLAGS: -std=c11 -fPIC -march=native  -Wall -Winline -Wshadow -Wextra -pedantic -fsanitize=address -fno-omit-frame-pointer -fsanitize=undefined
-- CMAKE_C_FLAGS_DEBUG: -ggdb
-- CMAKE_C_FLAGS_RELEASE: -O3
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dlemire/CVS/github/CRoaring/debug
dlemire@skylake:~/CVS/github/CRoaring/debug$ make
Scanning dependencies of target roaring
[  3%] Building C object src/CMakeFiles/roaring.dir/array_util.c.o
[  6%] Building C object src/CMakeFiles/roaring.dir/bitset_util.c.o
[  9%] Building C object src/CMakeFiles/roaring.dir/containers/array.c.o
[ 12%] Building C object src/CMakeFiles/roaring.dir/containers/bitset.c.o
[ 16%] Building C object src/CMakeFiles/roaring.dir/containers/containers.c.o
[ 19%] Building C object src/CMakeFiles/roaring.dir/containers/convert.c.o
[ 22%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_intersection.c.o
[ 25%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_union.c.o
[ 29%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_equal.c.o
[ 32%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_negation.c.o
[ 35%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_xor.c.o
[ 38%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_andnot.c.o
[ 41%] Building C object src/CMakeFiles/roaring.dir/containers/run.c.o
[ 45%] Building C object src/CMakeFiles/roaring.dir/roaring.c.o
[ 48%] Building C object src/CMakeFiles/roaring.dir/roaring_priority_queue.c.o
[ 51%] Building C object src/CMakeFiles/roaring.dir/roaring_array.c.o
Linking C shared library ../libroaring.so
/usr/bin/ld: unrecognized option '--push-state'
/usr/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status
make[2]: *** [libroaring.so] Error 1
make[1]: *** [src/CMakeFiles/roaring.dir/all] Error 2
make: *** [all] Error 2

lemire avatar Sep 09 '16 23:09 lemire

@nkurz (Thankfully I can run the tests with the sanitizer on other machines so it is not bad.)

lemire avatar Sep 09 '16 23:09 lemire

With the magic of git, I can move back and forth in time and see exactly at which point the performance went down (using the same compiler and machine).

Yes, I didn't know if you'd tested the new version with the old compiler to be sure, and vice versa.

It is not a compiler issue. ... My intention was just to rename the enums and the defines so that they are all prefixed by "ROARING_". This should not impact the generated binaries in the least. But it did something bad.

When you rename a variable and the performance drops, I tend to think of that as a compiler issue.

There is one ill-effect from the compiler upgrade however... The sanitize mode is now broken. I doubt it is your doing per se, but I think it is related to the new compiler.

Seems like Ubuntu's doing rather than mine, but I think I patched it with d5a081a3337ae804f9a25848a807dcca5a77096b. Separately, I noticed that the "make test" has failed tests on Skylake, but I presume you are already aware of this.

nkurz avatar Sep 10 '16 01:09 nkurz

Yes, I was working on fixing a minor issue.

lemire avatar Sep 10 '16 02:09 lemire

  1. Header files which are not part of public interface (cannot be reached from roaring.h) should be moved from include/ into src/ directory. Where possible, public interface should be reduced.
  2. Public interface should not depend on USEAVX and other similar definitions. For example, currently I see that run.h is referenced by roaring.h (roaring.h -> roaring_array.h -> containers.h -> run.h) and its body depends on USEAVX. Simple prerelase test: grep -R USEAVX include/ >/dev/null && echo "error".

andreigudkov avatar Nov 09 '18 08:11 andreigudkov

@andreigudkov PR invited!!!

But care is needed because we have an outstanding unmerged PR.

lemire avatar Nov 09 '18 19:11 lemire