sdsl-lite

SSE4.2 detection fails

Open · jltsiren opened this issue 8 years ago · 6 comments

When SDSL is built on an AMD Opteron 6174 CPU with g++ 4.9.2, the generated code does not run on that CPU.

CMakeModules/CheckSSE4_2.cmake detects support for a built-in popcnt instruction by checking whether the CPU supports sse4_2 or sse4a. Then, in CMakeLists.txt, if support was detected and the compiler is g++, the compiler option -msse4.2 is set. This allows g++ to emit any SSE4.2 instruction anywhere in the code, not just popcnt. Because some older (2011 and earlier) AMD processors support SSE4a but not SSE4.2, the generated code does not always run on them.

jltsiren · Nov 11 '15 12:11
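The detection problem comes from treating separate CPUID feature bits as one. A minimal sketch, assuming an x86/x86-64 target and GCC or Clang (which ship `<cpuid.h>` with `__get_cpuid`), that reads the three relevant bits independently; `CpuFeatures`, `decode_features` and `query_cpu` are hypothetical names, not sdsl code. A CPU like the Opteron 6174 above (AMD K10 family) reports POPCNT and SSE4a but not SSE4.2, so a check keyed on popcnt alone would justify `-mpopcnt` but not `-msse4.2`.

```cpp
// Hypothetical helper: decode the CPUID bits relevant to this issue.
// Assumes GCC/Clang on x86/x86-64 (<cpuid.h> is compiler-provided).
#include <cpuid.h>

struct CpuFeatures {
    bool popcnt;  // CPUID leaf 0x00000001, ECX bit 23
    bool sse4_2;  // CPUID leaf 0x00000001, ECX bit 20
    bool sse4a;   // CPUID leaf 0x80000001, ECX bit 6 (AMD extension)
};

// Pure decoding step, kept separate so it can be checked without the hardware.
CpuFeatures decode_features(unsigned std_ecx, unsigned ext_ecx) {
    return CpuFeatures{
        ((std_ecx >> 23) & 1u) != 0,
        ((std_ecx >> 20) & 1u) != 0,
        ((ext_ecx >> 6) & 1u) != 0,
    };
}

// Queries the running CPU; __get_cpuid returns 0 if a leaf is unsupported,
// in which case the corresponding register is treated as all-zero.
CpuFeatures query_cpu() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    unsigned std_ecx = 0, ext_ecx = 0;
    if (__get_cpuid(1u, &eax, &ebx, &ecx, &edx)) std_ecx = ecx;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) ext_ecx = ecx;
    return decode_features(std_ecx, ext_ecx);
}
```

With these three bits separated, a build script could map each one to the narrowest matching compiler flag instead of inferring SSE4.2 from SSE4a.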

OK, so we need more fine-grained handling. Maybe it is best to add a test program for each machine instruction and use CMake to determine whether it works (similar to check_mode_ti.cpp, which was recently added by @thinred)?

simongog · Nov 11 '15 12:11

Thanks for letting me know about this issue! I have two comments to make:

  1. In Debian, I disable SSE unconditionally, as we try to support every possible machine.
  2. To test whether a machine instruction actually works, you probably want try_run (https://cmake.org/cmake/help/v3.0/command/try_run.html), not try_compile, which I used previously: a program that compiles can still fail to run on the build machine.

thinred · Nov 11 '15 14:11
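A per-instruction probe that try_run could compile and execute might look like the sketch below. The function name, the test constant, and the assumption that a small `main` would return its result as the exit code are illustrative, not existing sdsl code; the file would be compiled with only the flag under test (for example `-mpopcnt` instead of `-msse4.2`). The one subtle point is keeping the compiler from constant-folding the count.

```cpp
// Hypothetical try_run probe for the POPCNT instruction.
#include <cstdint>

// Read through a volatile so the count cannot be folded at compile time.
// If it were folded, the instruction would never execute and the probe
// would "pass" even on a CPU that lacks it.
static volatile std::uint64_t probe_input = 0xF0F0F0F0F0F0F0F0ull;

// Returns true if the count is correct. On a CPU without the instruction,
// a build using -mpopcnt dies with an illegal-instruction signal before
// returning, so try_run sees a failed run rather than a compile error.
bool popcnt_probe_ok() {
    const std::uint64_t x = probe_input;  // 8 bytes x 4 set bits = 32
    return __builtin_popcountll(x) == 32;
}
```

Without `-mpopcnt` the builtin falls back to a software routine, so the probe only tests the instruction when built with the flag being checked.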

@thinred Disabling SSE makes many structures much slower in practice.

Is try_run available in CMake 2.8?

mpetri · Nov 16 '15 23:11

@mpetri I'm of course aware that it makes things slower, but anyone who really needs the performance can trivially recompile the source package.

And yes: https://cmake.org/cmake/help/v2.8.12/cmake.html#command:try_run

thinred · Nov 17 '15 05:11

Ideally, the choice between the sse4_2, popcnt, and reference implementations of bits::cnt (https://github.com/simongog/sdsl-lite/blob/7953ec61a4c3bcc7f49430c07b4b69be2c762abc/include/sdsl/bits.hpp#L247-L261) would be made at run time rather than at compile time, so that an SSE-capable build of sdsl would still run on machines without that support.

Perhaps something like sseplus could also help (I'm not sure how its popcnt implementations compare to the GCC builtin): http://sseplus.sourceforge.net/

jrandall · Nov 18 '15 12:11
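One way run-time selection could work, sketched under assumptions: GCC 4.8+ or Clang on x86/x86-64, using GCC's function-level `target("popcnt")` attribute and `__builtin_cpu_supports` rather than sseplus. `cnt_reference`, `cnt_hw` and `cnt` are hypothetical names; this does not reproduce sdsl's actual `bits::cnt`. Only one function is compiled for POPCNT, and it is called only after the CPU has been checked.

```cpp
// Hypothetical run-time dispatch for a 64-bit popcount.
#include <cstdint>

// Portable SWAR popcount: needs no special instructions.
static std::uint64_t cnt_reference(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ull);
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0Full;
    return (x * 0x0101010101010101ull) >> 56;  // sum of the 8 byte counts
}

// Only this function may use POPCNT; the rest of the translation unit keeps
// the baseline instruction set, so no global -msse4.2 is required.
__attribute__((target("popcnt")))
static std::uint64_t cnt_hw(std::uint64_t x) {
    return static_cast<std::uint64_t>(__builtin_popcountll(x));
}

using cnt_fn = std::uint64_t (*)(std::uint64_t);

// Checks the running CPU and picks an implementation.
static cnt_fn select_cnt() {
    __builtin_cpu_init();
    return __builtin_cpu_supports("popcnt") ? cnt_hw : cnt_reference;
}

// The choice is made once (thread-safe static init) and reused afterwards.
std::uint64_t cnt(std::uint64_t x) {
    static const cnt_fn impl = select_cnt();
    return impl(x);
}
```

The indirect call costs something in tight loops; a real design might dispatch at a coarser level (for example per data structure) instead of per word.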

I couldn't agree more; run-time detection would be great. I'm not sure it can be done easily, however. AFAIK, SSE support is currently enabled by a compiler flag rather than through compiler intrinsics, so it is all-or-nothing, at least within a single compilation unit.

thinred · Nov 21 '15 22:11