x86-simd-sort
x86-simd-sort copied to clipboard
BLD: Add MSVC build support on windows
This pull request improves cross-platform support and CPU feature detection for the x86 SIMD sort library, primarily targeting compatibility with Microsoft's Visual C++ (MSVC) compiler. The changes ensure that both build configuration and runtime CPU feature checks work correctly on MSVC, while maintaining compatibility with other compilers.
Build system compatibility:
- Updated
lib/meson.buildto use MSVC-specific compiler flags (/arch:AVX2and/arch:AVX512) instead of the GCC/Clang flags (-march=...), ensuring correct SIMD instruction set selection on Windows/MSVC. [1] [2] [3] [4]
CPU feature detection abstraction:
- Added new header
lib/x86simdsortcpuid.hto abstract CPU feature detection, using MSVC intrinsics for runtime checks and caching results, while falling back to__builtin_cpu_supportsfor other compilers. - Replaced direct calls to
__builtin_cpu_supportsand__builtin_cpu_initinlib/x86simdsort.cppwith the newxss_cpu_supportsandxss_cpu_initfunctions/macros, ensuring consistent feature detection across platforms. [1] [2] [3]
Symbol visibility and constructor attribute compatibility:
- Defined
XSS_EXPORT_SYMBOL,XSS_HIDE_SYMBOL, andXSS_ATTRIBUTE_CONSTRUCTORmacros inlib/x86simdsort.handlib/x86simdsort.cppto handle symbol visibility and static initialization attributes for both MSVC and non-MSVC compilers. [1] [2]
Code style and correctness:
- Improved static assertions and formatting in
object_qsortfor clarity and correctness.
Miscellaneous:
- Added a blank line at the top of
meson.buildfor style consistency.
Using as HEAD: cfe57c4c7bf71a895f874e9fbfc087c774264011
I fix a few typos:
-#if defined(__INTEL_COMPILER) and !defined(__SANITIZE_ADDRESS__)
+#if defined(__INTEL_COMPILER) && !defined(__SANITIZE_ADDRESS__)
-#elif __GNUC__ >= 8 and !defined(__SANITIZE_ADDRESS__)
+#elif __GNUC__ >= 8 && !defined(__SANITIZE_ADDRESS__)
Then I open the x64 Native Tools Command Prompt for VS2022 and run the following commands from the lib folder:
cl /nologo /c /arch:AVX2 /I../src /std:c++17 x86simdsort-avx2.cpp
cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-skx.cpp
cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-icl.cpp
cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-spr.cpp
The first 2 compile, but the last 2 do not. See compilation logs attached: x86simdsort-icl.cpp.log x86simdsort-spr.cpp.log
This is the reason why I fell back to clang with mingw and introduced the C-api in #211.
The cl compiler version I use is: Microsoft (R) C/C++ Optimizing Compiler Version 19.44.35211 for x64
here you mention you could build the library with MSVC. What compiler flags and compiler version do you use?
Hope this helps.
cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-icl.cpp
4dc9609 fixes this error.
cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-spr.cpp
This is specific to _Float16 dispatch, if MSVC doesn't support it, then nothing much we can do about it. I would recommend using meson to build the library though, _Float16 is an optional requirement, meson build will auto detect if your compiler supports it and skips it if it doesn't.
https://github.com/intel/x86-simd-sort/pull/211#issuecomment-3286181131 you mention you could build the library with MSVC. What compiler flags and compiler version do you use?
This is what worked for me:
meson setup --warnlevel 2 --buildtype release builddir
cd builddir
ninja
@fabiocannizzo c1994cc adds a CI job to build with MSVC and that runs successfully. There are a few warnings that we can work on fixing (see #222) but I don't think that's gating the build itself. Does this address your concerns of building with MSVC?
@r-devulap : thank you for this work. I can build with meson using c1994cc but I cannot compile an executable linking with the import lib of the DLL. dumpbin /exports x86simdsort.lib shows no C++ exported symbols.
but I cannot compile an executable linking with the import lib of the DLL. dumpbin /exports x86simdsort.lib shows no C++ exported symbols.
oh? not sure why. MSVC isn’t my usual environment, but I’ll try to replicate that.
oh? not sure why.
Just a guess here: _declspec(dllexport) is explicitly needed for each template specialization to be exported.
if we have
template <class T> void foo() {...}
When we force specialisation for int we need:
template void _declspec(dllexport) foo<int>();
Just a guess here: _declspec(dllexport) is explicitly needed for each template specialization to be exported.
@fabiocannizzo that was it. Let me know if it works now.
@fabiocannizzo that was it. Let me know if it works now.
This is making progress. Now I can see exported symbols from the import library and the DLL. I can also create a simple executable linking with the library. However, when I try and run it, it segfaults.
int x[]= {4,3,2,1};
x86simdsort::qsort(x, 4, false, false); // this segfaults
With the debugger I can see it jumps to some invalid memory address when entering the function.
The problem is that the dynamic pointers are not initialized. For example, when the function below is invoked, the pointer internal_qsortint32_t has not been initialized and it is still nullptr. This means the function resolve_qsortint32_t is not called when the DLL is loaded.
static void (*internal_qsortint32_t)(int32_t *, size_t, bool, bool) = 0;
template <>
void __declspec(dllexport)
qsort(int32_t *arr, size_t arrsize, bool hasnan, bool descending)
{
(*internal_qsortint32_t)(arr, arrsize, hasnan, descending);
}
It is not clear to me what is the mechanism which trigger execution of resolve_qsortint32_t in the Linux case. Anyway, to fix it, there are various options.
- Add a
dllMainfunction inx86simdsort.cpp, and explicitly invoke theresolve_*functions.
#include <windows.h>
BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
// invoke here all resolvers
resolve_qsortint32_t
// ...
break;
default:
break;
}
return TRUE; // Return TRUE for successful initialization, FALSE otherwise (especially for DLL_PROCESS_ATTACH)
}
- Add global scope initializers (in my opinion this is the simplest, given the structure of the library).
static void resolve_qsortint32_t(void) { ... };
struct Init_qsortint32_t
{
Init_qsortint32_t() { resolve_qsortint32_t(); }
} init_qsortint32_t; // constructor for this global instance is called at startup.
Appending the following to the dispatch macros solve the issue (see patch.txt):
#define DISPATCH(func, TYPE, ISA) \
DECLARE_INTERNAL_##func(TYPE) static XSS_ATTRIBUTE_CONSTRUCTOR void CAT( \
CAT(resolve_, func), TYPE)(void) \
{ \
...
} \
struct CAT(CAT(Init_, func), TYPE) { \
CAT(CAT(Init_, func), TYPE)() { CAT(CAT(resolve_, func), TYPE)(); } \
} CAT(CAT(initializer_, func), TYPE);
#define DISPATCH_KV_FUNC(func, TYPE1, TYPE2, ISA) \
static XSS_ATTRIBUTE_CONSTRUCTOR void CAT( \
CAT(CAT(CAT(resolve_, func), _), TYPE1), TYPE2)(void) \
{ \
...
} \
struct CAT(CAT(CAT(CAT(Init_, func), _), TYPE1), TYPE2) { \
CAT(CAT(CAT(CAT(Init_, func), _), TYPE1), TYPE2)() \
{ CAT(CAT(CAT(CAT(resolve_, func), _), TYPE1), TYPE2)();} \
} CAT(CAT(CAT(CAT(initializer_, func), _), TYPE1), TYPE2);
@fabiocannizzo thanks for debugging, I haven't had time to take a look at it yet. Will try your patch and get back to you soon.
@fabiocannizzo was finally able to install google test and build with MSVC. Running the tests reproduces the failure you see. The fix is straightforward, as you pointed out—I'll get it implemented soon. Quick question: are you currently using just qsort, or are there other APIs you're working with?"
I use all the c++ signatures (sort, partialsort, nthelement, and the keyvalue equivalents.