Daemon cmake: add `USE_ARCH_INTRINSICS` and `USE_COMPILER

This introduces the USE_ARCH_INTRINSICS CMake option. It is enabled by default.

Disabling it is meant to disable custom asm code and usage of intrinsincs functions for the target platform in the Dæmon code base, it may also be used by games built with the Dæmon common code base.

It is not meant to disable asm or intrinsincs usage in third-party libraries.

It is not meant to prevent the compiler to use such intrinsics in its optimization passes.

It is not meant to disable the compiler flags we set to tell the compiler to try to use such intrinsics in its optimization passes. For this, one should disable USE_CPU_RECOMMENDED_FEATURES instead.

For obvious reason the asm code in the BREAKPOINT() implementation is not meant to be disabled by USE_ARCH_INTRINSICS, the alternate code is to do nothing anyway.

The macro syntax is: DAEMON_ARCH_INTRINSICS_(architecture)[_extension]

Examples:

DAEMON_ARCH_INTRINSICS_i686: i686 specific code, including asm code.
DAEMON_ARCH_INTRINSICS_i686_sse: i686 SSE specific code.
DAEMON_ARCH_INTRINSICS_i686_sse2: i686 SSE2 specific code.

If a platform inherits feature from an parent platform, the parent platform name is used. For example on amd64, the definition enabling SSE code is DAEMON_ARCH_INTRINSICS_i686_sse, enabling SSE code on both i686 with SSE and amd64 platforms, and both DAEMON_ARCH_INTRINSICS_amd64 and DAEMON_ARCH_INTRINSICS_i686 are available.

This also introduces USE_COMPILER_BUILTINS CMake option. It is enabled by default.

Disabling it is meant to test the unknown compiler code.

May 15 '24 09:05 illwieckz

Fixes Unvanquished/Unvanquished#3001:

https://github.com/Unvanquished/Unvanquished/issues/3001

May 15 '24 09:05 illwieckz

This will make possible to test the non-SSE code and things like that in CI or in a local build.

May 15 '24 09:05 illwieckz

What about compiler-provided functions that are designed to get intrinsics to be used, but are provided for architectures without them as well? The functions used to implement CountTrailingZerores, __builtin_ctz* and BitScanForward* are like this. Also a lot of other things beginning with __builtin_.

I don't really know about them. Is the question about do we want to disable them or not?

The purpose of this PR is to be able to test all our code base, I consider a compiler-provided function not within scope of our test bed, the same way it's not our responsibility to check all alternate code GLM may implement for a single function. So this option is not to configure how compiler-provided functions behave. We should trust those compiler-provided functions.

What I want is to prevent code dying or rotting because we always compile the optimized code we wrote.

May 15 '24 21:05 illwieckz

Disabling such functions would help to test all our code. I provided a platform-independent version of CountTrailingZeroes, but I don't think it's possible to actually use it on any supported compiler. So disabling those functions would help to test it. On the other hand, one might argue that things not used on any supported compiler should be nuked.

May 15 '24 21:05 slipher

About compiler-provided functions, the same need will surface if we do something like:

#ifdef __GNUC__
compilerFunction();
#else
genericCode();
#endif

But maybe we would prefer another CMake option and compiler define.

May 15 '24 21:05 illwieckz

So reading the source, yes it looks like we can benefit from some USE_COMPILER_BUILTINS option to disable.

On the other hand, exotic compilers are more rare than exotic platforms, almost everything can be built using a GCC or Clang-based compiler today. Clang is becoming the Chromium of compilers… with Clang pretending to be GCC like Chromium itself pretending to be Mozilla…

May 15 '24 21:05 illwieckz

We can do that:

diff --git a/src/common/Compiler.h b/src/common/Compiler.h
index c3649c5a7..592ee8076 100644
--- a/src/common/Compiler.h
+++ b/src/common/Compiler.h
@@ -40,7 +40,7 @@ int CountTrailingZeroes(unsigned long x);
 int CountTrailingZeroes(unsigned long long x);
 
 // GCC and Clang
-#ifdef __GNUC__
+#if defined(DAEMON_USE_COMPILER_BUILTINS) && defined( __GNUC__ )
 
 // Emit a nice warning when a function is used
 #define DEPRECATED __attribute__((__deprecated__))
@@ -133,7 +133,7 @@ inline int CountTrailingZeroes(unsigned long x) { return __builtin_ctzl(x); }
 inline int CountTrailingZeroes(unsigned long long x) { return __builtin_ctzll(x); }
 
 // Microsoft Visual C++
-#elif defined( _MSC_VER )
+#elif defined(DAEMON_USE_COMPILER_BUILTINS) && defined( _MSC_VER )
 
 // Disable some warnings
 #pragma warning(disable : 4100) // unreferenced formal parameter

May 15 '24 21:05 illwieckz

I can do a commit for that.

May 15 '24 21:05 illwieckz

So this option is not to configure how compiler-provided functions behave. We should trust those compiler-provided functions.

The two cases seem pretty similar to me. Yes, we trust the intrinsic/compiler-provided function. We want to be able to test both the version of our code with the intrinsic/compiler-provided function and the version without the intrinsic/compiler-provided function.

About compiler-provided functions, the same need will surface if we do something like:
#ifdef __GNUC__
compilerFunction();
#else
genericCode();
#endif
But maybe we would prefer another CMake option and compiler define.

The problem with a "generic compiler" option would be that the program is likely to not actually build or work, since some of the compiler-specific things are required. For example, you may get an error about a function not returning a value when NORETURN or UNREACHABLE is not implemented. At runtime things may break if ALIGNED is not implemented.

A "generic compiler" option may end up being a near superset of disabling intrinsics. SSE-family intrinsics are a rare case where there is some sort of standard syntax across compilers. I think most other things will be platform-specific, e.g. inline assembly.

There is not really a bright line between "true" intrinsics, and functions designed to be implemented with intrinsics. An optimizing compiler may remove an intrinsic function and implement it in a different way. For example your compiler replaced the _mm_store_ps followed by scalar addition with an all-SSE implementation. Maybe some compilers would implement _mm_* functions even on a non-SSE platform. I read of a library that implements them using ARM Neon instructions. An intrinsic is just a compiler-provided function that is not implemented on some platforms.

For these reasons I favor including __builtin_ functions, etc., under the umbrella of USE_CPU_INTRINSICS.

May 15 '24 23:05 slipher

For these reasons I favor including __builtin_ functions, etc., under the umbrella of USE_CPU_INTRINSICS.

OK that makes sense.

I read of a library that implements them using ARM Neon instructions.

Yes there is at least sse2neon and SIMD Everywhere (that does more than ARM, it can even be used for WebAssembly).

I used sse2neon with an old version of Embree used by LuxMark3 and I used SIMDe to with LuxMark3 to port LuxMark3 on ARM.

I may evaluate SIMDe for Dæmon on ARM64 at some point (we already require NEON).

May 16 '24 08:05 illwieckz

Interesting, disabling arch builtins break that unit test:

/home/vsts/work/1/s/src/common/cm/unittest.cpp:168: Failure
Expected equality of these values:
  tr.fraction
    Which is: 0.19213901
  0.1921389

Log here.

May 16 '24 12:05 illwieckz

So, there is now:

USE_ARCH_INTRINSICS which is meant to enable asm code, SSE and things like CountTrailingZeroes. This make possible to test alternate code for when the architecture is unknown.
USE_COMPILER_BUILTINS which is meant to disable specific keywords using compiler-specific builtins like UNREACHABLE and things like that. This make possible to test the alternate code for when the compiler is unknown.

I rewrote a bit parts of Compiler.h, for example defining C++ features according to C++ version definitions instead of compiler definitions.

In the end, while not being the purpose of this PR, the compilation of the server and the TTY client with C++11 is working again. The graphical client makes use of C++14 features and there is no plan to change that (it's fine). But all of this means the code is better written and is more robust, being more compatible with older or unknown compilers.

May 16 '24 12:05 illwieckz

I don't know why but the appveyor build fails on some curious errors…

engine/renderer/tr_types.h(142,15): error C2220:
  the following warning is treated as an error
  (compiling source file common\cm\cm_load.cpp) [build\engine-lib.vcxproj]
engine/renderer/tr_types.h(142,15): warning C4324:
  'refBone_t': structure was padded due to alignment specifier
  (compiling source file common\cm\cm_load.cpp) [C:\projects\daemon\build\engine-lib.vcxproj]

See https://ci.appveyor.com/project/DolceTriade/daemon/builds/49827605/job/tgqld13lrg696s61

May 16 '24 12:05 illwieckz

Well, there was a more meaningful error after that:

common\FileSystem.cpp(309,22):
  error C2694: 'const char *FS::minizip_category_impl::name(void) const':
  overriding virtual function has less restrictive exception specification than base class virtual member function 'const char *std:

MSVC doesn't properly report __cplusplus by default to keep compatibility with broken legacy code:

https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/

May 16 '24 13:05 illwieckz

I've fixed the problem related to __cplusplus, but the padding error still remains.

May 16 '24 13:05 illwieckz

OK, MSVC build is now fixed for win32, not for x64, still getting that refBone_t padding problem…

May 16 '24 13:05 illwieckz

What seems to break it is cmake: add USE_ARCH_INTRINSICS CMake option commit. That makes no sense.

May 16 '24 15:05 illwieckz

I may have found the error… MSVC doesn't set __SSE*__ neither _M_IX86_FP on amd64, meaning that without testing _M_AMD64 or _M_X64 we didn't include xmmintrin.h and emmintrin.h.

Maybe MSVC changes some of its internal behavior when those headers are not included…

May 16 '24 15:05 illwieckz

engine/renderer/tr_types.h(142,15): warning C4324: 'refBone_t': structure was padded due to alignment specifier

I also saw this warning when I tried implementing USE_RECOMMENDED_CPU_FEATURES for MSVC so I could test with SSE2 off. It seems that you get this warning when:

You have a struct A that contains a member of type B
B has a user-requested alignment (via __declspec(align(n)) or alignas) which is greater than its natural alignment
The alignment of B is greater than the natural alignment of any member of A
A does not have a user-requested alignment at least as large as B's

So this warning happens when our SSE code is disabled because the structure's natural alignment is no longer 16, since it doesn't contain an __m128. ~~I implemented a fix (see the branch slipher/align-stuff), but it is a pretty silly warning so I'd probably just disable it if it annoyed us again.~~

May 16 '24 17:05 slipher

I implemented a fix (see the branch slipher/align-stuff), but it is a pretty silly warning so I'd probably just disable it if it annoyed us again.

Oops, apparently the fix doesn't work. Still getting the warning.

May 16 '24 17:05 slipher

I noticed we get the warning when:

/arch:SSE2 is set or compiling for amd64,
we don't use any function from xmmintrin.h ,
we don't include xmmintrin.h.

I have not tried when /arch:SSE2 is not set.

May 16 '24 17:05 illwieckz

New warning theory:

You have a struct A
A's alignment is greater than its natural aligment (considered by recursively ignoring alignment specifiers), due to a user-requested alignment (via __declspec(align(n)) or alignas) on A or one of its recursive members
A has holes in its layout

This is definitely bogus - I'm not interested in writing char pad[8] or whatever everywhere. Will disable it in my branch.

May 16 '24 17:05 slipher

So, even not doing /arch:SSE2 on x86 doesn't fix the warning, but the warning disappears as soon as xmmintrin.h is included…

May 16 '24 18:05 illwieckz

Can it be possible that xmmintrin.h just do #pragma warning(disable : 4324) ?

May 16 '24 18:05 illwieckz

Can it be possible that xmmintrin.h just do #pragma warning(disable : 4324) ?

It does not.

It's not about whether SSE headers are included; it's whether we use __m128 in our structs. Removing /arch:SSE2 has no effect as that is the default. You have to explicitly request a lower architecture; see #1152.

May 16 '24 18:05 slipher

it's whether we use __m128 in our structs.

Can it be possible that __m128 would be redefined in a different way if xmmintrin.h is included?

I don't get why including this header file makes the warning go away…

May 16 '24 18:05 illwieckz

I removed the HACK including xmmintrin.h in non-SSE MSVC contexts now that the build doesn't fail thanks to:

https://github.com/DaemonEngine/Daemon/pull/1152

May 25 '24 13:05 illwieckz

So there is now:

USE_ARCH_INTRINSICS, for enabling custom code making use of SSE and asm and things like that.
USE_COMPILER_INTRINSICS, for enabling custom CountTrailingZeroes and things like that.
USE_COMPILER_CUSTOMIZATION, for enabling custom compiler attributes and operators.

They are all enabled by default and marked as advanced options (not displayed by default to the user in ccmake).

One can disable them either to test non-SSE code, to test code not making use of compiler builtins like __builtin_ctz, or the unknown compiler preprocessor code path.

May 25 '24 20:05 illwieckz

I reworded a bit the way fallback definitions are done: when there is a known implementation the define is set, and at the end the fallback is set if nothing defined it before. It avoids copy-pasting the fallback definitions for every condition with the risk to miss some.

I also reworded this:

int CountTrailingZeroes(unsigned int x);
int CountTrailingZeroes(unsigned long x);
int CountTrailingZeroes(unsigned long long x);
inline int CountTrailingZeroes(unsigned int x) { return __builtin_ctz(x); }
inline int CountTrailingZeroes(unsigned long x) { return __builtin_ctzl(x); }
inline int CountTrailingZeroes(unsigned long long x) { return __builtin_ctzll(x); }

Into this:

inline int CountTrailingZeroes(unsigned int x) { return __builtin_ctz(x); }
inline int CountTrailingZeroes(unsigned long x) { return __builtin_ctzl(x); }
inline int CountTrailingZeroes(unsigned long long x) { return __builtin_ctzll(x); }

I also removed the useless warning on missing BREAKPOINT implementation, we never did anything with it and we silence it for NaCl since forever.

I believe I have nothing to add within the scope of that PR.

Alongside the addition of CMake options, the way things are done are now much more meaningful.

There can be other things done later like evaluating if things are still needed, but I think it's better this way as a first step, and future clean-up commits may even be simpler.

I noticed the server and tty still build with c++11 (and run properly after that) so for now I see no reason to drop the related fallbacks.

This looks ready to me.

May 28 '24 11:05 illwieckz

Daemon
Daemon copied to clipboard

cmake: add `USE_ARCH_INTRINSICS` and `USE_COMPILER_BUILTINS` CMake options

Daemon Daemon copied to clipboard

cmake: add `USE_ARCH_INTRINSICS` and `USE_COMPILER_BUILTINS` CMake options

Daemon
Daemon copied to clipboard