transmission
transmission copied to clipboard
fix: Remove faulty `popcnt` features checks for MSVC [#4872]
The popcnt feature checks weren't actually checking the ISA feature or cpuid. This caused an illegal instruction exception on x86 hardware that doesn't support popcnt when compiled with MSVC.
Since MSVC doesn't support any x86 ISA defines, the only way to check for popcnt is with cpuid. Since this is a tiny inline function, dynamic dispatch is likley not worth it so just removing the MSVC for popcnt. MSVC will now fallback to the generic implementation.
Fixes #4872.
Notes: Fixed 4.0.0 illegal instruction exception on some x86 Windows machines.
I don't understand why this change is deemed MSVC-specific. There's still #elif defined(TR_HAVE_ASM_POPCNT) block which directly uses assembly instructions without any checks either.
MSVC will not fallback to the generic implementation.
Did you mean "now"?
I don't understand why this change is deemed MSVC-specific. There's still
#elif defined(TR_HAVE_ASM_POPCNT)block which directly uses assembly instructions without any checks either.
The issue was that the compile time check for if the target ISA supported popcnt where broken for MSVC. The compile time check was only checking if it was MSVC + x86 or MSVC + x64, whereas we need x86/x64 + SSE. AFAIK there is no way check the target ISA features with MSVC at compile time, hence we make MSVC builds just fallback to generic.
TR_HAVE_ASM_POPCNT is defined on #if defined(__x86_64__) && defined(__SSE__). AFAIK MSVC doesn't define __SSE__.
MSVC will not fallback to the generic implementation.
Did you mean "now"?
Yes, fixed in V2.
ping.
@goldsteinn I haven't made use of __SSE__ firsthand before this so this may be a dumb question, but this tests whether the processor doing the building supports SSE instructions, is that correct?
If so, what happens if this is built on a box that supports SSE and then is installed and run on a box that doesn't?
@goldsteinn I haven't made use of
__SSE__firsthand before this so this may be a dumb question, but this tests whether the processor doing the building supports SSE instructions, is that correct?
Not the processor building, but the target processor (what you are building for which is not necessarily your build machine). I.e
clang -march=<target> ...
if target supports SSE then __SSE__ will be defined. If target does not support __SSE__ then it won't be.
If you entirely skip the march=<target> then it will default to the generic arch target.
On X86 the default ISA (instruction set) includes __SSE__ (gcc and clang.
If so, what happens if this is built on a box that supports SSE and then is installed and run on a box that doesn't?
Just building somewhere doesn't cause an issue (unless you pass something like -march=native), its just about the target. In general if you target a higher ISA level than what you run on the program will illegal instructions irrelivant of whether you use intrinsics (i.e it will autovectorize a loop with unsupported instructions).
TR_HAVE_ASM_POPCNTis defined on#if defined(__x86_64__) && defined(__SSE__). AFAIK MSVC doesn't define__SSE__.
IIUC popcnt is introduced with SSE4 (https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT) so I don't think checking for __SSE__ is correct. Did a few tests:
% g++ -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __MMX_WITH_SSE__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSE2__ 1
% g++ -msse4 -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
% g++ -msse4.1 -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __MMX_WITH_SSE__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
% g++ -msse4.2 -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
% g++ -mpopcnt -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSE2__ 1
% g++ -march=native -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
% g++ -march=native -mno-sse4 -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
% g++ -march=native -mno-popcnt -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __MMX_WITH_SSE__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
So it looks like checking whether __POPCNT__ is defined is the right way, unless I'm missing something. Didn't check with clang though.
TR_HAVE_ASM_POPCNTis defined on#if defined(__x86_64__) && defined(__SSE__). AFAIK MSVC doesn't define__SSE__.IIUC
popcntis introduced with SSE4 (https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT) so I don't think checking for__SSE__is correct. Did a few tests:So it looks like checking whether
__POPCNT__is defined is the right way, unless I'm missing something. Didn't check with clang though.
You're right, updated.