transmission icon indicating copy to clipboard operation
transmission copied to clipboard

fix: Remove faulty `popcnt` features checks for MSVC [#4872]

Open goldsteinn opened this issue 2 years ago • 3 comments

The popcnt feature checks weren't actually checking the ISA feature or cpuid. This caused an illegal instruction exception on x86 hardware that doesn't support popcnt when compiled with MSVC.

Since MSVC doesn't support any x86 ISA defines, the only way to check for popcnt is with cpuid. Since this is a tiny inline function, dynamic dispatch is likley not worth it so just removing the MSVC for popcnt. MSVC will now fallback to the generic implementation.

Fixes #4872.

Notes: Fixed 4.0.0 illegal instruction exception on some x86 Windows machines.

goldsteinn avatar Feb 14 '23 06:02 goldsteinn

I don't understand why this change is deemed MSVC-specific. There's still #elif defined(TR_HAVE_ASM_POPCNT) block which directly uses assembly instructions without any checks either.

MSVC will not fallback to the generic implementation.

Did you mean "now"?

mikedld avatar Feb 14 '23 09:02 mikedld

I don't understand why this change is deemed MSVC-specific. There's still #elif defined(TR_HAVE_ASM_POPCNT) block which directly uses assembly instructions without any checks either.

The issue was that the compile time check for if the target ISA supported popcnt where broken for MSVC. The compile time check was only checking if it was MSVC + x86 or MSVC + x64, whereas we need x86/x64 + SSE. AFAIK there is no way check the target ISA features with MSVC at compile time, hence we make MSVC builds just fallback to generic.

TR_HAVE_ASM_POPCNT is defined on #if defined(__x86_64__) && defined(__SSE__). AFAIK MSVC doesn't define __SSE__.

MSVC will not fallback to the generic implementation.

Did you mean "now"?

Yes, fixed in V2.

goldsteinn avatar Feb 14 '23 16:02 goldsteinn

ping.

goldsteinn avatar Feb 16 '23 23:02 goldsteinn

@goldsteinn I haven't made use of __SSE__ firsthand before this so this may be a dumb question, but this tests whether the processor doing the building supports SSE instructions, is that correct?

If so, what happens if this is built on a box that supports SSE and then is installed and run on a box that doesn't?

ckerr avatar Feb 17 '23 21:02 ckerr

@goldsteinn I haven't made use of __SSE__ firsthand before this so this may be a dumb question, but this tests whether the processor doing the building supports SSE instructions, is that correct?

Not the processor building, but the target processor (what you are building for which is not necessarily your build machine). I.e

clang -march=<target> ...

if target supports SSE then __SSE__ will be defined. If target does not support __SSE__ then it won't be. If you entirely skip the march=<target> then it will default to the generic arch target. On X86 the default ISA (instruction set) includes __SSE__ (gcc and clang.

If so, what happens if this is built on a box that supports SSE and then is installed and run on a box that doesn't?

Just building somewhere doesn't cause an issue (unless you pass something like -march=native), its just about the target. In general if you target a higher ISA level than what you run on the program will illegal instructions irrelivant of whether you use intrinsics (i.e it will autovectorize a loop with unsupported instructions).

goldsteinn avatar Feb 17 '23 21:02 goldsteinn

TR_HAVE_ASM_POPCNT is defined on #if defined(__x86_64__) && defined(__SSE__). AFAIK MSVC doesn't define __SSE__.

IIUC popcnt is introduced with SSE4 (https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT) so I don't think checking for __SSE__ is correct. Did a few tests:

% g++ -dM -E - </dev/null | fgrep -i -e sse -e popcnt      
#define __MMX_WITH_SSE__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSE2__ 1

% g++ -msse4 -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1

% g++ -msse4.1 -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __MMX_WITH_SSE__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1

% g++ -msse4.2 -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1

% g++ -mpopcnt -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSE2__ 1

% g++ -march=native -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1

% g++ -march=native -mno-sse4 -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __MMX_WITH_SSE__ 1
#define __POPCNT__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1

% g++ -march=native -mno-popcnt -dM -E - </dev/null | fgrep -i -e sse -e popcnt
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __MMX_WITH_SSE__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1

So it looks like checking whether __POPCNT__ is defined is the right way, unless I'm missing something. Didn't check with clang though.

mikedld avatar Feb 17 '23 22:02 mikedld

TR_HAVE_ASM_POPCNT is defined on #if defined(__x86_64__) && defined(__SSE__). AFAIK MSVC doesn't define __SSE__.

IIUC popcnt is introduced with SSE4 (https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT) so I don't think checking for __SSE__ is correct. Did a few tests:

So it looks like checking whether __POPCNT__ is defined is the right way, unless I'm missing something. Didn't check with clang though.

You're right, updated.

goldsteinn avatar Feb 17 '23 22:02 goldsteinn