simde
simde copied to clipboard
Couple of build issues on 32bit MSVC v143 build with /arch:SSE
Hi again, I'm experiencing build errors with the current master (e651ec360f22d5f9d7fb2c75c5025db614f78b8c) Many issues have been fixed regarding my prior issue, including many of the failing tests involving precision discrepancies. Here's the summary:
All my issues apply to MSVC v143 x86 builds - I have 5 such builds, each with a different /arch:XXX
option:
1. The two build issues apply only to MSVC v143 x86 with /arch:SSE
enabled
a) The first build issue involves reaching line 4774 here: https://github.com/simd-everywhere/simde/blob/e651ec360f22d5f9d7fb2c75c5025db614f78b8c/simde/x86/sse2.h#L4770-L4774
The line involves GCC syntax, it should be guarded. I can fix this build issue by doing:
simde_mm_pause (void) {
#if defined(SIMDE_X86_SSE2_NATIVE)
_mm_pause();
#elif defined(SIMDE_ARCH_X86)
#if defined(_MSC_VER)
_mm_pause();
#else
__asm__ __volatile__("pause");
#endif
#elif defined(SIMDE_ARCH_ARM_NEON)
But not sure if _mm_pause() won't cause invalid instruction at runtime, if the SIMDE_X86_SSE2_NATIVE isn't available. Is it available on SIMDE_X86_SSE_NATIVE?
b) The second build issue involves SVML
I can circumvent it by simply guard the include of the svml.h
- not sure if that is intended though. There should be fallback alternatives in SIMDe? These errors seemingly involve only 'simde__m128d' to '__m128d'
conversion issues.
2. The test issues apply to all /arch
options on MSVC v143 x86 builds:
They seemingly affect only pow
functions precision, but my tests aren't exhaustive, so not sure.
I was able to fix error 1.b by changing this check, which reenables native SVML for MSVC, while ignoring potential /arch:SSE: https://github.com/simd-everywhere/simde/blob/e651ec360f22d5f9d7fb2c75c5025db614f78b8c/simde/simde-features.h#L295-L299 to
#if !defined(SIMDE_X86_SVML_NATIVE) && !defined(SIMDE_X86_SVML_NO_NATIVE) && !defined(SIMDE_NO_NATIVE)
#if defined(SIMDE_ARCH_X86) && (defined(__INTEL_COMPILER) || (HEDLEY_MSVC_VERSION_CHECK(14, 20, 0) && !defined(__clang__) && defined(SIMDE_X86_SSE2_NATIVE)))
#define SIMDE_X86_SVML_NATIVE
#endif
#endif
Only the __m128d
(because it comes with SSE2?) is troublesome though. It probably should provide the single precision SVML functions either way, so I suppose this is not the optimal solution
Hello @Epixu ; do you still have issues with 32bit MSVC builds using the latest SIMDe: v0.8.0?
Hi, I will bump the version and check it in a couple of days, thanks for the notification!
No change for this issue with the current master (517da845ac83098d06c5b0c9d8705ac8eb32a5ba) Both the build and test issues remain the same
But not sure if _mm_pause() won't cause invalid instruction at runtime, if the SIMDE_X86_SSE2_NATIVE isn't available. Is it available on SIMDE_X86_SSE_NATIVE?
According to Intel, _mm_pause
is SSE2 only: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_paus&ig_expand=4897
and Microsoft agrees: https://learn.microsoft.com/en-us/cpp/intrinsics/x86-intrinsics-list?view=msvc-170
So a guard can be added to not use the GCC asm syntax on MSVC
@Epixu can you test __asm pause;
as an MSVC-friendly alternative to __asm__ __volatile__("pause");
?
With https://github.com/simd-everywhere/simde/commit/2d498c864512605cef62e2d21bec3e24091ef427
I'm seeing the following SIMDe x86 test errors on 32-bit MSVC:
473/2002 x86/sse/native/c TIMEOUT 30.24s exit status 1
../test/x86/sse.c:2094: assertion failed: r[0] ~= test_vec[i].r[0] (-836.000000 ~= -836.000000)
../test/x86/sse.c:2167: assertion failed: r[0] ~= test_vec[i].r[0] (-172.000000 ~= -172.000000)
../test/x86/sse.c:2246: assertion failed: r[0] ~= test_vec[i].r[0] (25075.000000 ~= 25075.000000)
../test/x86/sse.c:2287: assertion failed: r[0] ~= test_vec[i].r[0] (-200.000000 ~= -200.000000)
../test/x86/sse.c:2333: assertion failed: r[0] ~= e[0] (46384.000000 ~= 46384.000000)
../test/x86/sse.c:2375: assertion failed: r[0] ~= test_vec[i].r[0] (113.000000 ~= 113.000000)
../test/x86/sse.c:2522: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (50326.000000 ~= 50326.000000)
../test/x86/sse.c:2555: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (101.000000 ~= 101.000000)
../test/x86/sse.c:2899: assertion failed: r[0] ~= test_vec[i].r[0] (-3.607422 ~= -3.610000)
../test/x86/sse.c:4429: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (-609.119995 ~= -609.119995)
Which is pretty weird, the numbers look exactly the same to me except for line 2899; something about the float comparator isn't working?
SSE2, no timeout, so we get a list of the failing function tests:
| 475/2002 sse2/mm_setr_pd FAIL
| 475/2002 sse2/x_mm_abs_pd FAIL
| 475/2002 sse2/mm_store_pd FAIL
| 475/2002 sse2/mm_min_pd FAIL
| 475/2002 sse2/mm_cmpord_pd FAIL
| 475/2002 sse2/mm_cvtpd_ps FAIL
| 475/2002 sse2/mm_cvtpi32_pd FAIL
| 475/2002 sse2/mm_div_pd FAIL
| 475/2002 sse2/mm_or_pd FAIL
../test/x86/sse2.c:7132: assertion failed: r[0] ~= test_vec[i].r[0] (0.740000 ~= 0.740000)
../test/x86/sse2.c:59: assertion failed: r[0] ~= simde_mm_loadu_pd(test_vec[i].r)[0] (147.280000 ~= 147.280000)
../test/x86/sse2.c:8244: assertion failed: r[0] ~= test_vec[i].r[0] (176.750000 ~= 176.750000)
../test/x86/sse2.c:5020: assertion failed: r[0] ~= test_vec[i].r[0] (-514.320000 ~= -514.320000)
../test/x86/sse2.c:2558: assertion failed: r[1] ~= simde_mm_loadu_pd(test_vec[i].r)[1] (0.000000 ~= 0.000000)
../test/x86/sse2.c:3362: assertion failed: r[0] ~= test_vec[i].r[0] (689.409973 ~= 689.409973)
../test/x86/sse2.c:3394: assertion failed: r[0] ~= test_vec[i].r[0] (-579.000000 ~= -579.000000)
../test/x86/sse2.c:4152: assertion failed: r[0] ~= test_vec[i].r[0] (1.158700 ~= 1.160000)
../test/x86/sse2.c:5919: assertion failed: r[0] ~= test_vec[i].r[0] (128062.238280 ~= 128062.240000)
Again, all visually identical, except for one (line 4152); yet still assertion failed
Does ~= stand for some approximate comparison based on ULPs? Could these look the same due to bad print? Maybe if printed in scientific notation would be better?
I will try __asm pause in a couple of days
Does ~= stand for some approximate comparison based on ULPs? Could these look the same due to bad print? Maybe if printed in scientific notation would be better?
Could be, I didn't write that code and it feels well outside my area of expertise: https://github.com/simd-everywhere/simde/blob/8639feff3843dc02c8f5e2e34a2ba865d3acb546/test/test.h#L729-L748 https://github.com/simd-everywhere/simde/blob/8639feff3843dc02c8f5e2e34a2ba865d3acb546/test/test.h#L866-L867
I will try __asm pause in a couple of days
Thanks!
The single precision comparison function is not ULPs based, it uses a precision range, which if zero - does a perfect compare. The failing functions however seem to involve double precision, so some points of failure come to mind: is the proper double-precision compare used, and is precision specified properly to not be zero, because internal CPU float representation is always assumed to be a black box (while residing in the registers, they're usually with additional bits of precision, which get truncated later) - can be anything on different CPU architectures.
Also, it would probably be better to use (%e ~= %e) to see if something changes at the back of the float
Thanks for the explanation @Epixu , here are the same test failures using hexadecimal exponent notation. All affected tests do the float comparison using a precision of 1
== a slop of 10^-1 (which is rather generous, if you ask me).
While all of these affected tests use simde_test_x86_assert_equal_f32x{2,4}
or simde_assert_equal_vf32
there are other tests that use those comparators which don't fail.
../test/x86/sse.c:2094: assertion failed: r[0] ~= test_vec[i].r[0] (-0x1.a200000000000p+9 ~= -0x1.a200000000000p+9)
../test/x86/sse.c:2167: assertion failed: r[0] ~= test_vec[i].r[0] (-0x1.5800000000000p+7 ~= -0x1.5800000000000p+7)
../test/x86/sse.c:2246: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.87cc000000000p+14 ~= 0x1.87cc000000000p+14)
../test/x86/sse.c:2287: assertion failed: r[0] ~= test_vec[i].r[0] (-0x1.9000000000000p+7 ~= -0x1.9000000000000p+7)
../test/x86/sse.c:2333: assertion failed: r[0] ~= e[0] (0x1.6a60000000000p+15 ~= 0x1.6a60000000000p+15)
../test/x86/sse.c:2375: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.c400000000000p+6 ~= 0x1.c400000000000p+6)
../test/x86/sse.c:2522: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (0x1.892c000000000p+15 ~= 0x1.892c000000000p+15)
../test/x86/sse.c:2555: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (0x1.9400000000000p+6 ~= 0x1.9400000000000p+6)
../test/x86/sse.c:2899: assertion failed: r[0] ~= test_vec[i].r[0] (-0x1.cdc0020000000p+1 ~= -0x1.ce147a0000000p+1)
../test/x86/sse.c:4429: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (-0x1.308f5c0000000p+9 ~= -0x1.308f5c0000000p+9)
../test/x86/sse.c:4982: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.b80f5c0000000p+7 ~= 0x1.b80f5c0000000p+7)
../test/x86/sse.c:5015: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.8c00000000000p+9 ~= 0x1.8c00000000000p+9)
../test/x86/sse.c:5048: assertion failed: r[0] ~= test_vec[i].r[0] (0x1.62fc280000000p+9 ~= 0x1.62fc280000000p+9)
https://ci.appveyor.com/project/nemequ/simde/build/job/uhkyqxm6fogyagv1?fullLog=true
This is super weird. Is it possible that the error somehow happens at the compare, but is fine by the time data is printed?
Also, just to note, this wouldn't be the first time I've encountered nasty generated assembly bugs from MSVC. To give an example, I've even documented one involving AVX here: https://github.com/Epixu/msvc_but_repro
I guess our last beacon of hope is to compare some healthy ASM with what the compiler generates...
I guess our last beacon of hope is to compare some healthy ASM with what the compiler generates...
Agreed, but that it outside of my abilities and time availability :-(
I will investigate more when I too get the time, Still, thanks a lot for narrowing it down
This is super weird. Is it possible that the error somehow happens at the compare, but is fine by the time data is printed?
In https://github.com/simd-everywhere/simde/commit/fc52a85544d005b4c2973fdeba1e23b932aea343 I tried crafting the test data for a single test differently, but I got the same error: ./test/x86/sse.c:2095: assertion failed: r[0] ~= simde_mm_loadu_ps(test_vec[i].r)[0] (-0x1.a200000000000p+9 ~= -0x1.a200000000000p+9)
Interestingly, the test SSE failures only are with the "native" code paths, but not the emulated one. The SSE2 failing tests are for both (this is for a build without /arch:SSE
or similar)
https://ci.appveyor.com/project/nemequ/simde/build/job/u3cn62vsdvvua23o?fullLog=true
I will investigate more when I too get the time
Thanks!
Still, thanks a lot for narrowing it down
You are welcome! I think there are several fixes from this branch worth merging already, so I'm glad for that.
One more question for you: are these issues a regression from a previous SIMDe release / commit? Or have they always been present from what you have seen?
Honestly, I have no idea. It seems that the issues emerged in later version after introduction of more features, and the only reason the older version still works with me, is that the features are simply not available. So probably a regression. I can't really be sure, though.
I still stick to one particular commit (5e7c4d4ec10855ebe8f19af8a80fdfa26caad5e9), because it passes all my tests and lets me continue work on more important stuff
A bit of an update:
- I dropped SSE1 support in my project, so half of this issue is irrelevant now (for me at least)
- I investigated the
pow
precision problem. It seems that the MSVC/fp
option has huge effect in exactly those places./fp:precise
gives the best results (still wrong, but best in terms of least errors), while any other option screws completely with the tests in those exact places. Considering, thatpow
functions always involve loops and accumulation of errors, I think we can narrow it down to something involving the/fp
option. MSVC hasfloat_control
pragmas, that could be used to experiment with on MSVC compilers to work around the issue, probably.
Thank you for the update @Epixu !
Since the old logs are missing, can you remind me which pow
related test are failing on 32-bit MSVC in your setup?
Just tests that use simde_math_pow{,f}
or something more complicated?
I still stick to one particular commit (5e7c4d4), because it passes all my tests and lets me continue work on more important stuff
One thing that jumps out at me from looking at the changes since this commit:
The use of pow()
instead of simde_math_pow()
in simde/arm/neon/cvt_n.h
: https://github.com/simd-everywhere/simde/commit/2eedece7a#diff-53bfc402d4ade1eaf3147f23394e47554f52396a953fc086ffc731fa3b2d481aR91
@Epixu Maybe https://github.com/simd-everywhere/simde/pull/1192 will help ?
Here are the failing tests, built with the current SIMDe master, minus the SSE1 builds: https://github.com/Langulus/SIMD/actions/runs/9920771659
The failing builds are:
- windows-latest-v143-Release-x86-NoSIMD (uses just std::pow)
- windows-latest-v143-Release-x86-AVX
- windows-latest-v143-Release-x86-AVX2
The AVX2 build ends up executing svml_dpo4_avx2.asm, which looks like this:
--- D:\a\_work\1\s\src\vctools\crt\helperlibs\vmath\src\i386\svml_dpow4_avx2.asm
00F8C1E0 push ebp
00F8C1E1 mov ebp,esp
00F8C1E3 and esp,0FFFFFFC0h
00F8C1E6 sub esp,100h
00F8C1EC vmovdqa ymm3,ymm0
00F8C1F0 vandpd ymm4,ymm3,ymmword ptr ds:[12F1A80h]
00F8C1F8 vorpd ymm2,ymm4,ymmword ptr ds:[12F1AC0h]
00F8C200 vcvtpd2ps xmm5,ymm2
00F8C204 vmovupd ymm7,ymmword ptr ds:[12F1B00h]
00F8C20C vmovupd ymmword ptr [esp+20h],ymm1
00F8C212 vmovupd ymmword ptr [esp],ymm3
00F8C217 vrcpps xmm6,xmm5
00F8C21B vpsrlq ymm1,ymm3,34h
00F8C220 vcvtps2pd ymm0,xmm6
00F8C224 vpaddq ymm0,ymm0,ymmword ptr ds:[12F1BC0h]
00F8C22C vorpd ymm4,ymm1,ymm7
00F8C230 vsubpd ymm5,ymm4,ymm7
00F8C234 vcmplt_oqpd ymm4,ymm3,ymmword ptr ds:[12F1B40h]
00F8C23D vpand ymm7,ymm0,ymmword ptr ds:[12F1C00h]
00F8C245 vmovupd ymmword ptr [esp+40h],ymm5
00F8C24B vcmpnle_uqpd ymm5,ymm3,ymmword ptr ds:[12F1B80h]
00F8C254 vpsrlq ymm3,ymm7,27h
00F8C259 vfmsub213pd ymm2,ymm7,ymmword ptr ds:[12F1AC0h]
00F8C262 vmovupd ymm7,ymmword ptr ds:[12F1C40h]
00F8C26A vorpd ymm4,ymm4,ymm5
00F8C26E vmovupd ymm5,ymmword ptr ds:[12F1CC0h]
00F8C276 vfmadd213pd ymm7,ymm2,ymmword ptr ds:[12F1C80h]
00F8C27F vmovupd ymmword ptr [esp+60h],ymm4
00F8C285 vfmadd213pd ymm5,ymm2,ymmword ptr ds:[12F1D00h]
00F8C28E vextracti128 xmm6,ymm3,1
00F8C294 vmovd eax,xmm3
00F8C298 vmovd edx,xmm6
00F8C29C vmovupd xmm1,xmmword ptr [eax+0AF3240h]
00F8C2A4 vpextrd ecx,xmm3,2
00F8C2AA vpextrd eax,xmm6,2
00F8C2B0 vmovupd xmm0,xmmword ptr [edx+0AF3240h]
00F8C2B8 vmovhpd xmm3,xmm1,qword ptr [ecx+0AF3240h]
00F8C2C0 vunpckhpd xmm1,xmm1,xmmword ptr [ecx+0AF3240h]
00F8C2C8 vmovhpd xmm6,xmm0,qword ptr [eax+0AF3240h]
00F8C2D0 vunpckhpd xmm0,xmm0,xmmword ptr [eax+0AF3240h]
00F8C2D8 vinsertf128 ymm6,ymm3,xmm6,1
00F8C2DE vinsertf128 ymm3,ymm1,xmm0,1
00F8C2E4 vmulpd ymm1,ymm2,ymm2
00F8C2E8 vfmadd231pd ymm3,ymm2,ymmword ptr ds:[12F1D40h]
00F8C2F1 vaddpd ymm4,ymm6,ymmword ptr [esp+40h]
00F8C2F7 vfmadd213pd ymm7,ymm1,ymm5
00F8C2FC vmovupd ymm5,ymmword ptr ds:[12F1D80h]
00F8C304 vmovupd ymm6,ymmword ptr [esp+20h]
00F8C30A vfmadd213pd ymm7,ymm2,ymmword ptr ds:[12F1DC0h]
00F8C313 vmovdqa ymm0,ymm5
00F8C317 vfmadd213pd ymm0,ymm2,ymm4
00F8C31C vmovupd ymmword ptr [esp+80h],ymm7
00F8C325 vsubpd ymm4,ymm0,ymm4
00F8C329 vmulpd ymm7,ymm6,ymm0
00F8C32D vfmsub231pd ymm4,ymm2,ymm5
00F8C332 vmovupd ymm5,ymmword ptr ds:[12F1E80h]
00F8C33A vfmsub213pd ymm0,ymm6,ymm7
00F8C33F vaddpd ymm4,ymm3,ymm4
00F8C343 vandpd ymm2,ymm7,ymmword ptr ds:[12F1E00h]
00F8C34B vcmpnle_uqpd ymm2,ymm2,ymmword ptr ds:[12F1E40h]
00F8C354 vfmadd132pd ymm1,ymm4,ymmword ptr [esp+80h]
00F8C35E vorpd ymm3,ymm2,ymmword ptr [esp+60h]
00F8C364 vmovmskpd eax,ymm3
00F8C368 vmovapd ymm3,ymm6
00F8C36C test eax,eax
00F8C36E vfmadd213pd ymm3,ymm1,ymm7
00F8C373 vsubpd ymm4,ymm3,ymm7
00F8C377 vaddpd ymm2,ymm5,ymm3
00F8C37B vfmsub213pd ymm1,ymm6,ymm4
00F8C380 vsubpd ymm7,ymm2,ymm5
00F8C384 vpsllq ymm2,ymm2,3
00F8C389 vaddpd ymm0,ymm0,ymm1
00F8C38D vsubpd ymm7,ymm3,ymm7
00F8C391 vmovupd ymm1,ymmword ptr ds:[12F1F00h]
00F8C399 vaddpd ymm3,ymm7,ymm0
00F8C39D vfmadd213pd ymm1,ymm7,ymmword ptr ds:[12F1F40h]
00F8C3A6 vandpd ymm4,ymm2,ymmword ptr ds:[12F1EC0h]
00F8C3AE vmovupd ymmword ptr [esp+0A0h],ymm4
00F8C3B7 vfmadd213pd ymm1,ymm3,ymmword ptr ds:[12F1F80h]
00F8C3C0 vfmadd213pd ymm1,ymm3,ymmword ptr ds:[12F1FC0h]
00F8C3C9 vxorpd ymm2,ymm2,ymmword ptr [esp+0A0h]
00F8C3D2 vmovd edx,xmm4
00F8C3D6 vpextrd ecx,xmm4,2
00F8C3DC vextracti128 xmm4,ymm4,1
00F8C3E2 vmovsd xmm6,qword ptr [edx+12F1280h]
00F8C3EA vmovd edx,xmm4
00F8C3EE vmovhpd xmm5,xmm6,qword ptr [ecx+12F1280h]
00F8C3F6 vpextrd ecx,xmm4,2
00F8C3FC vmovsd xmm6,qword ptr [edx+12F1280h]
00F8C404 vmovhpd xmm4,xmm6,qword ptr [ecx+12F1280h]
00F8C40C vinsertf128 ymm4,ymm5,xmm4,1
00F8C412 vmulpd ymm0,ymm4,ymm3
00F8C416 vpsllq ymm5,ymm2,29h
00F8C41B vfmadd213pd ymm1,ymm0,ymm4
00F8C420 vmulpd ymm0,ymm5,ymm1
00F8C424 jne ___avx2_pow4@@64+24Eh (0F8C42Eh)
00F8C42A mov esp,ebp
00F8C42C pop ebp
00F8C42D ret
Okay, so the function that is failing on 32-bit MSVC non-sse2 is https://github.com/Langulus/SIMD/blob/a7cf2a9799dbc55b5aec6d9b9f20b1bfbaa187e3/source/Pow.hpp#L33 , yes?
The wip
branch file is more up-to-date - I forgot to mention, that the CI ran on that branch:
https://github.com/Langulus/SIMD/blob/wip/source/binary/Pow.hpp
The non-simd version just falls back to std::pow for real numbers.
The failing AVX2 case should be that one.
The failing AVX2 case should be that one.
Ah, a SVML function. Okay, then this is probably the source of your problem:
https://github.com/simd-everywhere/simde/commit/593af95cd9f43a96f66a582a9ae32dfe5dfd8eaa#diff-b2776e35f1b43a42e8f31d4d30815dd8d2fd869b0c6dc2ef85f1de31ab927e4dR296-R297
I bet disabling SVML on 32-bit MSVC x86 fixes things for you. Try adding -DSIMDE_X86_SVML_NO_NATIVE
on your builds to confirm?
I bet disabling SVML on 32-bit MSVC x86 fixes things for you. Try adding
-DSIMDE_X86_SVML_NO_NATIVE
on your builds to confirm?
Will try it now
@Epixu If that works, we will need to decide between not automatically enabling native SVML with 32-bit MSVC, or only selectively skipping it for certain functions. It would be good if you reported this to MS and we got a bug number so that we can disable the workaround for unaffected (future) versions of their compiler
@Epixu If that works, we will need to decide between not automatically enabling native SVML with 32-bit MSVC, or only selectively skipping it for certain functions. It would be good if you reported this to MS and we got a bug number so that we can disable the workaround for unaffected (future) versions of their compiler
What should I write in the bug? That their SVML implementation is giving wrong results, probably a rounding error somewhere?
Also, why is my NoSIMD build affected and test passing, too? That one seems to be a problem on my side. I've probably not disabled SIMD entirely on that build, so it gets compiled down to svml_dpo4_avx2.asm
again?
What should I write in the bug? That their SVML implementation is giving wrong results, probably a rounding error somewhere?
Yes. Needs fixing or they can clarify in their documentation the expected errors/variation based upon architecture / CPU feature
Also, why is my NoSIMD build affected and test passing, too? That one seems to be a problem on my side. I've probably not disabled SIMD entirely on that build, so it gets compiled down to
svml_dpo4_avx2.asm
again?
Looking at https://github.com/Langulus/SIMD/blob/a7cf2a9799dbc55b5aec6d9b9f20b1bfbaa187e3/.github/workflows/ci.yml#L17 and https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?view=msvc-170
Then we see that no /arch
means /arch:SSE2
on x86 systems, which is confusing as your explicit /arch:SSE2
build passed! Maybe try /arch:IA32
for your NoSIMD
build?
I missed that on my first read-through. Huzzah!
So, do you think we should:
A. disable SVML auto-on for MSVC 32-bit entirely (but still can be forced on with -DSIMDE_X86_SVML_NATIVE
)
or
B. disable SVML native + MSVC 32-bit only for certain functions (with no way to force it on, unless we invent some new convention)?
A. disable SVML auto-on for MSVC 32-bit entirely (but still can be forced on with
-DSIMDE_X86_SVML_NATIVE
)
Can't really say for sure - my project's scope is considerably smaller than yours. But considering that probably a niche selection of people actually attempt these builds these days, I suppose disabling SVML native for x86 MSVC builds wouldn't be that bad in the grand scheme of things. It is more likely that it will save you a couple of issue reports in the long run, and unlikely to cause any real harm, but that's just my subjective opinion.
B. disable SVML native + MSVC 32-bit only for certain functions (with no way to force it on, unless we invent some new convention)?
That seems to be an overkill and would probably involve a lot of investigation of which functions are actually botched. My tests aren't exhaustive. Don't know about yours. The first option seems to save you the trouble outright. At least for now...