simde
simde copied to clipboard
hmmer + simde 0.5 on arm64 (and others): SSE related tests fail
test: https://github.com/EddyRivasLab/hmmer/blob/bcd05b05c73f5c89583d4385cd692df190edc577/src/impl_sse/msvfilter.c#L615
primary function: https://github.com/EddyRivasLab/hmmer/blob/bcd05b05c73f5c89583d4385cd692df190edc577/src/impl_sse/msvfilter.c#L74
exercise 26 result:
~/hmmer/testsuite$ ../src/impl/msvfilter_utest -v
MSVFilter() tests, DNA
msv filter unit test failed: scores differ (-21.25, -10.86)
Debian patches: https://salsa.debian.org/med-team/hmmer/-/tree/f1a7c6e88ad50fbe22d84e8724cddaf584c76623/debian/patches
I'll try to get to this soon, but assuming this works on x86 with -DSIMDE_NO_NATIVE
msvfilter.c:85-86 is pretty suspicious; it open up the code to aliasing issues, and indeed at :141 and :151 there are aliasing violations. Changing those to
mpv = _mm_slli_si128(_mm_load_si128(dp[Q-1]), 1);
and
mpv = _mm_load_si128(dp[q]);
Might do the trick. I'm assuming the data is aligned, otherwise _mm_loadu_si128
may be enough.
but assuming this works on x86 with
-DSIMDE_NO_NATIVE
To compile with -DSIMDE_NO_NATIVE
on x86 I had to add the following patch to src/impl_sse/impl_sse.h
@@ -355,7 +356,7 @@
static inline void
impl_Init(void)
{
-#ifdef HAVE_FLUSH_ZERO_MODE
+#if defined(HAVE_FLUSH_ZERO_MODE) && defined(SIMDE_X86_SSE_NATIVE)
/* In order to avoid the performance penalty dealing with sub-normal
* values in the floating point calculations, set the processor flag
* so sub-normals are "flushed" immediately to zero.
When compiling your suggested edits for arm64
aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/build/hmmer-3.3+dfsg2=. -fstack-protector-strong -Wformat -Werror=format-security -fopenmp-simd -DSIMDE_ENABLE_OPENMP -O3 -flto -O3 -pthread -Wdate-time -D_FORTIFY_SOURCE=2 -DHAVE_CONFIG_H -I../../easel -I../../easel -I. -I.. -I. -I../../src -I./.. -o msvfilter.o -c msvfilter.c
In file included from msvfilter.c:24:
msvfilter.c: In function 'p7_MSVFilter':
msvfilter.c:141:45: error: incompatible type for argument 1 of 'simde_mm_load_si128'
141 | mpv = _mm_slli_si128(_mm_load_si128(dp[Q-1]), 1);
| ~~^~~~~
| |
| __m128i
In file included from msvfilter.c:24:
/usr/include/simde/x86/sse2.h:2857:42: note: expected 'const __vector(2) long int *' but argument is of type '__m128i'
2857 | simde_mm_load_si128 (simde__m128i const* mem_addr) {
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~
msvfilter.c:141:45: error: incompatible type for argument 1 of 'simde_mm_load_si128'
141 | mpv = _mm_slli_si128(_mm_load_si128(dp[Q-1]), 1);
| ~~^~~~~
| |
| __m128i
/usr/include/simde/x86/sse2.h:2857:42: note: expected 'const __vector(2) long int *' but argument is of type '__m128i'
2857 | simde_mm_load_si128 (simde__m128i const* mem_addr) {
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~
msvfilter.c:150:34: error: incompatible type for argument 1 of 'simde_mm_load_si128'
150 | mpv = _mm_load_si128(dp[q]); /* Load {MDI}(i-1,q) into mpv */
| ~~^~~
| |
| __m128i
/usr/include/simde/x86/sse2.h:2857:42: note: expected 'const __vector(2) long int *' but argument is of type '__m128i'
2857 | simde_mm_load_si128 (simde__m128i const* mem_addr) {
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~
FYI, this is using the v0.5.0
release of SIMDe
To compile with -DSIMDE_NO_NATIVE on x86 I had to add the following patch to src/impl_sse/impl_sse.h
Hm, that looks like an autoconf macro… Yep.
SIMDe doesn't include that macro (the IIG was missing a lot of macros when I implemented SSE). I'll try to add the macro (though I'm not sure it will actually do anything), but the test is including the x86 header directly. I'm not sure why they're actually using an autoconf test instead of just checking for the _MM_SET_FLUSH_ZERO_MODE
macro in the code…
When compiling your suggested edits for arm64
Oops, sorry, those should be
mpv = _mm_slli_si128(_mm_load_si128(&(dp[Q-1])), 1);
and
mpv = _mm_load_si128(&(dp[q]));
@nemequ Thanks, that compiles on arm64, but the same test fails in the same way
/build/hmmer-3.3+dfsg2/testsuite# ../src/impl/msvfilter_utest -v
MSVFilter() tests, DNA
msv filter unit test failed: scores differ (-21.25, -10.86)
Re-tested with SIMDe 0.7.0, same test failure