AVX512 masked load and store functions (simde_mm512_mask_{loadu,storeu}_*) are implemented incorrectly
The masked functions should not touch memory at all for elements whose mask bit is 0.
Instead, the simde_mm512_mask_storeu_* functions write 0 to every element whose mask bit is 0. Take for example the following code:
```cpp
#include <iostream>
#include "simde/x86/avx512.h"

int main() {
  double array[8] = { 1, 2, 3, 4, 5, 6, 7, 8};
  simde_mm512_mask_storeu_pd(array, 0, simde_mm512_setzero_pd());
  for (auto d : array) {
    std::cout << d << " ";
  }
  std::cout << std::endl;
  return 0;
}
```
Using SIMDe:
```
> g++ -o test -mno-avx512f test.cpp
> ./test
0 0 0 0 0 0 0 0
```
Using native AVX512:
```
> g++ -o test -mavx512f test.cpp
> ./test
1 2 3 4 5 6 7 8
```
For the simde_mm512_mask_loadu_* functions the issue is less severe: they correctly keep the old values of the source register in the lanes whose mask bit is 0. However, they still load all elements from memory into a temporary register, which can cause a segmentation fault when the masked-out elements lie in memory that is not mapped. Take for example the following code:
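```cpp
#include <iostream>
#include "simde/x86/avx512.h"

int main() {
  simde__m512d a = simde_mm512_setzero_pd();
  // With a mask of 0 no element should be loaded, so the pointer should never
  // be dereferenced and the result is just the src operand `a`.
  a = simde_mm512_mask_loadu_pd(a, 0, nullptr);
  for (int i = 0; i < 8; ++i) {
    std::cout << a[i] << " ";
  }
  std::cout << std::endl;
  return 0;
}
```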
Using SIMDe:
```
> g++ -o test -mno-avx512f test.cpp
> ./test
Segmentation fault (core dumped)
```
Using native AVX512:
```
> g++ -o test -mavx512f test.cpp
> ./test
0 0 0 0 0 0 0 0
```