AVX512 masked load and store functions (simde_mm512_mask_{loadu,storeu}_*) are implemented incorrectly
The masked functions should not touch memory where the mask is set to 0 at all.
Instead, the simde_mm512_mask_storeu_* functions write a 0 if the mask is set to 0. Take for example the following code:
#include <iostream>
#include "simde/x86/avx512.h"
int main() {
double array[8] = { 1, 2, 3, 4, 5, 6, 7, 8};
simde_mm512_mask_storeu_pd(array, 0, simde_mm512_setzero_pd());
for (auto d : array) {
std::cout << d << " ";
}
std::cout << std::endl;
return 0;
}
Using SIMDe:
> g++ -o test -mno-avx512f test.cpp
> ./test
0 0 0 0 0 0 0 0
Using native AVX512
> g++ -o test -mavx512f test.cpp
> ./test
1 2 3 4 5 6 7 8
For the simde_mm512_mask_loadu_* the issue isn't as severe since they do correctly keep the old values in the target register, however they still load all values from the source into a temporary register which could cause segmentation faults. Take for example the following code:
#include <iostream>
#include "simde/x86/avx512.h"
int main() {
simde__m512d a = simde_mm512_setzero_pd();
simde_mm512_mask_loadu_pd(a, 0, nullptr);
for (int i = 0; i < 8; ++i) {
std::cout << a[i] << " ";
}
std::cout << std::endl;
return 0;
}
Using SIMDe:
> g++ -o test -mno-avx512f test.cpp
> ./test
Segmentation fault (core dumped)
Using native AVX512:
> g++ -o test -mavx512f test.cpp
> ./test
0 0 0 0 0 0 0 0
@Jakob-en Thank you for your report, this makes sense and I see how we missed this in testing (we initialized to all zeros and also some functions aren't tested).
Are you able to contribute a fix?
This functionality is important to me and I hope to be contributing a fix shortly for it.
Unfortunately, other than scalar fallback I don't see a good general compat solution here. The avx512 masked load/store instructions are guarenteed to neither touch memory or cause a page fault if the respective lane is masked out, which means you can't even fallback to the prior AVX masked ops (which did not guarentee no page fault).