
AVX512 masked load and store functions (simde_mm512_mask_{loadu,storeu}_*) are implemented incorrectly

Open • Jakob-en opened this issue 6 months ago • 1 comment

The masked functions should not touch memory at all for elements whose mask bit is 0.

Instead, the simde_mm512_mask_storeu_* functions write 0 to every element whose mask bit is 0. Take, for example, the following code:

#include <iostream>
#include "simde/x86/avx512.h"

int main() {
        double array[8] = { 1, 2, 3, 4, 5, 6, 7, 8};

        // Mask 0: no element of array may be written.
        simde_mm512_mask_storeu_pd(array, 0, simde_mm512_setzero_pd());

        for (auto d : array) {
                std::cout << d << " ";
        }
        std::cout << std::endl;
        return 0;
}

Using SIMDe:

> g++ -o test -mno-avx512f test.cpp
> ./test
0 0 0 0 0 0 0 0

Using native AVX512:

> g++ -o test -mavx512f test.cpp
> ./test
1 2 3 4 5 6 7 8
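A minimal sketch of the behaviour expected from a masked store, assuming the documented semantics of `_mm512_mask_storeu_pd`: only elements whose mask bit is 1 are written, and everything else in memory is left alone. The name `masked_storeu_pd_sketch` and its plain-array signature are hypothetical, chosen for illustration; this is not SIMDe's code or API.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar fallback for a masked unaligned store of 8 doubles.
// Assumption: mirrors the documented semantics of _mm512_mask_storeu_pd;
// it is not SIMDe's actual implementation.
inline void masked_storeu_pd_sketch(void* mem_addr, std::uint8_t k,
                                    const double (&a)[8]) {
    for (int i = 0; i < 8; ++i) {
        if ((k >> i) & 1) {
            // Only lanes with a set mask bit touch memory. memcpy keeps the
            // store valid for unaligned addresses, like the "u" intrinsics.
            std::memcpy(static_cast<unsigned char*>(mem_addr) + i * sizeof(double),
                        &a[i], sizeof(double));
        }
        // Lanes with a clear mask bit are skipped: nothing is written.
    }
}
```

With `k = 0`, as in the reproducer, the array keeps its original values 1 2 3 4 5 6 7 8. A per-element loop is slow, but it is the simplest fallback that never writes outside the mask.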

For simde_mm512_mask_loadu_* the issue is less severe, since these functions correctly keep the old values in the destination register. However, they still load all values from the source into a temporary register, which can cause a segmentation fault when the source memory is not accessible. Take, for example, the following code:

#include <iostream>
#include "simde/x86/avx512.h"

int main() {
        simde__m512d a = simde_mm512_setzero_pd();

        // Mask 0: no element may be read, so the null pointer must never be dereferenced.
        a = simde_mm512_mask_loadu_pd(a, 0, nullptr);

        for (int i = 0; i < 8; ++i) {
                std::cout << a[i] << " ";
        }
        std::cout << std::endl;
        return 0;
}

Using SIMDe:

> g++ -o test -mno-avx512f test.cpp
> ./test
Segmentation fault (core dumped)

Using native AVX512:

> g++ -o test -mavx512f test.cpp
> ./test
0 0 0 0 0 0 0 0
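The load side can be sketched the same way, assuming the documented semantics of `_mm512_mask_loadu_pd`: lanes whose mask bit is 0 take their value from the pass-through vector `src`, and the corresponding memory is never read. `Vec8d` and `masked_loadu_pd_sketch` are hypothetical stand-ins for `simde__m512d` and the SIMDe function, used only to keep the example self-contained.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an 8 x double vector register.
struct Vec8d { double v[8]; };

// Hypothetical scalar fallback for a masked unaligned load of 8 doubles.
// Assumption: mirrors the documented semantics of _mm512_mask_loadu_pd;
// it is not SIMDe's actual implementation.
inline Vec8d masked_loadu_pd_sketch(Vec8d src, std::uint8_t k,
                                    const void* mem_addr) {
    Vec8d r = src;  // start from the pass-through values
    for (int i = 0; i < 8; ++i) {
        if ((k >> i) & 1) {
            // Memory is read only for lanes with a set mask bit, so a
            // zero mask never dereferences mem_addr (even if it is null).
            std::memcpy(&r.v[i],
                        static_cast<const unsigned char*>(mem_addr) + i * sizeof(double),
                        sizeof(double));
        }
    }
    return r;
}
```

Because no temporary full-width load happens, calling this with `k = 0` and a null pointer simply returns `src`, matching the native output above.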

Jakob-en, Aug 07 '24 21:08