simde icon indicating copy to clipboard operation
simde copied to clipboard

AVX512 masked load and store functions (simde_mm512_mask_{loadu,storeu}_*) are implemented incorrectly

Open JakobEnglhauser opened this issue 1 year ago • 2 comments

The masked functions should not touch memory where the mask is set to 0 at all.

Instead, the simde_mm512_mask_storeu_* functions write a 0 if the mask is set to 0. Take for example the following code:

#include <iostream>
#include "simde/x86/avx512.h"

int main() {
        double array[8] = { 1, 2, 3, 4, 5, 6, 7, 8};

        simde_mm512_mask_storeu_pd(array, 0, simde_mm512_setzero_pd());

        for (auto d : array) {
                std::cout << d << " ";
        }
        std::cout << std::endl;
        return 0;
}

Using SIMDe:

> g++ -o test -mno-avx512f test.cpp
> ./test
0 0 0 0 0 0 0 0

Using native AVX512

> g++ -o test -mavx512f test.cpp
> ./test
1 2 3 4 5 6 7 8

For the simde_mm512_mask_loadu_* the issue isn't as severe since they do correctly keep the old values in the target register, however they still load all values from the source into a temporary register which could cause segmentation faults. Take for example the following code:

#include <iostream>
#include "simde/x86/avx512.h"

int main() {
        simde__m512d a = simde_mm512_setzero_pd();

        simde_mm512_mask_loadu_pd(a, 0, nullptr);

        for (int i = 0; i < 8; ++i) {
                std::cout << a[i] << " ";
        }
        std::cout << std::endl;
        return 0;
}

Using SIMDe:

> g++ -o test -mno-avx512f test.cpp
> ./test
Segmentation fault (core dumped)

Using native AVX512:

> g++ -o test -mavx512f test.cpp
> ./test
0 0 0 0 0 0 0 0

JakobEnglhauser avatar Aug 07 '24 21:08 JakobEnglhauser

@Jakob-en Thank you for your report, this makes sense and I see how we missed this in testing (we initialized to all zeros and also some functions aren't tested).

Are you able to contribute a fix?

mr-c avatar Sep 23 '24 08:09 mr-c

This functionality is important to me and I hope to be contributing a fix shortly for it.

Unfortunately, other than scalar fallback I don't see a good general compat solution here. The avx512 masked load/store instructions are guarenteed to neither touch memory or cause a page fault if the respective lane is masked out, which means you can't even fallback to the prior AVX masked ops (which did not guarentee no page fault).

Remnant44 avatar Jun 02 '25 22:06 Remnant44