Wrong alignment of UME::SIMD::SIMDVec<short, 16u> using AVX2
The code below prints addresses that are not properly aligned:
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <UMESimd.h>
// Returns true if the address held by ptr is a multiple of alignment.
bool is_aligned(void* ptr, size_t alignment)
{
    return ((uintptr_t)ptr) % alignment == 0;
}
int main()
{
    UME::SIMD::SIMDVec<short, 16u> v;
    // %zu is the portable format for size_t; %p expects a void*.
    printf(" sizeof(v) = %zu\n", sizeof(v));
    printf(" addressof(v) = %p\n", (void*)std::addressof(v));
    printf(" addressof(v) %% 32 = %zu\n", (size_t)((uintptr_t)std::addressof(v) % 32));
    printf(" is_aligned(v) = %s\n", is_aligned(std::addressof(v), 32) ? "true" : "false");
    return 0;
}
Sample session on my machine:
pollux umesimd $ g++ -Wall -std=c++11 -mavx2 -I. -o test test.cc
pollux umesimd $ ./test
sizeof(v) = 32
addressof(v) = 0x7ffc90eab290
addressof(v) % 32 = 16
is_aligned(v) = false
pollux umesimd $ ./test
sizeof(v) = 32
addressof(v) = 0x7fffffb0cf90
addressof(v) % 32 = 16
is_aligned(v) = false
pollux umesimd $ ./test
sizeof(v) = 32
addressof(v) = 0x7fffbbd10340
addressof(v) % 32 = 0
is_aligned(v) = true
Note: the same problem occurs with unsigned short as well. The other types seem to be fine.
Upon further testing, UME::SIMD::SIMDVec<unsigned long, 4u> is also affected by this.
The explanation seems very simple: you cannot assume anything about how the SIMDVec type is implemented. In this specific case, the implementation uses an array of 16 uint16_t elements. This gives 'sizeof(v) = 32', but 'alignof(v) = 2' because 'sizeof(uint16_t) = 2'.
For SIMDVec<uint32_t, 16> the underlying representation is a pair of YMM registers, which has a stronger alignment requirement (32B). This gives 'sizeof(SIMDVec<uint32_t, 16>) = 64' (2*sizeof(__m256i)), but 'alignof(SIMDVec<uint32_t, 16>) = 32' because 'sizeof(__m256i) = 32'.
The C++ standard leaves it up to the implementation to decide the specific alignment requirement of scalar types, and we do the same with SIMD types. UME::SIMD still relies on some compiler optimizations, and in this case an alignment of '2' is a fair choice that could probably be rationalized by heuristics.
As for programming 'correctness': you should use the 'alignof' operator to query the minimum alignment requirement of a vector, and then 'alignas' to impose a stricter alignment where you need one. You cannot assume anything about the alignment requirements, as these might vary between platforms.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <UMESimd.h>
// Returns true if the address held by ptr is a multiple of alignment.
bool is_aligned(void* ptr, size_t alignment)
{
    return ((uintptr_t)ptr) % alignment == 0;
}
int main()
{
    // alignas raises the alignment of this one object to 32 bytes.
    alignas(32) UME::SIMD::SIMDVec<uint16_t, 16u> v;
    printf(" alignof(uint16_t) = %zu\n", alignof(uint16_t));
    // alignof applied to an object (rather than a type) is a GNU extension.
    printf(" alignof(v) = %zu\n", alignof(v));
    printf(" sizeof(v) = %zu\n", sizeof(v));
    printf(" addressof(v) = %p\n", (void*)std::addressof(v));
    printf(" addressof(v) %% 32 = %zu\n", (size_t)((uintptr_t)std::addressof(v) % 32));
    printf(" is_aligned(v) = %s\n", is_aligned(std::addressof(v), 32) ? "true" : "false");
    return 0;
}
Which results in:
alignof(uint16_t) = 2
alignof(v) = 32
sizeof(v) = 32
addressof(v) = 0x7ffefb804700
addressof(v) % 32 = 0
is_aligned(v) = true
The point here is that if the alignment differs from what the hardware expects (i.e. 32 for AVX/AVX2, regardless of vector length), the compiler cannot vectorize in many cases. So if you are expecting the compiler to optimize, it will likely fail. The other problem is that creating containers of your type may cause a segmentation violation if the alignment is not correct, which is the problem I am trying to tackle at the moment.
Also note that alignof(v) for your v is shown as 32, not 2, as you said.
alignof shows 32 because of the 'alignas' used in the declaration! Without it, the compiler can do whatever it wants. I did this to show that you, as a user, have control ;]
Can you give me an example of code that causes such a violation? If you are trying to cast from raw pointers of fundamental types, then that is not an intended model of use.