sse2neon
Use unaligned data types for unaligned intrinsics.
Some fixes are still required.
How do I debug the Armv7 issues? I don't have the hardware to test this locally.
You can emulate Armv7 targets via QEMU. See https://dev.to/amarjargal/running-debian-on-an-emulated-arm-machine-2i04 and specify the armhf target.
I think I found the issue. In _mm_loadu_si64, there is a call to vld1_s64, which looks like this:
typedef __attribute__((neon_vector_type(1))) int64_t int64x1_t;
int64x1_t vld1_s64 (const int64_t * __a)
{
    return (int64x1_t) { *__a };
}
On 32-bit Arm, this is done using ldrd, and ldrd doesn't support unaligned accesses. This isn't a problem on 64-bit because it uses ldr.
Is there an unaligned version of vld1_s64? If not, what do you suggest we do?
Thanks for the patch. This should fix some -fsanitize=alignment (part of -fsanitize=undefined) uses.
Armv6 and Armv7 CPUs can perform unaligned accesses for most single load and store instructions up to word size. However, LDM, STM, LDRD, and STRD still need to be handled separately for unaligned accesses. 64-bit variables are typically accessed using LDRD/STRD, which require 32-bit alignment. To handle unaligned 64-bit accesses, we can use a struct-based implementation: when the type carries no alignment guarantee, the compiler splits the access into multiple 32-bit accesses instead of emitting LDRD/STRD.
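A minimal sketch of the struct-based approach described above, assuming GCC or Clang (the `packed` and `may_alias` attributes are compiler extensions). The names `unaligned_i64`, `load_unaligned_i64`, and `roundtrip_at_odd_offset` are hypothetical and are not taken from sse2neon:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper type: `packed` drops the alignment requirement to 1,
 * so the compiler may not assume the address suits LDRD/STRD.
 * `may_alias` lets the struct be read through a pointer to arbitrary bytes. */
typedef struct __attribute__((packed, may_alias)) {
    int64_t v;
} unaligned_i64;

/* Read a 64-bit value from an address with no alignment guarantee. */
static inline int64_t load_unaligned_i64(const void *p)
{
    assert(p != NULL);
    return ((const unaligned_i64 *) p)->v;
}

/* Store a value at a deliberately misaligned offset, then read it back.
 * Returns 1 if the loaded value matches what was stored. */
static int roundtrip_at_odd_offset(void)
{
    unsigned char buf[16] = {0};
    const int64_t expected = 0x1122334455667788LL;
    memcpy(buf + 1, &expected, sizeof expected);
    return load_unaligned_i64(buf + 1) == expected;
}
```

Calling `roundtrip_at_odd_offset()` under `-fsanitize=alignment` is one way to check that the load no longer trips the sanitizer.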
Reference: Memory alignment issue
I found another way around this issue. _mm_loadu_si64 is written differently from _mm_loadu_si{16,32,128} -- specifically, the latter is written without the use of the problematic vld1_s*. I rewrote _mm_loadu_si64 to be symmetrical.
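For reference, the behaviour any rewrite of `_mm_loadu_si64` has to preserve is: read 64 bits from a possibly unaligned address into the low lane and zero the upper lane. The sketch below is a portable model of that behaviour only. It uses `memcpy` rather than NEON intrinsics, so it is not the code from this patch, and `model_m128i` and `model_loadu_si64` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector: two 64-bit lanes. */
typedef struct {
    int64_t lane[2];
} model_m128i;

/* Model of an unaligned 64-bit load: the low lane receives the 8 bytes
 * at p, the high lane is zero. memcpy makes no alignment assumption. */
static inline model_m128i model_loadu_si64(const void *p)
{
    assert(p != NULL);
    model_m128i r = { { 0, 0 } };
    memcpy(&r.lane[0], p, sizeof r.lane[0]);
    return r;
}
```

A model like this can serve as a reference when testing the real intrinsic against buffers at odd offsets.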
Thanks to @Logikable for contributing!
Is __attribute__((aligned(x))) being used incorrectly?
"Cannot decrease the alignment below the natural alignment of the type." "For a variable that is not in a structure, the minimum alignment is the natural alignment of the variable type."
https://developer.arm.com/documentation/101754/0622/armclang-Reference/Compiler-specific-Function--Variable--and-Type-Attributes/--attribute----aligned---variable-attribute
That only applies to structs/struct members, and the alignment can still be decreased in that situation by also specifying packed.
https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes
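The distinction drawn above can be checked directly. This is a small sketch assuming GCC or Clang attribute semantics as described in the GCC documentation linked above. The type and struct names are hypothetical, and the exact offset of the non-packed member depends on the target ABI:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On a typedef, aligned() can lower the alignment below the natural one. */
typedef int64_t i64_align1 __attribute__((aligned(1)));

/* On a struct member, aligned() alone can only raise the alignment,
 * so `v` stays at its natural alignment here. */
struct member_aligned_only {
    char c;
    int64_t v __attribute__((aligned(1)));
};

/* Adding packed lets the member's alignment drop, so `v` directly follows `c`. */
struct member_packed {
    char c;
    int64_t v __attribute__((packed, aligned(1)));
};

/* Runtime checks of the layout claims above. */
static void check_layout(void)
{
    assert(__alignof__(i64_align1) == 1);
    assert(offsetof(struct member_packed, v) == 1);
    assert(offsetof(struct member_aligned_only, v) > 1);
}
```

On a typical 64-bit target the non-packed member lands at offset 8. The check only asserts that it is greater than 1, since the natural alignment of `int64_t` varies by ABI.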