sse2neon

Use unaligned data types for unaligned intrinsics.

Open Logikable opened this issue 1 year ago • 7 comments

Logikable avatar May 14 '24 21:05 Logikable

Some fixes are still required.

howjmay avatar May 14 '24 22:05 howjmay

How do I debug the Armv7 issues? I don't have the hardware to test this locally.

Logikable avatar May 16 '24 17:05 Logikable

> How do I debug the Armv7 issues? I don't have the hardware to test this locally.

You can emulate Armv7 targets via QEMU. See https://dev.to/amarjargal/running-debian-on-an-emulated-arm-machine-2i04 and specify the armhf target.

jserv avatar May 16 '24 17:05 jserv

I think I found the issue. In _mm_loadu_si64, there is a call to vld1_s64, which looks like this:

typedef __attribute__((neon_vector_type(1))) int64_t int64x1_t;
int64x1_t vld1_s64 (const int64_t * __a)
{
  return (int64x1_t) { *__a };
}

On 32-bit Arm, this load is done using ldrd, and ldrd doesn't support unaligned accesses. This isn't a problem on 64-bit Arm because the load uses ldr there.

Is there an unaligned version of vld1_s64? If not, what do you suggest we do?

Logikable avatar May 16 '24 21:05 Logikable

Thanks for the patch. This should fix some -fsanitize=alignment (part of -fsanitize=undefined) uses.

MaskRay avatar May 16 '24 21:05 MaskRay

Armv6 and Armv7 CPUs can perform unaligned accesses for most single load and store instructions up to word size. However, the LDM, STM, LDRD, and STRD instructions do not support unaligned accesses and must be handled separately. 64-bit variables are typically accessed using LDRD/STRD, which require 32-bit alignment. To handle unaligned 64-bit accesses, we can use a struct-based implementation, which the compiler is smart enough to turn into multiple 32-bit accesses.

Reference: Memory alignment issue
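For illustration, the struct-based approach described above could look roughly like the sketch below. This is not code from sse2neon: the names `unaligned_i64_box`, `load_i64_packed`, and `load_i64_memcpy` are made up for the example, and it assumes a GCC- or Clang-compatible compiler that supports the `packed` and `may_alias` attributes. A `memcpy`-based variant is included as a portable point of comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper type (not from sse2neon). `packed` drops the struct's
 * alignment to 1, so the compiler may not assume the member is naturally
 * aligned and has to emit an access sequence that is safe for any address.
 * `may_alias` lets a pointer to this type read bytes of other object types. */
struct unaligned_i64_box {
    int64_t value;
} __attribute__((packed, may_alias));

/* Load 64 bits from an address with no alignment guarantee. */
static inline int64_t load_i64_packed(const void *p)
{
    assert(p != NULL);
    return ((const struct unaligned_i64_box *) p)->value;
}

/* Portable alternative: copy the bytes into an aligned local.
 * Compilers normally turn the fixed-size memcpy into plain loads. */
static inline int64_t load_i64_memcpy(const void *p)
{
    int64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Both helpers should return the same value for any byte offset into a buffer. Which instructions the compiler actually picks (word loads, byte loads, or something else) depends on the target and flags, so it is worth checking the disassembly for the Armv7 build.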

jserv avatar May 17 '24 02:05 jserv

I found another way around this issue. _mm_loadu_si64 is written differently from _mm_loadu_si{16,32,128}: the latter are written without the problematic vld1_s* calls. I rewrote _mm_loadu_si64 to be symmetrical with them.

Logikable avatar May 17 '24 18:05 Logikable

Thanks, @Logikable, for contributing!

jserv avatar May 20 '24 21:05 jserv

Is __attribute__((aligned(x))) being used incorrectly?

"Cannot decrease the alignment below the natural alignment of the type." "For a variable that is not in a structure, the minimum alignment is the natural alignment of the variable type."

https://developer.arm.com/documentation/101754/0622/armclang-Reference/Compiler-specific-Function--Variable--and-Type-Attributes/--attribute----aligned---variable-attribute

aqrit avatar May 21 '24 21:05 aqrit

That only applies to structs/struct members, and the alignment can still be decreased in that situation by also specifying packed.

https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes

Logikable avatar May 21 '24 21:05 Logikable