arm neon vld1*x tests fail on arm hw due to misaligned pointers
vld1x intrinsics are compiled to have alignment requirements, e.g. the assembly emitted for:
pub unsafe fn vld1_f32_x2(a: *const f32) -> float32x2x2_t { ... }
is
<stdarch_test_shim_vld1_f32_x2_vld1>:
f4200a9f vld1.32 {d0-d1}, [r0 :64]
e12fff1e bx lr
and requires ato be 64-bit aligned (denoted by the :64 in the assembly). Clang does the same AFAICS.
Two problems:
- The alignment requirements are neither documented in Rust nor obvious and aarch64 has no such requirements.
- Our tests do not pass properly aligned pointers. Execution of the test suite on real arm hardware causes bus errors.
Possible solutions:
- Relax the alignment requirements at a slight general performance loss. Haven't checked if this can be done easily.
- Document and maybe assert the requirements and fix the tests.
cc @SparrowLii
The alignment should be that of f32. This is what clang does for the same intrinsic.
The alignment should be that of
f32. This is what clang does for the same intrinsic.
In my test clang compiles the load with 64-bit alignment as well (vld1.32 {d16, d17}, [r1:64]): Godbolt
As we are directly calling the LLVM intrinsics here without any possibility to specify alignment solution 2) seems to be the only way.
As we are directly calling the LLVM intrinsics here without any possibility to specify alignment solution 2) seems to be the only way.
I agree. We can describe 1) in the document and try to find a better way in the future
I don't know if the LLVM behavior is intended. For vld1_f32_x4 they require 256-bit alignment (clang Godbolt).
GCC does not even have the vld1_*_x* intrinsics for armv7 (issue), so we can't compare.
Fixing the test generation is a bit of a pain, but having a failing CI on native arm hw is not a good idea. Qemu will supposedly check for unaligned access in a recent or future version (Stackoverflow answer), then the GH CI will fail as well.
Here is another test case: https://rust.godbolt.org/z/qjx6znnq4
Upstream issue: https://github.com/llvm/llvm-project/issues/59081