stdarch icon indicating copy to clipboard operation
stdarch copied to clipboard

arm neon vld1*x tests fail on arm hw due to misaligned pointers

Open hkratz opened this issue 4 years ago • 7 comments

vld1x intrinsics are compiled to have alignment requirements, e.g. the assembly emitted for:

pub unsafe fn vld1_f32_x2(a: *const f32) -> float32x2x2_t { ... }

is

<stdarch_test_shim_vld1_f32_x2_vld1>:
      f4200a9f        vld1.32 {d0-d1}, [r0 :64]
      e12fff1e        bx      lr

and requires ato be 64-bit aligned (denoted by the :64 in the assembly). Clang does the same AFAICS.

Two problems:

  1. The alignment requirements are neither documented in Rust nor obvious and aarch64 has no such requirements.
  2. Our tests do not pass properly aligned pointers. Execution of the test suite on real arm hardware causes bus errors.

Possible solutions:

  1. Relax the alignment requirements at a slight general performance loss. Haven't checked if this can be done easily.
  2. Document and maybe assert the requirements and fix the tests.

cc @SparrowLii

hkratz avatar Sep 11 '21 19:09 hkratz

The alignment should be that of f32. This is what clang does for the same intrinsic.

Amanieu avatar Sep 12 '21 14:09 Amanieu

The alignment should be that of f32. This is what clang does for the same intrinsic.

In my test clang compiles the load with 64-bit alignment as well (vld1.32 {d16, d17}, [r1:64]): Godbolt

hkratz avatar Sep 12 '21 14:09 hkratz

As we are directly calling the LLVM intrinsics here without any possibility to specify alignment solution 2) seems to be the only way.

hkratz avatar Sep 12 '21 16:09 hkratz

As we are directly calling the LLVM intrinsics here without any possibility to specify alignment solution 2) seems to be the only way.

I agree. We can describe 1) in the document and try to find a better way in the future

SparrowLii avatar Sep 13 '21 01:09 SparrowLii

I don't know if the LLVM behavior is intended. For vld1_f32_x4 they require 256-bit alignment (clang Godbolt).

GCC does not even have the vld1_*_x* intrinsics for armv7 (issue), so we can't compare.

Fixing the test generation is a bit of a pain, but having a failing CI on native arm hw is not a good idea. Qemu will supposedly check for unaligned access in a recent or future version (Stackoverflow answer), then the GH CI will fail as well.

hkratz avatar Sep 13 '21 07:09 hkratz

Here is another test case: https://rust.godbolt.org/z/qjx6znnq4

Nugine avatar Nov 19 '22 05:11 Nugine

Upstream issue: https://github.com/llvm/llvm-project/issues/59081

Amanieu avatar Nov 29 '23 04:11 Amanieu