stdarch
stdarch copied to clipboard
p64 load/store intrinsics not properly inlined on arm
The following tests fail the inlining check on arm (but pass on aarch64):
core_arch::arm_shared::neon::generated::assert_vld1_p64_x2_vld1
core_arch::arm_shared::neon::generated::assert_vld1_p64_x3_nop
core_arch::arm_shared::neon::generated::assert_vld1_p64_x4_nop
core_arch::arm_shared::neon::generated::assert_vld1q_p64_x2_nop
core_arch::arm_shared::neon::generated::assert_vld1q_p64_x3_nop
core_arch::arm_shared::neon::generated::assert_vld1q_p64_x4_nop
core_arch::arm_shared::neon::generated::assert_vld2_dup_p64_nop
core_arch::arm_shared::neon::generated::assert_vld2_p64_nop
core_arch::arm_shared::neon::generated::assert_vld3_dup_p64_nop
core_arch::arm_shared::neon::generated::assert_vld3_p64_nop
core_arch::arm_shared::neon::generated::assert_vld4_dup_p64_nop
core_arch::arm_shared::neon::generated::assert_vld4_p64_nop
core_arch::arm_shared::neon::generated::assert_vst1_p64_x2_vst1
core_arch::arm_shared::neon::generated::assert_vst1_p64_x3_nop
core_arch::arm_shared::neon::generated::assert_vst1_p64_x4_nop
core_arch::arm_shared::neon::generated::assert_vst1q_p64_x2_nop
core_arch::arm_shared::neon::generated::assert_vst1q_p64_x3_nop
core_arch::arm_shared::neon::generated::assert_vst1q_p64_x4_nop
core_arch::arm_shared::neon::generated::assert_vst2_p64_nop
core_arch::arm_shared::neon::generated::assert_vst3_p64_nop
core_arch::arm_shared::neon::generated::assert_vst4_p64_nop
Apparently we need an extra version of the load/store intrinsics we delegate those intrinsics to with matching feature flags, see https://godbolt.org/z/WxozeEPav
It looks like this is because the Rust compiler doesn't support inlining a function labeled with some target_feature(enable = "foo") into another function labeled with another set of enabled features (unless all the features in foo are enabled in compiler flags).
I've investigated this problem in this blog post, in particular this section on target_feature and this section on what it means for recursion. See also similar issues in https://github.com/rust-lang/rust/issues/54353 and https://github.com/rust-lang/rust/issues/53069.
Concretely, the problem is that you're trying to inline a function labeled neon,v7 within a function labeled neon,aes,v8. If I compile the first code with -C target-feature=+aes,+v7,+v8, then the intrinsic is inlined.
#[inline] // This doesn't apply to a function labeled with other features.
#[target_feature(enable = "neon,v7")]
pub unsafe fn vld2_s64(a: *const i64) -> int64x1x2_t {
...
}
#[inline]
#[target_feature(enable = "neon,aes,v8"))]
pub unsafe fn vld2_p64(a: *const p64) -> poly64x1x2_t {
transmute(vld2_s64(transmute(a)))
}
#[target_feature(enable = "neon,aes,v8"))]
#[inline(never)]
pub unsafe fn vld2_p64_testshim(ptr: *const p64) -> poly64x1x2_t {
vld2_p64(ptr)
}