acle
[proposal] Add vector intrinsics for loading into lane 0 and setting other lanes to 0
It would be useful to have vector intrinsics that load lane 0 from memory and set the other elements to zero. E.g.:
- int8x16_t vfoo_s8(const int8_t *) → LDR Bn, [Xn]
- int16x8_t vfoo_s16(const int16_t *) → LDR Hn, [Xn]
- ….
The same thing would work for SVE.
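To make the requested behaviour concrete, here is a minimal scalar sketch of the semantics in plain C. It is only a model, not a proposed implementation: the `model_*` types and functions are hypothetical names introduced for illustration, with structs of lanes standing in for `int8x16_t` and `int16x8_t` so that the example compiles on any target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for int8x16_t: 16 lanes of int8_t. */
typedef struct { int8_t lane[16]; } model_int8x16_t;

/* Hypothetical scalar stand-in for int16x8_t: 8 lanes of int16_t. */
typedef struct { int16_t lane[8]; } model_int16x8_t;

/* Models the proposed vfoo_s8: lane 0 is loaded from memory,
   lanes 1..15 are set to zero. Reads exactly one element. */
static model_int8x16_t model_vfoo_s8(const int8_t *ptr)
{
    model_int8x16_t v;
    memset(&v, 0, sizeof v);   /* zero every lane first */
    v.lane[0] = *ptr;          /* then load lane 0 */
    return v;
}

/* Models the proposed vfoo_s16: lane 0 is loaded from memory,
   lanes 1..7 are set to zero. Reads exactly one element. */
static model_int16x8_t model_vfoo_s16(const int16_t *ptr)
{
    model_int16x8_t v;
    memset(&v, 0, sizeof v);
    v.lane[0] = *ptr;
    return v;
}
```

With the intrinsics that exist today, the closest spelling of the same operation is probably something like `vld1q_lane_s8(ptr, vdupq_n_s8(0), 0)`, which leaves it to the compiler to fold the zeroing and the lane insert into a single scalar `LDR`.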
GCC does at least optimise something like:
    #include <arm_neon.h>

    float32x2_t f(float32_t *ptr)
    {
        float32x2_t vec = {};
        vec = vld1_lane_f32(ptr, vec, 0);
        vec = vld1_lane_f32(ptr + 2, vec, 1);
        return vec;
    }
to:
    ldr     s0, [x0], 8
    ld1     {v0.s}[1], [x0]
    ret
and LLVM behaves similarly, but spelling the load this way seems a bit indirect compared with a dedicated intrinsic.
Hi, thanks for your issue report. If possible, we encourage you to contribute with a Pull Request that addresses this issue. We will be happy to review it.