
[proposal] Add vector intrinsics for loading into lane 0 and setting other lanes to 0

Open rsandifo-arm opened this issue 1 year ago • 1 comments

It would be useful to have vector intrinsics that load lane 0 from memory and set the other elements to zero. E.g.:

  • int8x16_t vfoo_s8(const int8_t *): LDR Bn, [Xn]
  • int16x8_t vfoo_s16(const int16_t *): LDR Hn, [Xn]
  • ….

The same thing would work for SVE.
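
For illustration, the requested behaviour can be approximated today with existing intrinsics. The sketch below is not part of the proposal: `load_lane0_zero_s8` is a hypothetical helper name, and writing the result to a plain 16-byte array is an assumption made only so the lanes are easy to inspect. On targets with NEON it uses `vld1q_lane_s8` on a zeroed vector; elsewhere it falls back to scalar code with the same result.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not an ACLE intrinsic): load one int8_t from ptr
 * into lane 0, set lanes 1..15 to zero, and store all 16 lanes to out. */
static inline void load_lane0_zero_s8(const int8_t *ptr, int8_t out[16])
{
#if defined(__ARM_NEON)
    /* Start from an all-zero vector, then load *ptr into lane 0. */
    int8x16_t vec = vld1q_lane_s8(ptr, vdupq_n_s8(0), 0);
    vst1q_s8(out, vec);
#else
    /* Scalar fallback with the same lane contents. */
    memset(out, 0, 16);
    out[0] = *ptr;
#endif
}
```

A dedicated intrinsic would express the same operation directly, without relying on the compiler to fold the zero vector and the lane load into a single scalar LDR.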

GCC does at least optimise something like:

#include <arm_neon.h>

float32x2_t f(float32_t *ptr)
{
    float32x2_t vec = {};
    vec = vld1_lane_f32(ptr, vec, 0);
    vec = vld1_lane_f32(ptr + 2, vec, 1);
    return vec;
}

to:

        ldr     s0, [x0], 8
        ld1     {v0.s}[1], [x0]
        ret

and LLVM behaves similarly, but that seems a bit indirect.

rsandifo-arm, Jul 20 '23 17:07

Hi, thanks for your issue report. If possible, we encourage you to contribute with a Pull Request that addresses this issue. We will be happy to review it.

vhscampos, Jan 15 '24 10:01