ARM_NEON_2_x86_SSE

vaddvq_xx is not available

Open cool2002 opened this issue 4 years ago • 3 comments

I recently tried to use NEON_2_SSE.h in my project and got this compiler error: error: 'vaddvq_u16' was not declared in this scope

In the original <arm_neon.h>, the vaddvq_xxx (add across vector) functions are available:

    #ifdef __LITTLE_ENDIAN__
    __ai uint8_t vaddvq_u8(uint8x16_t __p0) {
      uint8_t __ret;
      __ret = (uint8_t) __builtin_neon_vaddvq_u8((int8x16_t)__p0);
      return __ret;
    }
    #else
    __ai uint8_t vaddvq_u8(uint8x16_t __p0) {
      uint8x16_t __rev0;
      __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
      uint8_t __ret;
      __ret = (uint8_t) __builtin_neon_vaddvq_u8((int8x16_t)__rev0);
      return __ret;
    }
    #endif

https://doc.rust-lang.org/nightly/core/arch/aarch64/fn.vaddvq_u16.html?search=vaddvq_u16

Could you possibly implement these in NEON_2_SSE.h?

Cheers.

cool2002, Jul 07 '21 11:07
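For readers who hit the same error with the 8-bit variant, here is a minimal sketch of how an across-vector add could be emulated with SSE2. It is not part of NEON_2_SSE.h: the name sketch_vaddvq_u8 is hypothetical, and the sketch assumes that a 128-bit NEON vector is represented as an __m128i on the x86 side. The result wraps modulo 256, as a uint8_t return value implies.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper, not part of NEON_2_SSE.h.
 * Assumption: the 16 unsigned 8-bit lanes live in an __m128i. */
static inline uint8_t sketch_vaddvq_u8(__m128i a)
{
    /* _mm_sad_epu8 against zero yields two partial sums:
     * bytes 0..7 in the low 64 bits, bytes 8..15 in the high 64 bits. */
    __m128i sad = _mm_sad_epu8(a, _mm_setzero_si128());

    /* Move the high partial sum down and add it to the low one. */
    __m128i total = _mm_add_epi32(sad, _mm_srli_si128(sad, 8));

    /* The full sum is at most 16 * 255 = 4080; the cast wraps it modulo 256. */
    return (uint8_t)_mm_cvtsi128_si32(total);
}
```

A caller would load its 16 bytes with _mm_loadu_si128 and pass the result to the helper. The sum-of-absolute-differences trick avoids a chain of shuffles, but it only applies to unsigned 8-bit lanes.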

Hi @cool2002. The "original" arm_neon.h file you refer to is the latest version, which supports AArch64 and, I believe, the latest ARM generations (9?), whereas NEON_2_SSE.h targets the 32-bit ARMv7 generation. That is why some instructions are missing. The main obstacle to implementing these new instructions is the absence of corresponding tests. Tests are absolutely necessary, and creating them is a separate, time-consuming task. So while I don't plan to add anything to NEON_2_SSE.h in 2021, I could certainly change my plans if such tests appear.

Zvictoria, Jul 07 '21 15:07
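Since missing tests are named as the blocker, the following is a rough sketch of what a candidate implementation of vaddvq_u16 plus a scalar reference check might look like. None of this comes from NEON_2_SSE.h or its test suite: the names sketch_vaddvq_u16, ref_vaddvq_u16, and check_vaddvq_u16 are hypothetical, the vector is assumed to be held in an __m128i, and the pseudo-random generator is just a simple linear congruential step chosen for illustration.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: sum of 8 unsigned 16-bit lanes, wrapping modulo 65536. */
static uint16_t ref_vaddvq_u16(const uint16_t v[8])
{
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum = (uint16_t)(sum + v[i]);
    return sum;
}

/* Hypothetical SSE2 candidate, not part of NEON_2_SSE.h.
 * Folds 8 lanes to 4, then 2, then 1; 16-bit adds wrap on overflow. */
static inline uint16_t sketch_vaddvq_u16(__m128i a)
{
    __m128i s = _mm_add_epi16(a, _mm_srli_si128(a, 8)); /* lanes i + (i+4) */
    s = _mm_add_epi16(s, _mm_srli_si128(s, 4));         /* lanes i + (i+2) */
    s = _mm_add_epi16(s, _mm_srli_si128(s, 2));         /* lane 0 + lane 1 */
    return (uint16_t)_mm_extract_epi16(s, 0);
}

/* Compares the candidate against the reference on pseudo-random inputs
 * and returns the number of mismatches found. */
static int check_vaddvq_u16(unsigned rounds)
{
    uint32_t state = 12345u; /* fixed seed so runs are reproducible */
    int mismatches = 0;
    for (unsigned r = 0; r < rounds; r++) {
        uint16_t v[8];
        for (int i = 0; i < 8; i++) {
            state = state * 1664525u + 1013904223u; /* LCG step */
            v[i] = (uint16_t)(state >> 16);
        }
        __m128i q = _mm_loadu_si128((const __m128i *)v);
        if (sketch_vaddvq_u16(q) != ref_vaddvq_u16(v))
            mismatches++;
    }
    return mismatches;
}
```

A real test would also cover edge cases such as all-zero and all-0xFFFF inputs, and would ideally compare against results recorded on actual ARM hardware rather than a hand-written scalar loop.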

Hi cool2002,

On 07/07/2021 at 13:45, cool2002 wrote:

> Can you possibly implement this in NEON_2_SSE.h?

You could try using SIMDe (https://github.com/simd-everywhere) instead. It is actively developed and supports this intrinsic along with many other ARMv8 intrinsics that are missing from NEON_2_SSE.h. If you need any intrinsics that are missing from SIMDe, you can ask for them on https://gitter.im/simd-everywhere/community and I am sure someone will add them.

Cheers, Chris (aka rosbif)

rosbif, Jul 07 '21 16:07

It seems this issue still exists. @Zvictoria, are there any plans to add these "missing" instructions?

Utkarsh-Deshmukh, Aug 03 '22 01:08