Xiangyang (Mark) Guo

5 issues by Xiangyang (Mark) Guo

.NET Core recently enabled hardware intrinsics that generate SIMD instructions from SSE to AVX2, and more instructions are being added to the .NET Core API. The instruction list can be...

Optimize `Vectorized exp()` with NEON SIMD instructions, adapted with minor changes from the implementation at https://github.com/ARM-software/optimized-routines/blob/master/math/aarch64/v_expf.c. cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

module: cpu

https://github.com/ARM-software/optimized-routines/tree/master/math/aarch64 implements some math operations with NEON SIMD instructions. The performance looks good, especially for exp(). Would it be possible to integrate ARM-software/optimized-routines into sleef? Thanks!

Use sleef for aarch64 by default. cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

ciflow/trunk
module: inductor
ciflow/inductor

Summary: Skip the asmjit test on ARM because asmjit doesn't work on ARM. `kernel_32` and `kernel_64` are generated by `GenerateEmbeddingSpMDMNBit`, which calls the auto-vectorized version on ARM. Differential Revision: D60181430

fb-exported
cla signed