Xiangyang (Mark) Guo issues

Results 5 issues of Xiangyang (Mark) Guo

Use .NET Core hardware intrinsics to improve the performance?

Recently, .NET Core enabled hardware intrinsics that generate SIMD instructions from SSE through AVX2, and more instructions are being added to the .NET Core API. The instruction list can be...

Optimize Vectorized<float> exp() with neon simd instructions

Optimize `Vectorized<float> exp()` with NEON SIMD instructions. The implementation is copied from https://github.com/ARM-software/optimized-routines/blob/master/math/aarch64/v_expf.c with minor changes. cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
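For readers unfamiliar with how fast `exp()` routines are usually built, here is a minimal scalar sketch of the range-reduction idea commonly used in vectorized single-precision implementations: write x = n·ln2 + r, approximate exp(r) with a short polynomial, then scale by 2^n. This is an illustration only, not the code from the linked file or from this PR. The function name `exp_range_reduced` and the degree-6 Taylor polynomial are assumptions chosen for clarity; production routines typically use tuned minimax coefficients and a split ln2 constant.

```python
import math

LN2 = math.log(2.0)

def exp_range_reduced(x: float) -> float:
    """Approximate exp(x) via range reduction (illustrative sketch)."""
    # Pick n as the nearest integer to x / ln2, so that the remainder
    # r = x - n*ln2 lies roughly in [-ln2/2, ln2/2].
    n = round(x / LN2)
    r = x - n * LN2
    # Degree-6 Taylor polynomial for exp(r), evaluated in Horner form.
    # (Assumption: real implementations use tuned minimax coefficients.)
    p = 1.0 + r * (1.0 + r * (1 / 2 + r * (1 / 6 + r * (1 / 24
        + r * (1 / 120 + r / 720)))))
    # Reconstruct exp(x) = 2^n * exp(r).
    return math.ldexp(p, n)
```

A SIMD version applies the same three steps to every lane at once, which is why the reduction and the polynomial are kept branch-free. In this sketch the relative error against `math.exp` stays below about 1e-6 for moderate inputs, which is in the range of single-precision accuracy.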

module: cpu

Integrate ARM-software/optimized-routines into sleef?

https://github.com/ARM-software/optimized-routines/tree/master/math/aarch64 implements some math operations with NEON SIMD instructions. The performance looks good, especially for exp(). I'm wondering whether it's possible to integrate ARM-software/optimized-routines into sleef. Thanks!

Use sleef on macOS Apple silicon by default

Use sleef for aarch64 by default. cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

ciflow/trunk

module: inductor

ciflow/inductor

skip asmjit test on arm

Summary: Skip the asmjit test on ARM because asmjit doesn't work on ARM. `kernel_32` and `kernel_64` are generated from `GenerateEmbeddingSpMDMNBit`, which calls the auto-vectorized version on ARM. Differential Revision: D60181430

fb-exported

cla signed