Naoki Shibata
> Technically, it is very easy to do this. One only has to add a function declaration, and call it, and it won't even fail to link since the function...
> From there, some functions have _avx suffixes that denote that they require AVX, but others don't have anything, yet they do require AVX too (e.g. the __m256... dispatchers). I found...
> use __attribute__((target("..."))) on all the functions in the library properly

I think that attribute is not supported by MSVC.

> But this is something that might remove complexity instead of...
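With GCC and Clang, the attribute lets a single function be compiled for AVX while the rest of the file is not, and a dispatcher can then pick that function at run time. The following is a minimal sketch, assuming an x86 target; `add_avx`, `add_scalar`, `add_dispatch` and `HAVE_TARGET_ATTR` are hypothetical names, not SLEEF functions. On MSVC the `#if` branch is skipped and only the plain C path is compiled, which is exactly where the lack of the attribute hurts.

```c
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_TARGET_ATTR 1

/* Hypothetical: compiled for AVX regardless of the global -m flags. */
__attribute__((target("avx")))
static void add_avx(double *out, const double *a, const double *b, size_t n) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {  /* four doubles per 256-bit register */
    __m256d va = _mm256_loadu_pd(a + i);
    __m256d vb = _mm256_loadu_pd(b + i);
    _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
  }
  for (; i < n; i++) out[i] = a[i] + b[i];  /* scalar tail */
}
#else
#define HAVE_TARGET_ATTR 0  /* e.g. MSVC: no __attribute__((target)) */
#endif

/* Hypothetical portable fallback, compiled by every compiler. */
static void add_scalar(double *out, const double *a, const double *b, size_t n) {
  for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: enters the AVX function only if the CPU reports AVX. */
static void add_dispatch(double *out, const double *a, const double *b, size_t n) {
#if HAVE_TARGET_ATTR
  if (__builtin_cpu_supports("avx")) {
    add_avx(out, a, b, n);
    return;
  }
#endif
  add_scalar(out, a, b, n);
}
```

Because the dispatcher itself is compiled without AVX, it is safe to call on any x86 CPU; only the code behind the `__builtin_cpu_supports` check uses AVX instructions.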
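That is okay. Another plan I am considering is as follows:

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_add_sd */

typedef __m128d vdouble;

/* Adds the low lanes; the upper lane of the result is copied from vx. */
vdouble vadd_vd_vd_vd(vdouble vx, vdouble vy) { return _mm_add_sd(vx, vy); }
```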
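Extending that plan into a small runnable sketch shows how a "vector" of width 1 would be used: values go into the low lane, the scalar SSE2 `_sd` operations work on that lane only, and the result is read back out. The cast helpers `vcast_vd_d` / `vcast_d_vd` and the extra `vmul`/`vsub` functions are assumptions written in the same naming style, not quoted from SLEEF's helper files.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Width-1 "vector": only the low lane of the __m128d carries a value. */
typedef __m128d vdouble;

/* Assumed helpers: move a double into / out of the low lane. */
static inline vdouble vcast_vd_d(double d) { return _mm_set_sd(d); }
static inline double vcast_d_vd(vdouble v) { return _mm_cvtsd_f64(v); }

/* Scalar SSE2 operations that touch only the low lane. */
static inline vdouble vadd_vd_vd_vd(vdouble vx, vdouble vy) { return _mm_add_sd(vx, vy); }
static inline vdouble vsub_vd_vd_vd(vdouble vx, vdouble vy) { return _mm_sub_sd(vx, vy); }
static inline vdouble vmul_vd_vd_vd(vdouble vx, vdouble vy) { return _mm_mul_sd(vx, vy); }
```

For example, `vcast_d_vd(vadd_vd_vd_vd(vcast_vd_d(1.5), vcast_vd_d(2.25)))` evaluates to 3.75. The appeal of this design is that the vector-style source code stays unchanged while each operation compiles to a single scalar instruction.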
I don't understand why using vector registers to compute scalar values would degrade performance. Look at the assembly output from the compiler: it basically already uses vector registers for scalar computation. One...
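A small comparison makes the point about the assembly output concrete, assuming an x86-64 target. A plain `double` addition and the explicit scalar SSE2 intrinsic perform the same IEEE operation; the comment on generated code describes what GCC and Clang typically emit at `-O2`, not a guarantee. `add_plain` and `add_sse_sd` are illustrative names only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* On x86-64, GCC and Clang at -O2 typically compile this to "addsd %xmm1, %xmm0":
   the scalar arguments are already passed and added in XMM (vector) registers. */
double add_plain(double x, double y) { return x + y; }

/* The same scalar addition written explicitly with an SSE2 intrinsic. */
double add_sse_sd(double x, double y) {
  return _mm_cvtsd_f64(_mm_add_sd(_mm_set_sd(x), _mm_set_sd(y)));
}
```

Comparing the disassembly of the two functions (for instance with `objdump -d`) is a quick way to check whether wrapping scalars in `__m128d` adds any extra instructions.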
For vadd_vi_vi_vi, we can just keep the current implementation, since integer operations are fast enough. I guess transferring values between a vector register and a general-purpose register would take more time....
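A minimal sketch of keeping the integer side in general-purpose registers, assuming the scalar build defines `vint` as a plain 32-bit integer. The typedef and the `vsub`/`vand` helpers are illustrative assumptions, not a quote of SLEEF's current helper file.

```c
#include <stdint.h>

/* Assumption: in the scalar build, vint is an ordinary 32-bit integer, so the
   compiler keeps it in a general-purpose register and never moves it to an XMM register. */
typedef int32_t vint;

static inline vint vadd_vi_vi_vi(vint x, vint y) { return x + y; }
static inline vint vsub_vi_vi_vi(vint x, vint y) { return x - y; }
static inline vint vand_vi_vi_vi(vint x, vint y) { return x & y; }
```

Only the floating-point type would live in a vector register under this scheme; a conversion between `vint` and `vdouble` is then the one place where a register transfer is unavoidable.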
vmla is a multiplication followed by an addition; it is used where contracting the two into an FMA is permitted. vfma is an FMA, and it is only used if FMA is available. FMA is extensively used in...
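The difference between the two shows up with inputs where the intermediate rounding matters. This is a sketch assuming an x86-64 target and GCC or Clang; `vmla_d`, `vfma_d` and `fma_available` are hypothetical scalar stand-ins, not SLEEF's vector functions. The `volatile` temporary forces the product to be rounded before the addition, which gives the uncontracted result; a compiler permitted to contract could return the FMA result instead.

```c
/* Multiply, round, then add: two roundings. The volatile store prevents the
   compiler from contracting x * y + z into a single FMA. */
static double vmla_d(double x, double y, double z) {
  volatile double p = x * y;
  return p + z;
}

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_FMA_PATH 1

/* Fused multiply-add (FMA3): x * y + z with a single rounding. */
__attribute__((target("fma")))
static double vfma_d(double x, double y, double z) {
  return _mm_cvtsd_f64(_mm_fmadd_sd(_mm_set_sd(x), _mm_set_sd(y), _mm_set_sd(z)));
}

/* vfma_d must only be called when the CPU actually supports FMA. */
static int fma_available(void) { return __builtin_cpu_supports("fma"); }
#else
#define HAVE_FMA_PATH 0
static int fma_available(void) { return 0; }
#endif
```

With `a = 1 + 2^-30` and `c = -(1 + 2^-29)`, the exact value of `a * a + c` is `2^-60`. `vmla_d(a, a, c)` returns 0.0 because that term is lost when the product is rounded on its own, while `vfma_d(a, a, c)` returns `2^-60`.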
Regarding this, I am planning to change the names of the macros that enable helper files. My plan is to name the macros as follows: ENABLE_(extension name)_(vector width in...
In addition, all AVX2128 function names will be changed to end with "avx2". For example, Sleef_sind2_u10avx2128 will become Sleef_sind2_u10avx2. The vector width for a...
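If the ISA suffix is produced by the preprocessor, a rename like this becomes a one-line change. This is a sketch under that assumption; `ISA_SUFFIX`, `SLEEF_NAME`, `SIND2_U10_NAME` and `STR` are hypothetical macros, not SLEEF's build machinery. The two-level `SLEEF_NAME` is needed so that `ISA_SUFFIX` is expanded before `##` pastes the tokens together.

```c
/* Hypothetical: the ISA suffix is chosen once per helper/build variant. */
#ifndef ISA_SUFFIX
#define ISA_SUFFIX avx2
#endif

/* Inner macro pastes tokens; outer macro expands its arguments first. */
#define SLEEF_NAME_(fn, acc, isa) Sleef_##fn##_##acc##isa
#define SLEEF_NAME(fn, acc, isa) SLEEF_NAME_(fn, acc, isa)

/* Stringify after expansion, so the generated name can be inspected. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Expands to the identifier Sleef_sind2_u10avx2. */
#define SIND2_U10_NAME SLEEF_NAME(sind2, u10, ISA_SUFFIX)
```

Compiling the same file with `-DISA_SUFFIX=avx2128` would produce the old name `Sleef_sind2_u10avx2128`, which could be handy while both spellings need to coexist.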
@d-parks Thank you. I now think that this is going to be a very important feature of SLEEF. I am also vaguely thinking about adding GPGPU support. Please let me know...