Naoki Shibata

195 comments by Naoki Shibata

> Technically, it is very easy to do this. One only has to add a function declaration, and call it, and it won't even fail to link since the function...

> From there, some functions have `_avx` suffixes that denote that they require AVX, but others don't have anything, yet they do require AVX too (e.g. the `__m256...` dispatchers). I found...

> use `__attribute__((target("...")))` on all the functions in the library properly

I think that attribute is not supported by MSVC.

> But this is something that might remove complexity instead of...
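The per-function `target` attribute under discussion can be sketched as follows, assuming GCC or Clang on x86-64 (MSVC has no equivalent attribute). Only the AVX-specific function is compiled for AVX, and a small dispatcher checks the CPU at run time with the GCC/Clang builtin `__builtin_cpu_supports`, so the rest of the translation unit can be built without `-mavx`. The names `sum4_avx`, `sum4_generic`, `sum4`, and `HAVE_AVX_PATH` are hypothetical and are not part of SLEEF.

```c
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* Hypothetical AVX kernel: only this function is compiled with AVX
   enabled, even if the file is built without -mavx. */
__attribute__((target("avx")))
static double sum4_avx(const double *p) {
    __m256d v  = _mm256_loadu_pd(p);              /* load 4 doubles */
    __m128d lo = _mm256_castpd256_pd128(v);       /* lanes 0,1 */
    __m128d hi = _mm256_extractf128_pd(v, 1);     /* lanes 2,3 */
    __m128d s  = _mm_add_pd(lo, hi);              /* (p0+p2, p1+p3) */
    s = _mm_add_sd(s, _mm_unpackhi_pd(s, s));     /* lane 0 + lane 1 */
    return _mm_cvtsd_f64(s);
}
#define HAVE_AVX_PATH 1
#endif

/* Portable fallback; same summation order as the AVX version. */
static double sum4_generic(const double *p) {
    return (p[0] + p[2]) + (p[1] + p[3]);
}

/* Dispatcher: calls the AVX version only if the running CPU supports AVX. */
double sum4(const double *p) {
#ifdef HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx")) return sum4_avx(p);
#endif
    return sum4_generic(p);
}
```

On other architectures the `#if` removes the AVX path entirely, and `sum4` always falls back to the generic version.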

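That is okay. Another plan I am considering is as follows:

```c
#include <emmintrin.h> /* SSE2: __m128d, _mm_add_sd */
typedef __m128d vdouble;
/* Adds the low lanes only; the high lane is copied from vx. */
vdouble vadd_vd_vd_vd(vdouble vx, vdouble vy) { return _mm_add_sd(vx, vy); }
```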
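A minimal sketch of this plan, assuming a kernel written only against helper functions, so that the same source compiles either to SSE2 scalar instructions (`_mm_add_sd` and `_mm_mul_sd`, which operate on the lowest lane of an XMM register) or to plain `double` arithmetic. The helper names follow SLEEF's `v<op>_<result>_<args>` naming style, but this particular set, `poly2`, and `poly2_scalar` are hypothetical illustrations.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128d vdouble;
/* Scalar operations on lane 0 of an XMM register. */
static inline vdouble vcast_vd_d(double d) { return _mm_set_sd(d); }
static inline double vcast_d_vd(vdouble v) { return _mm_cvtsd_f64(v); }
static inline vdouble vadd_vd_vd_vd(vdouble x, vdouble y) { return _mm_add_sd(x, y); }
static inline vdouble vmul_vd_vd_vd(vdouble x, vdouble y) { return _mm_mul_sd(x, y); }
#else
/* Fallback for targets without SSE2: ordinary doubles. */
typedef double vdouble;
static inline vdouble vcast_vd_d(double d) { return d; }
static inline double vcast_d_vd(vdouble v) { return v; }
static inline vdouble vadd_vd_vd_vd(vdouble x, vdouble y) { return x + y; }
static inline vdouble vmul_vd_vd_vd(vdouble x, vdouble y) { return x * y; }
#endif

/* Generic kernel using only the helpers:
   evaluates 1 + x + x^2/2 with Horner's scheme. */
static vdouble poly2(vdouble x) {
    vdouble r = vcast_vd_d(0.5);
    r = vadd_vd_vd_vd(vmul_vd_vd_vd(r, x), vcast_vd_d(1.0));
    r = vadd_vd_vd_vd(vmul_vd_vd_vd(r, x), vcast_vd_d(1.0));
    return r;
}

/* Scalar entry point: wrap, compute, unwrap. */
double poly2_scalar(double x) { return vcast_d_vd(poly2(vcast_vd_d(x))); }
```

`poly2_scalar(2.0)` evaluates 1 + 2 + 2 = 5.0 on either path; with SSE2 the upper lane of each register is simply ignored.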

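I don't understand why using vector registers to compute scalar values would degrade performance. Look at the assembly output from the compiler: it basically already uses vector registers for scalar computation (on x86-64, scalar `double` arithmetic compiles to SSE instructions such as `addsd` that operate on XMM registers). One...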

For `vadd_vi_vi_vi`, we can simply keep the current implementation, since integer operations are fast enough. I suspect that transferring data between a vector register and a general-purpose register would take more time...
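One way to picture this, assuming SSE2 as in the plan above: `vint` stays a plain `int` held in a general-purpose register, so `vadd_vi_vi_vi` is an ordinary integer add, and data only crosses between an XMM register and a general-purpose register at explicit conversions (`_mm_cvttsd_si32` and `_mm_cvtsi32_sd`). The function `trunc_plus` and this exact helper set are hypothetical, chosen only to show where those transfers happen.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128d vdouble;
#else
typedef double vdouble;
#endif

/* Integers stay in ordinary general-purpose registers. */
typedef int vint;
static inline vint vadd_vi_vi_vi(vint x, vint y) { return x + y; }

#if defined(__SSE2__)
/* Each conversion moves a value between an XMM register and a GPR. */
static inline vdouble vcast_vd_vi(vint i) { return _mm_cvtsi32_sd(_mm_setzero_pd(), i); }
static inline vint vtruncate_vi_vd(vdouble d) { return _mm_cvttsd_si32(d); }
static inline vdouble vcast_vd_d(double d) { return _mm_set_sd(d); }
static inline double vcast_d_vd(vdouble v) { return _mm_cvtsd_f64(v); }
#else
static inline vdouble vcast_vd_vi(vint i) { return (double)i; }
static inline vint vtruncate_vi_vd(vdouble d) { return (vint)d; }
static inline vdouble vcast_vd_d(double d) { return d; }
static inline double vcast_d_vd(vdouble v) { return v; }
#endif

/* Hypothetical example: truncate x toward zero, add k as an integer,
   and convert the result back to double. */
double trunc_plus(double x, int k) {
    vint q = vtruncate_vi_vd(vcast_vd_d(x)); /* XMM -> GPR */
    q = vadd_vi_vi_vi(q, k);                 /* integer add, no transfer */
    return vcast_d_vd(vcast_vd_vi(q));       /* GPR -> XMM */
}
```

Keeping the integer add in a general-purpose register means the only cross-register traffic is at the two conversions.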

`vmla` is a multiplication followed by an addition, and it is used where contraction to FMA is permitted. `vfma` is a true FMA and is used only if FMA is available. FMA is used extensively in...
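The distinction can be sketched on scalars in C, assuming GCC or Clang. `vmla` is written as a separate multiply and add, which the compiler may fuse only when floating-point contraction is enabled (for example with `-ffp-contract=fast`); `vfma` uses `__builtin_fma` and is defined only when the target advertises hardware FMA (`__FMA__` on x86, `__ARM_FEATURE_FMA` on Arm). The scalar bodies, `HAVE_VFMA`, and `two_prod` are hypothetical; `two_prod` merely illustrates one common reason an exact FMA matters, namely recovering the rounding error of a product for double-double arithmetic.

```c
/* vmla: multiply then add. The compiler may contract this into one FMA
   when contraction is permitted, so results can differ between builds. */
static inline double vmla_vd_vd_vd_vd(double x, double y, double z) {
    return x * y + z;
}

#if defined(__FMA__) || defined(__ARM_FEATURE_FMA)
/* vfma: always fused (a single rounding); only defined with hardware FMA. */
static inline double vfma_vd_vd_vd_vd(double x, double y, double z) {
    return __builtin_fma(x, y, z);
}
#define HAVE_VFMA 1

/* Error-free product: p + e equals x*y exactly. */
static inline void two_prod(double x, double y, double *p, double *e) {
    *p = x * y;                         /* rounded product */
    *e = vfma_vd_vd_vd_vd(x, y, -*p);   /* exact rounding error */
}
#endif
```

With FMA, `two_prod(1 + 2^-30, 1 + 2^-30, &p, &e)` yields `p = 1 + 2^-29` and `e = 2^-60`, recovering the low-order bits that a plain multiply rounds away.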

Regarding this, I am planning to rename the macros that enable helper files. The new macro names will follow this pattern: ENABLE_(extension name)_(vector width in...

In addition, all AVX2128 function names will be changed to end with "avx2". For example, `Sleef_sind2_u10avx2128` will become `Sleef_sind2_u10avx2`. The vector width for a...

@d-parks Thank you. I now think that this is going to be a very important feature of SLEEF. I am also vaguely thinking about adding GPGPU support. Please let me know...