Naoki Shibata comments

Results: 195 comments of Naoki Shibata

Naming of functions for llvm runtime

> Technically, it is very easy to do this. One only has to add a function declaration, and call it, and it won't even fail to link since the function...

Naming of functions for llvm runtime

> From there, some functions have _avx suffixes that denote that they require AVX, but others don't have anything, yet they do require AVX too (e.g. the __m256... dispatchers). I found...

Naming of functions for llvm runtime

> use __attribute__((target("..."))) on all the functions in the library properly

I think that attribute is not supported by MSVC.

> But this is something that might remove complexity instead of...

Scalar functions using vector extensions

That is okay. Another plan I am considering is as follows:

typedef __m128d vdouble;
vdouble vadd_vd_vd_vd(vdouble vx, vdouble vy) { return _mm_add_sd(vx, vy); }
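A minimal sketch of how the function above could serve scalar code, assuming an x86 target with SSE2 and a GCC- or Clang-style toolchain. The wrapper name `scalar_add_via_vector` is hypothetical and not part of SLEEF; it only illustrates that `_mm_add_sd` operates on the low lane of the register.

```c
#include <emmintrin.h> /* SSE2 intrinsics: __m128d, _mm_add_sd, _mm_set_sd */

typedef __m128d vdouble;

/* Function quoted in the comment: adds only the low lanes of vx and vy.
 * The upper lane of the result is copied from vx. */
static vdouble vadd_vd_vd_vd(vdouble vx, vdouble vy) {
    return _mm_add_sd(vx, vy);
}

/* Hypothetical wrapper (not a SLEEF function): places each scalar in the
 * low lane of a vector register, adds, and extracts the low lane again. */
static double scalar_add_via_vector(double x, double y) {
    vdouble vx = _mm_set_sd(x); /* low lane = x, upper lane = 0 */
    vdouble vy = _mm_set_sd(y); /* low lane = y, upper lane = 0 */
    return _mm_cvtsd_f64(vadd_vd_vd_vd(vx, vy));
}
```

Calling `scalar_add_via_vector(1.5, 2.25)` yields 3.75. On x86-64, compilers already keep scalar doubles in the low lane of an XMM register, so a wrapper like this would typically reduce to a single scalar add instruction.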

Scalar functions using vector extensions

I don't understand why using a vector register to compute scalar values would degrade performance. See the assembly output from the compiler. It is basically using vector registers for scalar computation. One...

Scalar functions using vector extensions

For vadd_vi_vi_vi, we can just use the current implementation, since integer operations are fast enough. I guess a transfer between a vector register and a normal register would take more time....

Scalar functions using vector extensions

vmla is a multiplication followed by an addition, and it is used where contraction to FMA is permitted. vfma is a true FMA, and is only used if FMA is available. FMA is extensively used in...

Scalar functions using vector extensions

Regarding this, I am planning to change the names of the macros that enable helper files. My plan is to name the macros as follows: ENABLE_(extension name)_(vector width in...

Scalar functions using vector extensions

In addition to that, the names of the AVX2128 functions will all be changed to end with "avx2". For example, Sleef_sind2_u10avx2128 will become Sleef_sind2_u10avx2. The vector width for a...

Scalar functions using vector extensions

@d-parks Thank you. I now think that this is going to be a very important feature of SLEEF. I am also vaguely thinking about adding GPGPU support. Please let me know...