Naoki Shibata
FP16 is interesting, but my understanding is that it does not fit @xoofx's need. FP16C is not supported by Sandy Bridge or earlier processors. There are a few similar but...
So, I implemented fastsinf_u100000 and fastcosf_u100000. https://github.com/shibatch/sleef/tree/Add_lowprec_sin_cos Their error bound is max(10000 ULP, 1e-6). The input domain is [-9, 9]. Finite math only. I also plan to add fastsinf_u35 and...
It should be possible to implement it in a similar way to the CUDA target.
But WebAssembly doesn't have FMA. SLEEF is not so fast if FMA is not available.
I guess the inlinable header for SSE2 is already usable?
I tried it. It compiles, but does not run.
```
[oxygen]~/work/wasm/tmp$ cat hellowasm.c
#include
#include
#include "sleefinline_sse2.h"

int main(int argc, char **argv) {
  double a[] = {2, 10};
  double b[]...
```
It works with the latest node.js.
```
[oxygen]~/work/wasm/tmp$ emcc -O3 -msimd128 -msse2 hellowasm.c
[oxygen]~/work/wasm/tmp$ ../node-v15.7.0-linux-x64/bin/node --experimental-wasm-simd a.out.js
pow(2, 3) = 8
pow(10, 20) = 1e+20
[oxygen]~/work/wasm/tmp$
```
@fpetrogalli Do you have any comment on this?
It is difficult to spend any more time on this activity because my institute does not appreciate the SLEEF project AT ALL. On 3/13/2022 1:50 AM, Francesco Biscani wrote: > >...
Before version 2.80, SLEEF was developed intermittently. Its development has simply reverted to that state. As you know, this is a fundamental problem of OSS development....