Naoki Shibata
> the user should be able to specify at run time which one to select.

Who is a user in this context? There are developers of many linked libraries, and...
We also need to encode the input domains: one has to be specified for each argument. As @xoofx mentioned, the precision specifier could be u5242875, which...
The scalar functions are not heavily optimized; they are provided to make the algorithms easier to understand. As you know, I am planning to make the scalar functions use AVX2 or...
The main reason is that the availability of FMA affects performance significantly. Looking at the assembly output, I think the difference is understandable: compare the number of add and sub instructions.
Did you see the benchmark results? The difference is less than 1.5x with AVX2. http://sleef.org/benchmark.xhtml
Since the benchmarking tool was run on a Core i7-6700, it automatically selects the AVX2 implementation. The graph shows that the execution time for Sleef_sind4_u10 is 0.015us (blue bar). The execution time for...
It is also hard to say whether a u20 version would be faster than u10. I also need to check what accuracy the u35 version actually achieves. I am basically...
Okay, I will consider it, but this will take time. For use in Unity, why do you need 1 ULP accuracy? I think 3.5 ULP is enough for gaming?
> The usage could vary from ULP, where we would like even to have exact deterministic precision across platforms , specially for things like physics...etc. that can deviate quickly, and...
The main difference between u10 and u35 is whether DD (double-double) operators are used. DD operators are required to achieve 1 ULP accuracy, and u35 functions are made by removing DD...
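For readers unfamiliar with DD (double-double) operators, the following is a minimal C sketch of the general idea: a value is carried as an unevaluated sum of two doubles, so rounding errors from intermediate steps are kept instead of discarded. This is a generic textbook formulation (Knuth's TwoSum plus a renormalization step), not SLEEF's actual code; `dd_t`, `two_sum`, and `dd_add_d` are hypothetical names.

```c
/* Hypothetical double-double type: the represented value is hi + lo,
 * with |lo| much smaller than |hi|. */
typedef struct { double hi, lo; } dd_t;

/* Knuth's TwoSum: returns hi = fl(a + b) and lo = the exact rounding
 * error, so hi + lo == a + b exactly (assuming round-to-nearest and no
 * overflow). Costs 6 add/sub operations instead of 1. */
static dd_t two_sum(double a, double b) {
    dd_t r;
    r.hi = a + b;
    double v = r.hi - a;
    r.lo = (a - (r.hi - v)) + (b - v);
    return r;
}

/* Add a plain double to a double-double value. */
static dd_t dd_add_d(dd_t x, double y) {
    dd_t s = two_sum(x.hi, y); /* exact sum of the high parts */
    s.lo += x.lo;              /* fold in the existing low part */
    /* Renormalize so that hi again carries the leading bits. */
    dd_t r;
    r.hi = s.hi + s.lo;
    r.lo = s.lo - (r.hi - s.hi);
    return r;
}
```

As an example, `1.0 + 0x1p-60` evaluates to exactly `1.0` in plain double arithmetic, whereas `two_sum(1.0, 0x1p-60)` keeps the small term in `lo`. The price is several extra additions and subtractions per operation, which is consistent with versions that drop these operators being cheaper but less accurate.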