Michael Hartmann comments

Results: 67 comments of Michael Hartmann

Residfp experimental

Do you think the better readability of the source is worth the performance cost?

Residfp experimental

> One thing that may cause a big diff is that `convolve` before was using SIMD code, but now relies on the compiler to generate a good loop (which it...

Residfp experimental

Yes, it does. Scroll down a bit more. Clang and MSVC created unrolled loops for even better performance. GCC did not. All of those examples you've linked are SIMD optimized.

Residfp experimental

Plus, during my tests I looked at the assembly, because I was confused that the hand-rolled SIMD version wasn't any faster than the C++ version. This is when I...

Residfp experimental

Simply add ```-ffast-math``` to the clang options ;)
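
For context on why that flag matters here: a floating-point sum has a fixed evaluation order under strict IEEE semantics, so a compiler generally cannot split it across SIMD lanes unless it is allowed to reassociate. The sketch below is a hypothetical stand-in for the `convolve` loop (the name `convolve_plain` and the `float` types are assumptions, not the project's actual code):

```cpp
#include <cstddef>

// Hypothetical stand-in for a FIR convolution step: a dot product of
// `n` samples with `n` filter coefficients. Not the project's real code.
float convolve_plain(const float* samples, const float* coeffs, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
    {
        // Under strict floating-point rules the additions must happen in
        // this exact order, which leaves one serial dependency chain on
        // `acc`. With -ffast-math the compiler may reorder the sum into
        // several partial sums and keep them in SIMD registers.
        acc += samples[i] * coeffs[i];
    }
    return acc;
}
```

Comparing the generated assembly with and without the flag (for example on Compiler Explorer) is the quickest way to check whether the loop really gets vectorized on a given compiler.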

Residfp experimental

```
.LBB0_6:                                # =>This Inner Loop Header: Depth=1
        movups  xmm2, xmmword ptr [rdi + r11]
        movups  xmm3, xmmword ptr [rdi + r11 + 16]
        movups  xmm4, xmmword ptr [rsi +...
```

Residfp experimental

Sorry, I always assume that people already use optimal settings for release builds. It's what I do without even thinking about it.

Residfp experimental

Why not simply use ```std::log1p``` there? You can also wrap only that one function in specific ```-fno-fast-math```-style clang pragmas.
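
The code being discussed is not shown here, so this is only a minimal sketch of what `std::log1p` offers over the naive form. The helper names `log_naive` and `log_stable` are made up for illustration:

```cpp
#include <cmath>

// Naive form: for very small x, 1.0 + x rounds to exactly 1.0,
// so the logarithm comes out as 0 and the information in x is lost.
double log_naive(double x)
{
    return std::log(1.0 + x);
}

// std::log1p computes log(1 + x) without forming 1 + x first,
// so it stays accurate for x close to zero.
double log_stable(double x)
{
    return std::log1p(x);
}
```

For an input such as `1e-20`, the naive version returns zero while `std::log1p` returns a value very close to `1e-20`.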

Residfp experimental

> Or maybe the opposite, forcing -ffast-math for the convolve function so it works even if not defined in the environment. Then, the rest of the code can't benefit from...
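
One way the quoted idea could look in practice, as a rough sketch: Clang has a `#pragma clang fp reassociate(on)` that permits reassociation inside a single block, which is the part of fast-math a reduction loop needs. The function name `convolve_reassoc`, the `float` types and the version guard are assumptions. Other compilers skip the pragma and compile the plain loop.

```cpp
#include <cstddef>

// Hypothetical dot-product kernel; not the project's actual convolve().
float convolve_reassoc(const float* samples, const float* coeffs, std::size_t n)
{
#if defined(__clang__) && !defined(__apple_build_version__) && (__clang_major__ >= 11)
    // Allow reassociation of floating-point operations in this function
    // body only, independent of the global compiler flags.
    #pragma clang fp reassociate(on)
#endif
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
    {
        acc += samples[i] * coeffs[i];
    }
    return acc;
}
```

The rest of the translation unit keeps strict floating-point semantics, so only this loop's summation order may change.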

Residfp experimental

If the only benefit is marginally better readability, but the cost is a severe decrease in performance (1.6 times slower if we assume ~24%), then it's not worth it. The...