Michael Hartmann

Results 67 comments of Michael Hartmann

Do you think the better readability of the source is worth the performance cost?

> One thing that may cause a big diff is that `convolve` before was using SIMD code, but now relies on the compiler to generate a good loop (which it...

Yes, it does. Scroll down a bit more. Clang and MSVC created unrolled loops for even better performance. GCC did not. All of those examples you've linked are SIMD-optimized.

Plus, during my tests I looked at the assembly, because I was confused that the hand-rolled SIMD version wasn't any faster than the C++ version. This is when I...

Simply add `-ffast-math` to the clang options ;)
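For illustration, a minimal sketch of the kind of loop this is about. The function name and signature are hypothetical, not the actual `convolve` from the code under discussion. The loop is a floating-point sum reduction, and without permission to reorder the additions the compiler generally has to accumulate in source order, which blocks vectorizing the reduction. `-ffast-math` grants that permission.

```cpp
#include <cstddef>

// Hypothetical stand-in for the inner loop of `convolve`:
// a plain dot product of `in` with `kernel`, no intrinsics.
float convolve_scalar(const float* in, const float* kernel, std::size_t len)
{
    float sum = 0.0f;
    // Floating-point reduction: vectorizing it changes the order of the
    // additions, which the compiler may only do when reassociation is
    // allowed (for example via -ffast-math).
    for (std::size_t i = 0; i < len; ++i)
        sum += in[i] * kernel[i];
    return sum;
}
```

Compiling this with something like `clang++ -O3 -ffast-math -S` and comparing the output against a build without `-ffast-math` should show whether the loop body uses packed instructions.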

```
.LBB0_6:                                # =>This Inner Loop Header: Depth=1
        movups  xmm2, xmmword ptr [rdi + r11]
        movups  xmm3, xmmword ptr [rdi + r11 + 16]
        movups  xmm4, xmmword ptr [rsi +...
```

Sorry, I always assume that people already use optimal settings for release builds. It's what I do without even thinking about it.

Why not simply use `std::log1p` there? You can also wrap only that one function in Clang pragmas that switch fast-math off locally, which is the per-function equivalent of `-fno-fast-math`.
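A rough sketch of both suggestions, under some assumptions: the helper name is hypothetical, and the pragma shown is `#pragma float_control(precise, on)`, which I'm assuming is available in the Clang version in use (it is guarded so other compilers skip it). The pragma restores precise floating-point semantics for arithmetic written inside that one function body, even when the rest of the translation unit is built with `-ffast-math`.

```cpp
#include <cmath>

// Hypothetical helper name; not taken from the code under discussion.
double log1p_precise(double x)
{
#if defined(__clang__)
    // Assumption: a Clang version that supports this pragma. It has to
    // appear at the start of the compound statement and applies only to
    // this function body.
    #pragma float_control(precise, on)
#endif
    // std::log1p computes log(1 + x) without first rounding 1 + x,
    // so it stays accurate for very small x.
    return std::log1p(x);
}
```

For a tiny argument such as `1e-20`, `std::log(1.0 + x)` yields zero because `1.0 + x` rounds to `1.0`, whereas `std::log1p(x)` keeps the small result.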

> Or maybe the opposite, forcing -ffast-math for the convolve function so it works even if not defined in the environment. Then, the rest of the code can't benefit from...

If the only benefit is marginally better readability, but the cost is a severe decrease in performance (1.6x slower if we assume ~24%), then it's not worth it. The...