xsimd
Feature/all inline
The performance bug has been reported to MSVC here.
@amyspark looking at your benchmark, have you compared the assembly before and after to see what's going on?
Random speculation: inlining xsimd functions could be causing AlphaDarkenOp::operator() to not get inlined into the testCompositionSpeed loop (if inlining is just based on code size, for example), which would be terrible for performance as there's a lot of code in there which would otherwise get pulled out of the loop. The same could well apply to some of the other non-force-inline functions you're using.
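To make the speculation above concrete, here is a minimal sketch of force-inlining a small functor that is called from a hot loop. Everything in it is a stand-in: SKETCH_FORCE_INLINE, BlendOp and blend_all are hypothetical names for illustration, not xsimd's macro or the benchmark's actual code. It only shows the mechanism; whether it changes MSVC's decision in the real benchmark would have to be checked in the assembly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro for this sketch (not xsimd's own macro name).
// __forceinline is the MSVC keyword; always_inline is the GCC/Clang attribute.
#if defined(_MSC_VER)
#  define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCE_INLINE inline
#endif

// Stand-in for a per-pixel functor such as the one discussed above.
struct BlendOp {
    float opacity;

    // Asking for inlining here means the loop below does not depend on the
    // compiler's size-based heuristic to pull this body into the loop.
    SKETCH_FORCE_INLINE float operator()(float src, float dst) const {
        return dst + (src - dst) * opacity;
    }
};

// Hot loop: once operator() is inlined, loop-invariant work (here, reading
// op.opacity) can be kept in a register across iterations.
inline void blend_all(const float* src, float* dst, std::size_t n, BlendOp op) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = op(src[i], dst[i]);
    }
}
```

Calling blend_all(src, dst, n, BlendOp{0.5f}) blends each dst[i] halfway towards src[i]. Comparing the generated assembly with and without the macro on operator() is one way to test the speculation.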
@serge-sans-paille upon further review, it seems that, instead of e.g. shifting a register right then using the result, MSVC spills the register on the stack, loads it, shifts, pushes and then pops it for the next operation. This happens for every result that passes through xsimd.
A tad curious: would it be possible to make a test branch that replaces all constant pass-by-references with pass-by-values? IIRC MSVC was quite sensitive to references and doesn't realise they can be inlined without the need for vmovups and the like.
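As a rough illustration of the change being asked about, the sketch below writes the same operation twice, once taking its operands by const reference and once by value. Vec4, load4, add_by_ref, add_by_value and same_lanes are hypothetical names made up for this example; Vec4 is a plain four-float struct, not an xsimd batch type. Both variants return identical results, so any difference would only show up in the generated code, which has to be measured rather than assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD batch wrapper (not an xsimd type).
struct Vec4 {
    float lane[4];
};

// Helper: build a Vec4 from four consecutive floats.
inline Vec4 load4(const float* p) {
    assert(p != nullptr);
    Vec4 v;
    for (std::size_t i = 0; i < 4; ++i) v.lane[i] = p[i];
    return v;
}

// Variant A: operands by const reference. If the call is not inlined,
// the operands must live in memory so that their addresses can be passed.
inline Vec4 add_by_ref(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Variant B: operands by value. The callee owns its copies, so there is
// no aliasing with caller-side objects for the optimizer to reason about.
inline Vec4 add_by_value(Vec4 a, Vec4 b) {
    for (std::size_t i = 0; i < 4; ++i) a.lane[i] += b.lane[i];
    return a;
}

// Helper: lane-wise equality, used to confirm both variants agree.
inline bool same_lanes(Vec4 a, Vec4 b) {
    for (std::size_t i = 0; i < 4; ++i) {
        if (a.lane[i] != b.lane[i]) return false;
    }
    return true;
}
```

A test branch along these lines would swap signatures like Variant A for Variant B and compare the benchmark and the assembly on MSVC before and after.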
Before investigating, can you test that on your test case and tell us whether it changes anything? If so, I'm OK with making the change.
FWIW when switching our project to this branch we saw speedups of 5% - 10% across the board (targeting AVX, compiled with MSVC 19.31 (VS2022)).