xsimd

Feature/all inline

Open serge-sans-paille opened this issue 3 years ago • 5 comments

serge-sans-paille avatar Nov 22 '21 06:11 serge-sans-paille

The performance bug has been reported to MSVC here.

amyspark avatar Nov 22 '21 14:11 amyspark

@amyspark having a look at your benchmark, have you had a look at the assembly before/after to see what's going on?

Random speculation: inlining xsimd functions could be causing AlphaDarkenOp::operator() to not get inlined into the testCompositionSpeed loop (if inlining is just based on code size for example), which would be terrible for performance as there's a lot of code in there which would otherwise get pulled out of the loop. The same could well apply to some of the other non-force-inline functions you're using.
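To make the speculation above concrete, here is a minimal sketch of what forcing inlining through a whole call chain could look like. The macro name `EXAMPLE_FORCE_INLINE`, the `ScaleOp` functor and `sum_scaled` are hypothetical stand-ins (not xsimd or benchmark code); only `__forceinline` and `__attribute__((always_inline))` are real compiler features.

```cpp
// Hypothetical macro: pick the compiler-specific "always inline" spelling.
#if defined(_MSC_VER)
#define EXAMPLE_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_FORCE_INLINE inline __attribute__((always_inline))
#else
#define EXAMPLE_FORCE_INLINE inline
#endif

// Stand-in for a small library primitive (assumption: not a real xsimd function).
EXAMPLE_FORCE_INLINE float scale(float x, float k) { return x * k; }

// Stand-in for a user functor such as AlphaDarkenOp. If only scale() were
// force-inlined, the compiler would still be free to leave operator() as a
// real call inside the loop; marking it too removes that choice.
struct ScaleOp {
    float k;
    EXAMPLE_FORCE_INLINE float operator()(float x) const { return scale(x, k); }
};

// Hot loop: with everything inlined, loop-invariant work can be hoisted.
float sum_scaled(const float* data, int n, float k) {
    ScaleOp op{k};
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += op(data[i]);
    }
    return acc;
}
```

Whether this actually changes the generated code for a given compiler is something only the assembly can confirm.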

tomjnixon avatar Nov 22 '21 14:11 tomjnixon

@serge-sans-paille upon further review, it seems that, instead of e.g. shifting a register right then using the result, MSVC spills the register on the stack, loads it, shifts, pushes and then pops it for the next operation. This happens for every result that passes through xsimd.

amyspark avatar Nov 22 '21 16:11 amyspark

A tad curious: would it be possible to make a test branch that replaces all constant pass-by-references with pass-by-values? IIRC MSVC was quite sensitive to references and doesn't realise they can be inlined without the need for vmovups and the like.
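For illustration, the two calling conventions being compared could look like the sketch below. `vec4`, `add_ref` and `add_val` are hypothetical names using a plain struct rather than real SIMD registers; the claim that MSVC generates worse code for the reference version is the hypothesis from this thread, not something this snippet demonstrates.

```cpp
// Hypothetical stand-in for a SIMD batch wrapper (not an xsimd type).
struct vec4 {
    float v[4];
};

// Pass-by-const-reference: the callee receives addresses, so a compiler that
// fails to see through the references may keep the operands in memory.
inline vec4 add_ref(const vec4& a, const vec4& b) {
    vec4 r{};
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}

// Pass-by-value: the callee owns copies, which for register-sized types can
// make it easier for the optimizer to keep everything in registers.
inline vec4 add_val(vec4 a, vec4 b) {
    vec4 r{};
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}
```

Both functions return the same result; any difference would only show up in the generated assembly.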

Before investigating, can you test that on your test case and tell us if it changes anything? If so, I'm OK with making the change.

serge-sans-paille avatar Nov 26 '21 10:11 serge-sans-paille

FWIW when switching our project to this branch we saw speedups of 5% - 10% across the board (targeting AVX, compiled with MSVC 19.31 (VS2022)).

marzer avatar Dec 18 '21 11:12 marzer