Amyspark

107 comments by Amyspark

I fixed this with #710, @serge-sans-paille with #679.

> This means that in general it is not safe to run a program compiled with -mavx2 (for example) on a CPU that doesn't support AVX2, even if any code...
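A common way to limit this hazard is to keep the dispatcher itself compiled for the baseline target and to check the CPU at runtime before calling anything built with `-mavx2`. The sketch below is only an illustration of that pattern, not what the xsimd pull requests do. The names `cpu_has_avx2`, `sum_baseline`, `sum_avx2_placeholder` and `sum` are hypothetical, and the "AVX2" kernel is a scalar stand-in so the example runs anywhere. `__builtin_cpu_supports` is a GCC/Clang builtin for x86; other toolchains fall back to reporting no AVX2.

```cpp
#include <cstddef>

// Returns true only when the running CPU reports AVX2 support.
inline bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;  // conservative fallback; MSVC would need a CPUID query instead
#endif
}

// Baseline kernel: meant to be compiled without any AVX2 flag.
inline float sum_baseline(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Hypothetical placeholder for a kernel that would live in its own
// translation unit built with -mavx2. It is scalar here on purpose.
inline float sum_avx2_placeholder(const float* p, std::size_t n) {
    return sum_baseline(p, n);
}

// Dispatcher: must itself be compiled for the baseline target, otherwise the
// compiler is free to emit AVX2 instructions before the check is reached.
inline float sum(const float* p, std::size_t n) {
    return cpu_has_avx2() ? sum_avx2_placeholder(p, n) : sum_baseline(p, n);
}
```

The important design point is the per-translation-unit split: only the kernel file gets the wider instruction set, so the check always runs on code the CPU can execute.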

> https://github.com/xtensor-stack/xsimd/pull/675 proposes an approach to fix that issue, I'd happily take feedbacks.

I'd say to templatize the architecture parameter or make it part of the function signature, the current...

This is what I did to hand-optimize two cases we use at Krita: https://github.com/xtensor-stack/xsimd/blob/c7567bbedebcfbf3ba95304ff1a6722b32a0d63f/include/xsimd/arch/xsimd_avx2.hpp#L350-L369

Instead of using separate batch types, I would suggest using SFINAE on the size of the...

The performance bug has been reported to MSVC [here](https://developercommunity.visualstudio.com/t/2x-performance-loss-when-using-__forcein/1592199).

@serge-sans-paille upon further review, it seems that, instead of e.g. shifting a register right then using the result, MSVC spills the register on the stack, loads it, shifts, pushes and...

I'll test this branch tonight with my benchmark.

@serge-sans-paille, #645 has no effect on my benchmark; MSVC still doesn't inline xsimd's methods, resulting in a 50% perf hit compared to `__forceinline`. (Using `/Ob3 /O2 /Gv /Oi`)
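For readers unfamiliar with the attribute being discussed, here is a minimal sketch of a portable force-inline macro. `MY_FORCE_INLINE`, `vec4`, `add` and `horizontal_sum` are hypothetical names for illustration and are not xsimd's own; only `__forceinline` (MSVC) and `__attribute__((always_inline))` (GCC/Clang) are real compiler features.

```cpp
// Hypothetical macro name; pick the strongest inlining hint per compiler.
#if defined(_MSC_VER)
#  define MY_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define MY_FORCE_INLINE inline __attribute__((always_inline))
#else
#  define MY_FORCE_INLINE inline
#endif

// Hypothetical stand-in for a thin SIMD wrapper type.
struct vec4 {
    float v[4];
};

// Small wrappers like these are where a missed inline costs the most,
// because the call overhead dwarfs the work done inside.
MY_FORCE_INLINE vec4 add(const vec4& a, const vec4& b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

MY_FORCE_INLINE float horizontal_sum(const vec4& a) {
    return a.v[0] + a.v[1] + a.v[2] + a.v[3];
}
```

Whether the hint actually changes code generation depends on the compiler and its flags, so it is worth confirming with a benchmark or by inspecting the disassembly.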

> @amyspark #645 updated with always inline, can you check if that fixes your issue?

That does the trick! But the `friend` functions in `xsimd::batch`, e.g. https://github.com/xtensor-stack/xsimd/blob/54aa8e72bc7cda47907879f5ad2a9c11b4c127e7/include/xsimd/types/xsimd_batch.hpp#L163-L171 still need to...

It's a tongue-twister: xsimd's `swizzle` is the equivalent of Intel's `shuffle`s. Here I need *whole lane* (128 bits for AVX, 256 bits for AVX512) `swizzle`s, which in Intel's lingo are `permute`s.
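To make the distinction concrete, here is a scalar model of a 256-bit register holding eight floats in two 128-bit lanes. It is a sketch for illustration only: `reg256`, `shuffle_in_lane` and `permute_lanes` are hypothetical names, not xsimd or Intel APIs. The whole-lane operation is roughly what an intrinsic such as `_mm256_permute2f128_ps` does in hardware.

```cpp
#include <array>
#include <cstddef>

// Models a 256-bit register: two 128-bit lanes of four floats each.
using reg256 = std::array<float, 8>;

// In-lane shuffle: the same 4-entry index pattern is applied inside each
// lane, so elements never cross the 128-bit boundary.
inline reg256 shuffle_in_lane(const reg256& x, const std::array<std::size_t, 4>& idx) {
    reg256 r{};
    for (std::size_t lane = 0; lane < 2; ++lane)
        for (std::size_t i = 0; i < 4; ++i)
            r[lane * 4 + i] = x[lane * 4 + idx[i]];
    return r;
}

// Whole-lane permute: output lane k is a copy of input lane lane_idx[k],
// so entire 128-bit blocks move and their internal order is preserved.
inline reg256 permute_lanes(const reg256& x, const std::array<std::size_t, 2>& lane_idx) {
    reg256 r{};
    for (std::size_t lane = 0; lane < 2; ++lane)
        for (std::size_t i = 0; i < 4; ++i)
            r[lane * 4 + i] = x[lane_idx[lane] * 4 + i];
    return r;
}
```

With the input `{0, 1, 2, 3, 4, 5, 6, 7}`, reversing within lanes gives `{3, 2, 1, 0, 7, 6, 5, 4}`, while swapping the two lanes gives `{4, 5, 6, 7, 0, 1, 2, 3}`.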