pillow-simd

ARM architecture support

Status: Open. ghost opened this issue 7 years ago • 4 comments

You could try https://github.com/nemequ/simde to port the SSE/AVX intrinsics to ARM with little effort.
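A minimal sketch of how SIMDe is typically wired in, assuming its headers are installed so that `<simde/x86/sse2.h>` is on the include path. Native SSE2 is used on x86. Elsewhere, defining `SIMDE_ENABLE_NATIVE_ALIASES` lets the same `_mm_*` calls compile against SIMDe's portable implementations. The helper name `madd8` is hypothetical and does not come from pillow-simd.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* native SSE2 on x86 */
#else
/* Assumption: SIMDe headers are installed. The alias macro lets the
   unmodified _mm_* names and __m128i type resolve to SIMDe's portable
   implementations on non-x86 targets. */
#define SIMDE_ENABLE_NATIVE_ALIASES
#include <simde/x86/sse2.h>
#endif

/* Existing SSE2 code, unchanged: multiply 8 int16 pairs and add adjacent
   products into 4 int32 results. madd8 is a hypothetical helper name. */
static void madd8(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_madd_epi16(va, vb));
}
```

With the alias macro, existing intrinsic code does not have to be renamed to the `simde_mm_*` spellings. The trade-off is that the port runs only as fast as SIMDe's per-intrinsic translation for the target.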

ghost, Oct 18 '18 17:10

Would Arm Neon help?

Arm Neon technology is an advanced Single Instruction Multiple Data (SIMD)
architecture extension for the Arm Cortex-A and Cortex-R series processors.

https://developer.arm.com/architectures/instruction-sets/simd-isas/neon

bact, Feb 18 '20 20:02

I'm not familiar with that library, but if it does what it claims, it might also allow Power ISA AltiVec/VMX support in pillow-simd for those of us on ppc64le systems.

qhaas, Aug 05 '20 14:08

I managed to use SSE2Neon to get a build working on aarch64, and I'm planning to open a pull request soon. Would you be open to including such changes?

AWSjswinney, Jun 04 '21 19:06

@AWSjswinney

This SIMD code is heavily optimized for SSE and AVX instructions. Of course, you can translate the SSE instructions to NEON and you will get a "NEON" version. But will it come even close to the speedup the original SSE version delivers?

For example, one of the most frequently used instructions is _mm_madd_epi16, which performs eight multiplications and four additions at once. If you look at its implementation in SSE2NEON, you'll see four vget_low and four vget_high intrinsics (the _s16 and _s32 variants), two vmull_s16, two vpadd_s32, and one vcombine_s32: 13 instructions in total.
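For reference, here is a scalar C model of what `_mm_madd_epi16` computes across its eight signed 16-bit lanes: eight products, then four sums of adjacent products into 32-bit lanes. The function name `madd_epi16_ref` is hypothetical. This is a plain-C restatement of the instruction's documented behavior, including its single overflow case, not code taken from SSE2NEON.

```c
#include <stdint.h>

/* Scalar model of _mm_madd_epi16: out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1]. */
static void madd_epi16_ref(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        /* 8 multiplications in total; each 16x16-bit product fits in 32 bits. */
        int64_t p0 = (int64_t)a[2 * i] * b[2 * i];
        int64_t p1 = (int64_t)a[2 * i + 1] * b[2 * i + 1];
        /* 4 additions in total. */
        int64_t sum = p0 + p1;
        /* The sum exceeds INT32_MAX only when all four inputs are -32768
           (sum = 2^31). The hardware wraps that to INT32_MIN; mirror it
           here without relying on signed overflow. */
        if (sum > INT32_MAX) sum -= INT64_C(4294967296);
        out[i] = (int32_t)sum;
    }
}
```

For example, with a = {1, 2, 3, 4, 5, 6, 7, 8} and b = {1, 1, 1, 1, 2, 2, 2, 2}, the four results are {3, 7, 22, 30}. A single SSE2 instruction does all of this in one step.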

I bet what you're trying to achieve is not just some "NEON" version but an optimized NEON version, and I believe this is not the right way to get there.
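To illustrate what a hand-written alternative can look like even at the level of a single intrinsic, here is an AArch64 sketch of the same operation. It uses three arithmetic instructions (`SMULL`, `SMULL2`, `ADDP`, via `vmull_s16`, `vmull_high_s16`, `vpaddq_s32`) instead of the 13-instruction SSE2NEON sequence described above. It assumes an AArch64 target, because `vmull_high_s16` and `vpaddq_s32` are not available on 32-bit ARM. On x86 it falls back to `_mm_madd_epi16`. The name `madd8_fast` is hypothetical. This is only a per-instruction improvement, not the algorithm-level NEON rewrite argued for here.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
/* AArch64 only: vmull_high_s16 and vpaddq_s32 do not exist on 32-bit ARM. */
static void madd8_fast(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    int16x8_t va = vld1q_s16(a);
    int16x8_t vb = vld1q_s16(b);
    /* SMULL: widening multiply of lanes 0..3 (vget_low_s16 normally emits
       no instruction, since SMULL reads the low halves directly). */
    int32x4_t lo = vmull_s16(vget_low_s16(va), vget_low_s16(vb));
    /* SMULL2: widening multiply of lanes 4..7 straight from the high halves. */
    int32x4_t hi = vmull_high_s16(va, vb);
    /* ADDP: pairwise add -> {lo0+lo1, lo2+lo3, hi0+hi1, hi2+hi3}, wrapping
       on overflow just like _mm_madd_epi16. */
    vst1q_s32(out, vpaddq_s32(lo, hi));
}
#elif defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
/* x86 fallback: the original single SSE2 instruction. */
static void madd8_fast(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    __m128i r = _mm_madd_epi16(_mm_loadu_si128((const __m128i *)a),
                               _mm_loadu_si128((const __m128i *)b));
    _mm_storeu_si128((__m128i *)out, r);
}
#else
#error "madd8_fast sketch: add a scalar fallback for this target"
#endif
```

Even this tuned version maps one SSE operation to one NEON routine. A real speedup on ARM would more likely come from restructuring the surrounding loops for NEON's strengths.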

homm, Jun 05 '21 14:06