pillow-simd
ARM architecture support
You can try https://github.com/nemequ/simde for easy transfer of SSE / AVX instructions to ARM.
Will Arm Neon help?
Arm Neon technology is an advanced Single Instruction Multiple Data (SIMD)
architecture extension for the Arm Cortex-A and Cortex-R series processors.
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon
I'm not familiar with that library, but if it does what it claims well, it might also allow Power ISA AltiVec/VMX support in pillow-simd for those of us on ppc64le systems.
I managed to use SSE2NEON to get a build working on aarch64. I'm planning to open a pull request soon. Would you be open to including such changes?
@AWSjswinney
This SIMD code is heavily optimized for SSE and AVX instructions. Of course you can translate SSE instructions to NEON and you will get a "NEON" version. But will its speedup be anywhere close to what the original SSE version achieves?
For example, one of the most frequently used instructions is _mm_madd_epi16, which performs eight multiplications and four additions at once. If you take a look at its implementation in SSE2NEON, you'll see four vget_low_s16, four vget_high_s16, two vmull_s16, two vpadd_s32 and one vcombine_s32 instructions, 13 instructions in total.
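To make the "eight multiplications and four additions" concrete, here is a minimal scalar sketch in portable C of what _mm_madd_epi16 computes on one pair of 128-bit registers. The function name madd_epi16_scalar and the array-based interface are hypothetical, chosen only for illustration; this is neither pillow-simd nor SSE2NEON code.

```c
#include <stdint.h>

/* Scalar model of _mm_madd_epi16 (hypothetical helper, for illustration only).
 * a and b each stand for the eight signed 16-bit lanes of a 128-bit register;
 * out receives the four signed 32-bit result lanes. */
static void madd_epi16_scalar(const int16_t a[8], const int16_t b[8],
                              int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* Two widening 16x16 -> 32-bit multiplications per output lane. */
        int32_t even = (int32_t)a[2 * i] * b[2 * i];
        int32_t odd  = (int32_t)a[2 * i + 1] * b[2 * i + 1];
        /* One addition of the adjacent products. The sum is formed in
         * unsigned arithmetic so the single overflowing case (all four
         * inputs equal to -32768) wraps instead of being undefined. */
        out[i] = (int32_t)((uint32_t)even + (uint32_t)odd);
    }
}
```

On x86 the whole loop is a single instruction, which is why an emulation that expands it into many NEON operations is a concern in hot loops.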
I bet what you are trying to achieve is not just some "NEON" version, but an optimized NEON version. And I believe this is not the right way to get one.