primesieve: Add vectorized fillNextPrimes() algorithm for other CPU architectures (e.g. arm64)

Open kimwalisch opened this issue 2 years ago • 2 comments

The performance of primesieve::iterator depends heavily on the fillNextPrimes() method in PrimeGenerator.cpp. For x64 we have a vectorized AVX512 algorithm that is close to optimal for this task. Once other CPU architectures (e.g. arm64) support 512-bit vector instructions comparable to AVX512, we should port our AVX512 algorithm to them.

ARM recently (2021) added the Scalable Vector Extension (SVE) to its CPUs. SVE is designed as a portable vector instruction set that works with different vector widths. However, vectorizing our fillNextPrimes() method requires vector instructions that are at least 512 bits wide.
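For readers unfamiliar with the task, here is a minimal scalar sketch of the kind of work a vectorized fillNextPrimes() speeds up: turning the set bits of a sieve array into prime numbers. It assumes a modulo 30 wheel layout in which each sieve byte covers 30 integers and its 8 bits map to the offsets 7, 11, 13, 17, 19, 23, 29, 31. That layout, the function name `fillPrimesScalar` and the helper `lowestSetBitIndex` are assumptions made for illustration; they are not copied from PrimeGenerator.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed wheel layout (hypothetical, for illustration only):
// bit i of a sieve byte stands for the number byteBase + kWheelOffsets[i].
static const std::uint64_t kWheelOffsets[8] = {7, 11, 13, 17, 19, 23, 29, 31};

// Portable index of the lowest set bit. Precondition: bits != 0.
static unsigned lowestSetBitIndex(std::uint64_t bits)
{
    unsigned idx = 0;
    while ((bits & 1) == 0) {
        bits >>= 1;
        idx++;
    }
    return idx;
}

// Decode every set bit of the sieve into the number it represents.
// Each 64-bit word holds 8 sieve bytes and therefore covers 8 * 30 = 240 integers.
std::vector<std::uint64_t> fillPrimesScalar(const std::vector<std::uint64_t>& sieve,
                                            std::uint64_t low)
{
    std::vector<std::uint64_t> primes;
    for (std::size_t w = 0; w < sieve.size(); w++) {
        std::uint64_t bits = sieve[w];
        std::uint64_t wordBase = low + w * 240;
        while (bits != 0) {
            unsigned idx = lowestSetBitIndex(bits);
            // idx / 8 selects the byte inside the word, idx % 8 the wheel offset.
            primes.push_back(wordBase + (idx / 8) * 30 + kWheelOffsets[idx % 8]);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    return primes;
}
```

With `low = 0` and a single word whose bits 0, 1 and 8 are set, this returns 7, 11 and 37. The loop handles one bit per iteration, which is the serial dependency a wide vector implementation tries to avoid by decoding many bit positions at once.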

May 09 '22 09:05 kimwalisch
