Add vectorized fillNextPrimes() algorithm for other CPU architectures (e.g. arm64)

Open kimwalisch opened this issue 2 years ago • 2 comments

The performance of primesieve::iterator depends heavily on the fillNextPrimes() method in PrimeGenerator.cpp. For x64 we have a vectorized AVX512 algorithm that is close to optimal for this task. Once other CPU architectures (e.g. arm64) support 512-bit vector instructions comparable to AVX512, we should port our AVX512 algorithm to them.
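For readers unfamiliar with what fillNextPrimes() has to do, here is a rough scalar sketch of the bit-to-prime decoding step. It is not the code from PrimeGenerator.cpp. The helper name `decodeSieveWord`, the table `kWheelOffsets`, and the assumed sieve layout (each byte covers 30 integers, with its 8 bits standing for the offsets 7, 11, 13, 17, 19, 23, 29, 31) are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, NOT primesieve's actual implementation.
// Assumed layout: each sieve byte covers 30 consecutive integers and its
// 8 bits stand for these offsets from the byte's base value, so one
// 64-bit word covers 8 * 30 = 240 integers.
constexpr std::array<std::uint64_t, 8> kWheelOffsets = {7, 11, 13, 17, 19, 23, 29, 31};

// Appends base + offset to `primes` for every set bit in `bits`.
// Returns the number of values appended.
std::size_t decodeSieveWord(std::uint64_t bits, std::uint64_t base,
                            std::vector<std::uint64_t>& primes)
{
    // Under the assumed layout every word starts at a multiple of 30.
    assert(base % 30 == 0);

    std::size_t count = 0;
    for (unsigned i = 0; i < 64; i++) {
        if (bits & (std::uint64_t{1} << i)) {
            // Bit i lives in byte i / 8 and selects wheel offset i % 8.
            std::uint64_t offset = (i / 8) * 30 + kWheelOffsets[i % 8];
            primes.push_back(base + offset);
            count++;
        }
    }
    return count;
}
```

This loop inspects one bit per iteration. A vectorized version would aim to turn many set bits into primes per instruction, which is why the available vector width matters so much here.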

ARM recently (2021) added the Scalable Vector Extension (SVE) to its CPUs. SVE is meant to be a portable vector instruction set that works with different vector widths. However, to vectorize our fillNextPrimes() method we need vector instructions that are at least 512 bits wide.
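Because the SVE vector width is chosen by the hardware, a port would probably need to check it before selecting a vectorized code path. Below is a minimal sketch of such a check, assuming the ACLE intrinsic `svcntb()` from `<arm_sve.h>` and the `__ARM_FEATURE_SVE` macro. The helper names `sveVectorBits` and `isWideEnoughForFillNextPrimes` are hypothetical and not part of primesieve.

```cpp
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper: SVE vector width in bits on the running CPU,
// or 0 if this translation unit was not compiled with SVE enabled.
inline std::uint64_t sveVectorBits()
{
#if defined(__ARM_FEATURE_SVE)
    // svcntb() returns the number of 8-bit elements (bytes) per SVE vector.
    return svcntb() * 8;
#else
    return 0;
#endif
}

// Hypothetical helper: the issue states that vectorizing fillNextPrimes()
// needs at least 512-bit vectors.
inline bool isWideEnoughForFillNextPrimes(std::uint64_t vectorBits)
{
    return vectorBits >= 512;
}
```

A caller could then use `isWideEnoughForFillNextPrimes(sveVectorBits())` to decide between a vectorized path and the existing scalar fallback.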

kimwalisch, May 09 '22 09:05