packed_simd
parallelize fannkuch_redux with rayon
This blog post (https://llogiq.github.io/2018/09/06/fast.html) by @llogiq hints at how to parallelize the fannkuch_redux example with rayon. We should add a version of the algorithm that does this while keeping the code readable.
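To make the idea concrete, here is a rough sketch of one way the work could be split. It is not the approach from the blog post and not the code in this repository: it only computes the maximum flip count (the benchmark's checksum depends on the exact permutation order, so it is left out), it uses no SIMD, and it uses `std::thread::scope` instead of rayon so that it has no dependencies. The function names (`count_flips`, `permute`, `max_flips_with_first`, `fannkuch_max_flips`) are made up for illustration. The split is by the first element of the permutation, so each of the `n` blocks is independent.

```rust
use std::thread;

/// Count the prefix reversals ("flips") until the first element becomes 0.
fn count_flips(mut perm: Vec<usize>) -> u32 {
    let mut flips = 0;
    while perm[0] != 0 {
        let k = perm[0];
        // Reverse the first k + 1 elements (elements are 0-based here).
        perm[..=k].reverse();
        flips += 1;
    }
    flips
}

/// Visit every permutation of `items[k..]` in place (simple swap recursion).
fn permute<F: FnMut(&[usize])>(items: &mut [usize], k: usize, visit: &mut F) {
    if k == items.len() {
        visit(items);
        return;
    }
    for i in k..items.len() {
        items.swap(k, i);
        permute(items, k + 1, &mut *visit);
        items.swap(k, i); // undo the swap before trying the next candidate
    }
}

/// Maximum flip count over all permutations of 0..n that start with `first`.
fn max_flips_with_first(n: usize, first: usize) -> u32 {
    let mut rest: Vec<usize> = (0..n).filter(|&x| x != first).collect();
    let mut best = 0;
    permute(&mut rest, 0, &mut |tail: &[usize]| {
        let mut perm = Vec::with_capacity(n);
        perm.push(first);
        perm.extend_from_slice(tail);
        best = best.max(count_flips(perm));
    });
    best
}

/// Run one block per possible first element, each on its own scoped thread,
/// and combine the per-block results with `max`.
fn fannkuch_max_flips(n: usize) -> u32 {
    thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .map(|first| s.spawn(move || max_flips_with_first(n, first)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .max()
            .unwrap_or(0)
    })
}
```

Calling `fannkuch_max_flips(7)` should give the benchmark's `Pfannkuchen(7)` value. With rayon, the thread-spawning part would presumably shrink to something like `(0..n).into_par_iter().map(|first| max_flips_with_first(n, first)).max()`, and rayon would handle the scheduling; finer-grained blocks than "one per first element" would likely balance better for larger `n`.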
That's easy. I basically used most of your API for my U8x16 type, so all it needs is a bit of search and replace.
I don't have much time now, but perhaps I'll push a PR later.
In the blog post you mention:
"On the other hand, the latter uses [u32; 16] arrays instead of the smaller U8x16 types used by the SIMD version."
The library exposes a u32x16 type if you need it.