Dalek NEON v7
Continuation of #457 for v7. @Tarinn will continue this while I patch a few things in Rust and LLVM...
I pushed some changes to the dalek-neon branch. @Tarinn will try to get both branches in sync. That branch has some updates that (hopefully) don't make this slower anymore on M2 (and other recent aarch64 chips). We don't have Apple Silicon here to test on, though, so we'll ping you once we think a retest would be useful.
Sounds great, thanks!
The latest commit should address the comments above. On our local a55 test device there is at least no slowdown anymore (although not much of a speedup either). Please do say if the slowdown on M2 devices is still significant. As mentioned in #457, this is a difficult problem that seems to originate in LLVM, but I can dive into the assembly a bit more to try to find a solution.