wide
NEON instructions on `aarch64`?
I know the README says this:
... and on other architectures this is done by carefully writing functions so that LLVM hopefully does the right thing
But what magic incantation do I have to yell at it to make it actually vectorize my code written using `wide`? :/
Is NEON never emitted by `rustc` unless directly using the arch intrinsics?
https://rust-lang.github.io/packed_simd/perf-guide/target-feature/rustflags.html#target-feature
"+neon" should do it
Yeah, thanks, I knew about that! And sorry, I should have mentioned that right at the beginning...
The thing is, the `neon` target feature is actually enabled by default on `aarch64-linux-android`:
$ rustc --target aarch64-linux-android --print cfg
[...]
target_feature="neon"
target_feature="pmuv3"
[...]
So adding that flag won't do anything.
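For anyone who wants to confirm this from inside a program rather than from `--print cfg`, here is a minimal sketch using only the standard library. The function names are made up for illustration. It only reports whether the `neon` feature is enabled; it says nothing about whether LLVM will actually autovectorize a given loop.

```rust
/// Reports whether this build was compiled with the `neon` target feature
/// enabled (the same information `rustc --print cfg` shows).
/// Hypothetical helper name, for illustration only.
fn neon_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "neon")
}

/// Runtime check on aarch64 via the standard library's detection macro.
#[cfg(target_arch = "aarch64")]
fn neon_detected_at_runtime() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

/// On other architectures this sketch simply reports `false`.
#[cfg(not(target_arch = "aarch64"))]
fn neon_detected_at_runtime() -> bool {
    false
}

fn main() {
    println!("compile-time neon: {}", neon_enabled_at_compile_time());
    println!("runtime neon:      {}", neon_detected_at_runtime());
}
```

On a target where `neon` is on by default, the first line should print `true` without any extra flags.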
But even if I do add it, I get autovectorization on `x86_64`, but not on `aarch64`: https://godbolt.org/z/91TrexsvP
Same thing with C++, with either Clang or GCC: https://godbolt.org/z/8qqfjWd7f
I'm not sure much else can be done right now unfortunately.
Other than like, join the portable simd working group and help them get that api to stable faster
Yeah, that's one way, I guess... :/
But really, I shared your hope that LLVM would take care of it on its own, and I find it really weird that it does not, given how popular and important `aarch64` is becoming...
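For reference, the "directly using the arch intrinsics" route mentioned earlier in the thread could look roughly like the sketch below. This is not how `wide` implements anything; `add4` is a hypothetical helper, and the scalar fallback is only there so the snippet compiles on other architectures too.

```rust
/// Adds two 4-lane f32 arrays with explicit NEON intrinsics on aarch64.
/// `add4` is a hypothetical name used only for this illustration.
#[cfg(target_arch = "aarch64")]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::aarch64::{vaddq_f32, vld1q_f32, vst1q_f32};
    let mut out = [0.0f32; 4];
    // SAFETY: assumes the `neon` target feature is enabled (the default on
    // the usual aarch64 targets); each pointer refers to 4 valid f32 values.
    unsafe {
        let va = vld1q_f32(a.as_ptr()); // load 4 lanes from `a`
        let vb = vld1q_f32(b.as_ptr()); // load 4 lanes from `b`
        vst1q_f32(out.as_mut_ptr(), vaddq_f32(va, vb)); // lane-wise add, store
    }
    out
}

/// Scalar fallback for every other architecture.
#[cfg(not(target_arch = "aarch64"))]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

fn main() {
    let sum = add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]);
    println!("{:?}", sum);
}
```

The obvious downside is that every operation needs a per-architecture branch like this, which is exactly the boilerplate a crate like `wide` is meant to hide.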
aarch64 neon support is now in version 0.7.7
Actually, the main branch's Cargo.toml version doesn't get bumped until the end of development, so it's 0.7.8
But I released `wide-0.7.8` just now, so it should be available by the time anyone reads this.