simdeez
simdeez copied to clipboard
Compile only scalar fallbacks on unsupported targets
It is a bit hard to rely on this crate (and by extension the noise crate) in cross-platform libraries, because while scalar fallbacks exist, it still tries to compile the simd implementations for non-x86/amd64 targets. An example error is:
/xxx/.cargo/registry/src/github.com-1ecc6299db9ec823/simdeez-1.0.0/src/sse41.rs:644:26
|
644 | I32x4_41(_mm_srli_epi32(a.0, $amt_const))
| ^^^^^^^^^^^^^^ not found in this scope
...
647 | constify_imm8!(amt_const, call)
| ------------------------------- in this macro invocation
|
can you try the current master branch and see if this is still an issue?
I tried d66564f0f3 for a few popular targets and some worked, some didn't.
Try rustup toolchain install nightly -t $T && cargo build --target $T
with T=i686-unknown-linux-gnu
or T=wasm32-unknown-unknown
and you'll get different build errors.
Maybe it would be good to implement/document a way to get scalar only builds by specifying some feature, or using default-features = false
, so people can continue using the library even if some platform is broken or not yet supported.
Ok I fixed up the i686 case, and it should fix most cases now since I'm doing the conditional includes based on target_feature now. wasm is broken because it just isn't done yet.
Hmm, actually what I did screws up other things. I think your are right it may be necessary to manually turn features on and off.
Does it? I was just about to close this. In any case, I guess adding
default = ["scalar","sse2", "sse41", "avx2"]
scalar = []
sse2 = []
sse41 = []
avx2 = []
to the features section and gating the implementations with something like #[cfg(all(feature="sse2", target_feature="sse2"))]
can't hurt.
Looks like it allows reducing the compiled size quite a bit as well. Not sure if that is interesting.
Thanks for the great work btw. Looking forward to the next release of simdeez and simdnoise (especially if it includes #20).
Yeah the problems comes up with runtime selection. Say you want a build that will run on any x86 machine, but will use AVX2 if available. Then then AVX2 feature won't exist, but you need to build that AVX2 code. If you build for an arch that has AVX2, then it won't run on any x86 machine any more, because llvm will use some avx2 instructions all over, not just in the functions we tag.
I see, yes makes sense. So the "correct" solution is a slightly more complex architecture whitelist per extension? What was the underlying reason for the i686 build issue?
Right, I think the problem is that the architecture flags (x86, x86_64 etc) aren't enough to tell you if SSE2 vs SSE4.1 etc are available. So probably need to make our own feature options.
@jackmott was this issue ever resolved? My goal is to use simdeez to compile a binary to be used in a public docker image and it's not at all clear how to do this so its performant and portable. What target-features should be enabled?
I wonder if the recommended approach is to use https://github.com/ronnychevalier/cargo-multivers or https://crates.io/crates/multiversion along with simdeez to support this usecase.