simdeez icon indicating copy to clipboard operation
simdeez copied to clipboard

Compile only scalar fallbacks on unsupported targets

Open FlorianUekermann opened this issue 4 years ago • 9 comments

It is a bit hard to rely on this crate (and by extension the noise crate) in cross-platform libraries, because while scalar fallbacks exist, it still tries to compile the simd implementations for non-x86/amd64 targets. An example error is:

/xxx/.cargo/registry/src/github.com-1ecc6299db9ec823/simdeez-1.0.0/src/sse41.rs:644:26
    |
644 |                 I32x4_41(_mm_srli_epi32(a.0, $amt_const))
    |                          ^^^^^^^^^^^^^^ not found in this scope
...
647 |         constify_imm8!(amt_const, call)
    |         ------------------------------- in this macro invocation
    |

FlorianUekermann avatar Apr 26 '20 16:04 FlorianUekermann

can you try the current master branch and see if this is still an issue?

jackmott avatar May 03 '20 18:05 jackmott

I tried d66564f0f3 for a few popular targets and some worked, some didn't. Try rustup toolchain install nightly -t $T && cargo build --target $T with T=i686-unknown-linux-gnu or T=wasm32-unknown-unknown and you'll get different build errors. Maybe it would be good to implement/document a way to get scalar only builds by specifying some feature, or using default-features = false, so people can continue using the library even if some platform is broken or not yet supported.

FlorianUekermann avatar May 09 '20 15:05 FlorianUekermann

Ok I fixed up the i686 case, and it should fix most cases now since I'm doing the conditional includes based on target_feature now. wasm is broken because it just isn't done yet.

jackmott avatar May 09 '20 16:05 jackmott

Hmm, actually what I did screws up other things. I think your are right it may be necessary to manually turn features on and off.

jackmott avatar May 09 '20 18:05 jackmott

Does it? I was just about to close this. In any case, I guess adding

default = ["scalar","sse2", "sse41", "avx2"]
scalar = []
sse2 = []
sse41 = []
avx2 = []

to the features section and gating the implementations with something like #[cfg(all(feature="sse2", target_feature="sse2"))] can't hurt.

Looks like it allows reducing the compiled size quite a bit as well. Not sure if that is interesting.

Thanks for the great work btw. Looking forward to the next release of simdeez and simdnoise (especially if it includes #20).

FlorianUekermann avatar May 09 '20 18:05 FlorianUekermann

Yeah the problems comes up with runtime selection. Say you want a build that will run on any x86 machine, but will use AVX2 if available. Then then AVX2 feature won't exist, but you need to build that AVX2 code. If you build for an arch that has AVX2, then it won't run on any x86 machine any more, because llvm will use some avx2 instructions all over, not just in the functions we tag.

jackmott avatar May 09 '20 18:05 jackmott

I see, yes makes sense. So the "correct" solution is a slightly more complex architecture whitelist per extension? What was the underlying reason for the i686 build issue?

FlorianUekermann avatar May 09 '20 19:05 FlorianUekermann

Right, I think the problem is that the architecture flags (x86, x86_64 etc) aren't enough to tell you if SSE2 vs SSE4.1 etc are available. So probably need to make our own feature options.

jackmott avatar May 09 '20 20:05 jackmott

@jackmott was this issue ever resolved? My goal is to use simdeez to compile a binary to be used in a public docker image and it's not at all clear how to do this so its performant and portable. What target-features should be enabled?

I wonder if the recommended approach is to use https://github.com/ronnychevalier/cargo-multivers or https://crates.io/crates/multiversion along with simdeez to support this usecase.

cevian avatar Aug 07 '24 01:08 cevian