reed-solomon-erasure
Runtime SIMD detection?
Is it possible for this crate to implement runtime SIMD detection, so that portable binaries with SIMD code inside can be published?
I believe it should be possible. The aes crate uses cpufeatures to detect SIMD support.
I spent a good amount of time working on this, first using Rust's nightly portable-simd feature, then using Rust's SIMD abstractions. It turns out that doing runtime detection is seriously hamstrung by inlining. It's currently not possible to force inlining when you don't even know which function to inline. All this results in code that is much slower than build time SIMD acceleration.
@FallingSnow
It turns out that doing runtime detection is seriously hamstrung by inlining. It's currently not possible to force inlining when you don't even know which function to inline. All this results in code that is much slower than build time SIMD acceleration.
Yes, that's a bit annoying, but you can still do something like:
fn foo() {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just checked at runtime.
            return unsafe { foo_avx2() };
        }
    }
    foo_fallback()
}

// Everything inlined into this function is compiled with AVX2 enabled.
#[target_feature(enable = "avx2")]
unsafe fn foo_avx2() {
    bar();
    baz();
}

#[inline(always)]
fn bar() {
    unsafe {
        let clr_mask = _mm256_set1_epi8(0x0f);
        [...]
    }
}

#[inline(always)]
fn baz() { ... }
bar() and baz() will get inlined into foo_avx2() and compiled with target_feature(enable = "avx2").
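For reference, here is a minimal, self-contained sketch of that pattern with the elided parts filled in by a toy kernel. It XORs one byte slice into another instead of doing Galois-field arithmetic, and the names xor_into, xor_into_avx2, xor_block and xor_into_fallback are made up for illustration; they are not taken from this crate or from reed-solomon-simd.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// XORs `src` into `dst`, picking the AVX2 path at runtime when available.
pub fn xor_into(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len());
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            unsafe { xor_into_avx2(dst, src) };
            return;
        }
    }
    xor_into_fallback(dst, src);
}

/// Entry point for the AVX2 path; helpers inlined here inherit AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn xor_into_avx2(dst: &mut [u8], src: &[u8]) {
    let mut d = dst.chunks_exact_mut(32);
    let mut s = src.chunks_exact(32);
    for (dc, sc) in (&mut d).zip(&mut s) {
        xor_block(dc, sc);
    }
    // Handle the tail (fewer than 32 bytes) with the scalar loop.
    xor_into_fallback(d.into_remainder(), s.remainder());
}

/// Private helper: only meant to be called from an AVX2-enabled function.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[inline(always)]
fn xor_block(dst: &mut [u8], src: &[u8]) {
    debug_assert!(dst.len() == 32 && src.len() == 32);
    unsafe {
        let a = _mm256_loadu_si256(dst.as_ptr() as *const __m256i);
        let b = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
        _mm256_storeu_si256(dst.as_mut_ptr() as *mut __m256i, _mm256_xor_si256(a, b));
    }
}

/// Plain Rust fallback for CPUs without AVX2 and for other architectures.
fn xor_into_fallback(dst: &mut [u8], src: &[u8]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= *s;
    }
}
```

Callers only ever see xor_into(); the feature check happens once per call, so it pays off when each call processes a reasonably large buffer.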
That's the approach I've taken with the Reed-Solomon library I just published: https://crates.io/crates/reed-solomon-simd It does runtime selection of SIMD implementation on both AArch64 (Neon) and x86(-64) (SSSE3 and AVX2) with fallback to plain Rust. I don't see any noticeable performance penalty for doing runtime selection.
Feel free to draw some inspiration from that implementation. The relevant code is here: https://github.com/AndersTrier/reed-solomon-simd/tree/master/src/engine
Yeah, looking at your code it looks possible. We'd need to change the architecture here to something similar to yours, with an underlying abstraction chosen at the highest level (basically as soon as you create a ReedSolomon), similar to your engine abstraction. I was trying to swap out the right calls at the call level rather than an entire Reed-Solomon pipeline, which was a mistake.
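A rough sketch of what selecting an engine once at construction could look like. Everything here is hypothetical: the Engine trait, the Fallback and Avx2 types, select_engine and the Codec struct are illustrative names, not this crate's or reed-solomon-simd's API, and a plain XOR parity stands in for the real Reed-Solomon kernels.

```rust
/// Hypothetical engine interface: one implementation per instruction set.
trait Engine {
    fn name(&self) -> &'static str;
    /// XORs `src` into `dst`; a stand-in for the real Galois-field kernels.
    fn xor_into(&self, dst: &mut [u8], src: &[u8]);
}

/// Plain Rust engine, always available.
struct Fallback;

impl Engine for Fallback {
    fn name(&self) -> &'static str {
        "fallback"
    }
    fn xor_into(&self, dst: &mut [u8], src: &[u8]) {
        for (d, s) in dst.iter_mut().zip(src) {
            *d ^= *s;
        }
    }
}

/// AVX2 engine; must only be constructed after runtime detection succeeded.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
struct Avx2;

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
impl Engine for Avx2 {
    fn name(&self) -> &'static str {
        "avx2"
    }
    fn xor_into(&self, dst: &mut [u8], src: &[u8]) {
        // SAFETY: `Avx2` is only created by `select_engine` after detection.
        unsafe { xor_avx2(dst, src) }
    }
}

/// The whole loop lives inside the AVX2-enabled function, so the compiler
/// is free to use AVX2 instructions for it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn xor_avx2(dst: &mut [u8], src: &[u8]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= *s;
    }
}

/// Runs feature detection once and returns the best available engine.
fn select_engine() -> Box<dyn Engine> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return Box::new(Avx2);
        }
    }
    Box::new(Fallback)
}

/// Hypothetical top-level type; the engine is fixed at construction.
struct Codec {
    engine: Box<dyn Engine>,
}

impl Codec {
    fn new() -> Self {
        Codec { engine: select_engine() }
    }

    /// Computes a single XOR parity shard over equally sized data shards.
    fn parity(&self, shards: &[Vec<u8>]) -> Vec<u8> {
        let len = shards.first().map_or(0, |s| s.len());
        let mut out = vec![0u8; len];
        for shard in shards {
            assert_eq!(shard.len(), len);
            self.engine.xor_into(&mut out, shard);
        }
        out
    }
}
```

The point of this shape is that dispatch happens per whole-buffer operation rather than per inner call, so each engine's hot loop stays inside one function that the compiler can inline and vectorize freely.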