
When are asm and intrinsics worth it?

Open · enkore opened this issue 3 months ago • 1 comment

I'm looking at doing a third implementation of SHA-256 for x86, targeting the x86-64-v3 ISA level (AVX, AVX2, but no AVX-512 and no SHA-NI, i.e. Haswell), because the pure-Rust soft implementation isn't doing so well without SHA-NI. This raised the question of when this kind of effort is sensible.
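To make the "third implementation" concrete, here is a minimal sketch of how a runtime dispatcher could choose between the three tiers. The `Backend` enum and both function names are hypothetical, made up for illustration and not taken from the crate; the only real API used is the standard library's `is_x86_feature_detected!` macro. A real SHA-NI backend would likely need to check additional features (such as SSE4.1) as well.

```rust
/// Hypothetical backend tiers for a SHA-256 compression function.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// Dedicated SHA instructions (SHA-NI).
    ShaNi,
    /// Vectorized implementation for x86-64-v3 class CPUs without SHA-NI.
    Avx2,
    /// Portable pure-Rust fallback.
    Soft,
}

/// Pure selection logic, kept separate from detection so it is easy to test.
pub fn choose_backend(has_sha: bool, has_avx2: bool) -> Backend {
    if has_sha {
        Backend::ShaNi
    } else if has_avx2 {
        Backend::Avx2
    } else {
        Backend::Soft
    }
}

/// Queries the running CPU. On non-x86 targets this always returns `Soft`.
pub fn detect_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        return choose_backend(
            std::arch::is_x86_feature_detected!("sha"),
            std::arch::is_x86_feature_detected!("avx2"),
        );
    }
    #[allow(unreachable_code)]
    Backend::Soft
}
```

On a Haswell machine, `detect_backend()` would be expected to land on the `Avx2` tier, which is exactly the case the soft implementation currently serves.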

For example, there is a LoongArch64 asm implementation for SHA-256, but it's actually scalar (I believe the LA64 vector instructions aren't even publicly documented) and as a result only about 10% faster than the pure-Rust implementation. On the other end of the scale are implementations using dedicated instructions, like SHA-NI or AES-NI, which can be 1000% or more faster. Where's the line? Is there one?

enkore · Mar 19 '24 18:03

There are some tough tradeoffs indeed.

We get pretty frequent complaints about performance when it isn't on par with ASM implementations. See e.g. #327.

Intrinsics add per-platform testing/maintenance burden via redundant implementations of the same algorithm, which also introduces the possibility of per-platform defects. But at least they're Rust code, which makes them accessible to other Rust programmers. ASM has all of the same problems, but has the additional complications of being a separate language from Rust (and obviously lacking its many guarantees around type/memory safety), and having to determine the correct arguments to asm! when using inline assembly.
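As a small illustration of the "correct arguments to asm!" point, here is a sketch of a single 32-bit rotate (an operation SHA-256 uses heavily) written with inline assembly. The function name `rotr7_asm` is hypothetical. Every operand detail is the programmer's responsibility: the compiler cannot verify that the `inout` direction, the `:e` register-width modifier, or the `options(...)` list actually match what the instruction does.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Rotates `x` right by 7 bits using the x86 `ror` instruction.
#[cfg(target_arch = "x86_64")]
pub fn rotr7_asm(mut x: u32) -> u32 {
    // SAFETY: `ror` only reads and writes the named register and the flags.
    unsafe {
        asm!(
            // `:e` selects the 32-bit register name (e.g. `eax`), matching `u32`.
            "ror {x:e}, 7",
            // `inout`: the register is both the input and the output.
            x = inout(reg) x,
            // No memory access and no stack use; `pure` lets the optimizer
            // deduplicate calls. `preserves_flags` is deliberately omitted
            // because `ror` modifies the flags register.
            options(pure, nomem, nostack),
        );
    }
    x
}

/// Portable fallback so the sketch builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn rotr7_asm(x: u32) -> u32 {
    x.rotate_right(7)
}
```

Declaring `preserves_flags` here, or using `in` instead of `inout`, would still compile but would be incorrect, which is the kind of per-platform defect that safe Rust cannot catch.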

The path forward on ASM is still an open question, since we've removed the old non-inline ASM implementations. Personally, I've been interested in finding the safest possible way to consume ASM. In particular, I've been looking at projects which provide formally verified ASM implementations for a wide variety of algorithms and platforms, where we could extract those implementations in an automated manner and transform them into Rust asm! syntax, or perhaps even have the upstream tooling generate Rust code directly. Some projects of this nature for the specific case of hashes are AWS-LC and HACL*.

This does have the disadvantage that these formally verified implementations tend to lag behind the fastest hand-optimized ASM implementations, and that's also a debatable tradeoff. It might also impact FIPS certification, for those who care about that.

tarcieri · Mar 19 '24 18:03