cpuminer Intel/ARM native SHA256

New Intel CPUs and ARMv8 cores support native SHA-256 hashing through dedicated instructions (the Intel SHA extensions and the ARMv8 cryptography extensions). Implementing this would significantly increase the SHA256d hashrate.
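Whether a given machine actually has these instructions can be checked at run time. The following is only a minimal sketch, not code from cpuminer: `cpu_has_sha256` is a hypothetical helper name, the x86 branch reads the SHA flag from CPUID leaf 7 (EBX bit 29) through the GCC/Clang `<cpuid.h>` header, the Linux AArch64 branch tests `HWCAP_SHA2`, and the macOS branch assumes the `hw.optional.arm.FEAT_SHA256` sysctl key exists (on older macOS releases it may not, in which case the sketch simply reports 0).

```c
#if defined(__x86_64__) || defined(__i386__)
#  include <cpuid.h>        /* GCC/Clang wrapper around the CPUID instruction */
#  define DETECT_X86 1
#elif defined(__aarch64__) && defined(__linux__)
#  include <sys/auxv.h>     /* getauxval() */
#  include <asm/hwcap.h>    /* HWCAP_SHA2 */
#  define DETECT_ARM64_LINUX 1
#elif defined(__aarch64__) && defined(__APPLE__)
#  include <sys/types.h>
#  include <sys/sysctl.h>   /* sysctlbyname() */
#  define DETECT_ARM64_APPLE 1
#endif

/* Hypothetical helper: returns 1 if the CPU reports SHA-256 instructions,
 * 0 if it does not or if this platform is not covered by the sketch. */
static int cpu_has_sha256(void)
{
#if defined(DETECT_X86)
    unsigned int eax, ebx, ecx, edx;
    /* CPUID leaf 7, subleaf 0: EBX bit 29 is the SHA extensions flag. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 29) & 1u);
#elif defined(DETECT_ARM64_LINUX)
    return (getauxval(AT_HWCAP) & HWCAP_SHA2) != 0;
#elif defined(DETECT_ARM64_APPLE)
    int val = 0;
    size_t len = sizeof(val);
    /* Assumption: this key is available; treat a failed lookup as "no". */
    if (sysctlbyname("hw.optional.arm.FEAT_SHA256", &val, &len, NULL, 0) != 0)
        return 0;
    return val != 0;
#else
    return 0;
#endif
}
```

A miner could call something like this once at startup and pick the hashing routine accordingly, falling back to the existing code paths when it returns 0.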

Jan 05 '18 06:01 ghost

interesting!

May 17 '19 08:05 yu-chenxi

I think this is the Linux code that does SHA-256 with ARM NEON: https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/

Apr 29 '21 03:04 risner

Currently I can't link:

    clang -fno-strict-aliasing -Ofast -arch arm64 -mfpu=neon -pthread -o minerd minerd-cpu-miner.o minerd-util.o minerd-sha2.o minerd-scrypt.o /opt/homebrew/Cellar/curl/7.76.1/lib/libcurl.dylib compat/jansson/libjansson.a -lpthread
    Undefined symbols for architecture arm64:
      "_sha256_init_4way", referenced from:
          _scanhash_scrypt in minerd-scrypt.o
      "_sha256_transform_4way", referenced from:
          _scanhash_scrypt in minerd-scrypt.o
      "_sha256_use_4way", referenced from:
          _scanhash_sha256d in minerd-sha2.o
          _scanhash_scrypt in minerd-scrypt.o
      "_sha256d_ms_4way", referenced from:
          _scanhash_sha256d in minerd-sha2.o
    ld: symbol(s) not found for architecture arm64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    make[2]: *** [minerd] Error 1
    make[1]: *** [all-recursive] Error 1
    make: *** [all] Error 2

I'm not sure why, because the files are all arm64:

    risner@M1 cpuminer-master % file *.o
    minerd-cpu-miner.o: Mach-O 64-bit object arm64
    minerd-scrypt.o:    Mach-O 64-bit object arm64
    minerd-sha2.o:      Mach-O 64-bit object arm64
    minerd-util.o:      Mach-O 64-bit object arm64

Apr 29 '21 13:04 risner

@risner, I believe both your posts are off-topic here. This thread is about the SHA-2 specific instructions, not about NEON or AArch64 in general. As for your second post, feel free to open a new ticket; if you do, make sure you provide all the steps you followed and include or link to your config.log.

Apr 29 '21 14:04 pooler

Apologies. I may be confused, but the SHA-2 specific instructions seem to be available only on A13+ and M1 ARM chips. From what I can tell, no other ARM maker is using these built-in instructions. And they seem to be exposed through "neon" calls?

I'll make another post for the off-topic issue with compiling on M1.

Apr 29 '21 15:04 risner

@risner, I agree it can be confusing. My point is that NEON can be used to improve SHA performance even without the newer SHA instructions (as cpuminer already does on 32-bit ARM, and could in theory do on AArch64). From a cursory look, I believe that this is what the link you posted is about.
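To illustrate what "4-way" means in this context: four independent hashes are computed side by side, one per 32-bit lane of a 128-bit vector register, using only ordinary shifts, ORs and XORs. The sketch below shows this for a single SHA-256 building block, the message-schedule function sigma0. It is not cpuminer's hand-written assembly; it uses the GCC/Clang vector extension (which the compiler can typically lower to NEON on ARM or SSE2 on x86), and the names `u32x4`, `sigma0` and `sigma0_4way` are made up for the example.

```c
#include <stdint.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef uint32_t u32x4 __attribute__((vector_size(16)));

static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Scalar SHA-256 sigma0: operates on one message word. */
static inline uint32_t sigma0(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

/* 4-way sigma0: lane i holds the word of independent message i.
 * There is no vector rotate here, so each rotate is built from two
 * shifts and an OR, applied to all four lanes at once. */
static inline u32x4 sigma0_4way(u32x4 x)
{
    u32x4 r7  = (x >> 7)  | (x << 25);
    u32x4 r18 = (x >> 18) | (x << 14);
    return r7 ^ r18 ^ (x >> 3);
}
```

The rest of the compression function vectorizes the same way, which is why this approach needs no SHA-specific instructions, only four (or eight, with wider registers) candidate inputs to hash at once.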

Apr 29 '21 16:04 pooler

Thanks. I also found this: https://www.google.com/amp/s/blog.min.io/accelerating-sha256-by-100x-in-golang-on-arm/amp/

It shows a roughly 100-fold improvement (6 MB/s to 615 MB/s) going from unaccelerated code to the 64-bit ARM SHA instructions (sha256h, sha256h2, sha256su0 and sha256su1).

Here is the 64-bit code they implemented: https://github.com/minio/sha256-simd/blob/6de4475307716de15b286880ff321c9547086fdd/sha256block_arm64.s
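In C, the same four instructions are reachable through the ACLE intrinsics in `<arm_neon.h>`. Below is a rough sketch of a single-block compression function built on them, with a plain scalar fallback so it still compiles elsewhere. It is not cpuminer or minio code: the function names are hypothetical, the intrinsic path assumes a little-endian AArch64 target and a compiler that defines `__ARM_FEATURE_SHA2` or `__ARM_FEATURE_CRYPTO` (with GCC this typically requires something like `-march=armv8-a+crypto`), and it has not been benchmarked.

```c
#include <stdint.h>

#if defined(__aarch64__) && (defined(__ARM_FEATURE_SHA2) || defined(__ARM_FEATURE_CRYPTO))
#  include <arm_neon.h>
#  define USE_ARM_SHA2 1
#endif

static const uint32_t SHA256_IV[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };

static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 };

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Portable reference: one 64-byte block, state updated in place. */
static void sha256_block_scalar(uint32_t state[8], const uint8_t block[64])
{
    uint32_t w[64], s[8];
    for (int i = 0; i < 16; i++)   /* message words are big-endian */
        w[i] = (uint32_t)block[4 * i] << 24 | (uint32_t)block[4 * i + 1] << 16 |
               (uint32_t)block[4 * i + 2] << 8 | (uint32_t)block[4 * i + 3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = ROTR(w[i - 15], 7) ^ ROTR(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = ROTR(w[i - 2], 17) ^ ROTR(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    for (int i = 0; i < 8; i++) s[i] = state[i];
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = s[7] + (ROTR(s[4], 6) ^ ROTR(s[4], 11) ^ ROTR(s[4], 25)) +
                      ((s[4] & s[5]) ^ (~s[4] & s[6])) + K[i] + w[i];
        uint32_t t2 = (ROTR(s[0], 2) ^ ROTR(s[0], 13) ^ ROTR(s[0], 22)) +
                      ((s[0] & s[1]) ^ (s[0] & s[2]) ^ (s[1] & s[2]));
        s[7] = s[6]; s[6] = s[5]; s[5] = s[4]; s[4] = s[3] + t1;
        s[3] = s[2]; s[2] = s[1]; s[1] = s[0]; s[0] = t1 + t2;
    }
    for (int i = 0; i < 8; i++) state[i] += s[i];
}

#ifdef USE_ARM_SHA2
/* Same block function using sha256h/sha256h2/sha256su0/sha256su1.
 * Each loop iteration performs four rounds. */
static void sha256_block_armv8(uint32_t state[8], const uint8_t block[64])
{
    uint32x4_t abcd = vld1q_u32(state), efgh = vld1q_u32(state + 4);
    const uint32x4_t abcd0 = abcd, efgh0 = efgh;
    uint32x4_t m[4];
    for (int i = 0; i < 4; i++)    /* byte-swap big-endian message words */
        m[i] = vreinterpretq_u32_u8(vrev32q_u8(vld1q_u8(block + 16 * i)));
    for (int i = 0; i < 16; i++) {
        uint32x4_t wk = vaddq_u32(m[i & 3], vld1q_u32(&K[4 * i]));
        if (i < 12)                /* schedule the words needed 16 rounds later */
            m[i & 3] = vsha256su1q_u32(vsha256su0q_u32(m[i & 3], m[(i + 1) & 3]),
                                       m[(i + 2) & 3], m[(i + 3) & 3]);
        uint32x4_t prev = abcd;
        abcd = vsha256hq_u32(abcd, efgh, wk);
        efgh = vsha256h2q_u32(efgh, prev, wk);
    }
    vst1q_u32(state, vaddq_u32(abcd, abcd0));
    vst1q_u32(state + 4, vaddq_u32(efgh, efgh0));
}
#endif

/* Compile-time dispatch; both paths must agree with the scalar reference. */
static void sha256_block(uint32_t state[8], const uint8_t block[64])
{
#ifdef USE_ARM_SHA2
    sha256_block_armv8(state, block);
#else
    sha256_block_scalar(state, block);
#endif
}
```

As a sanity check, starting from `SHA256_IV` and feeding the single padded block for the message "abc" should leave the well-known digest ba7816bf...f20015ad in `state` on either path. For SHA256d the block function would simply be applied again to the padded 32-byte result.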

Apr 30 '21 00:04 risner

My experience with SHA on Ryzen and Ice Lake is that 8-way parallel hashing (AVX2) is faster than the SHA instructions. I haven't tested against 4-way because AVX2 is available on most CPUs with SHA, but extrapolating from the 8-way test results suggests 4-way is significantly slower than SHA.

I also found that using the SHA instructions prevents some of the more innovative software optimizations specific to sha256d.

ARM SHA could indeed be faster than NEON 4-way.

Oct 19 '21 15:10 JayDDee

So there is nothing to gain from that optimization?

Nov 02 '21 16:11 risner