Intel/ARM native SHA256
Recent Intel CPUs and ARMv8 cores provide dedicated instructions for SHA256 hashing. Implementing support for them would likely increase the SHA256d hashrate significantly.
interesting!
I think this is Linux code that implements SHA256 with ARM NEON: https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/
Currently I can't link:
clang -fno-strict-aliasing -Ofast -arch arm64 -mfpu=neon -pthread -o minerd minerd-cpu-miner.o minerd-util.o minerd-sha2.o minerd-scrypt.o /opt/homebrew/Cellar/curl/7.76.1/lib/libcurl.dylib compat/jansson/libjansson.a -lpthread
Undefined symbols for architecture arm64:
  "_sha256_init_4way", referenced from:
      _scanhash_scrypt in minerd-scrypt.o
  "_sha256_transform_4way", referenced from:
      _scanhash_scrypt in minerd-scrypt.o
  "_sha256_use_4way", referenced from:
      _scanhash_sha256d in minerd-sha2.o
      _scanhash_scrypt in minerd-scrypt.o
  "_sha256d_ms_4way", referenced from:
      _scanhash_sha256d in minerd-sha2.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [minerd] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
I'm not sure why, because the files are all arm64:
risner@M1 cpuminer-master % file *.o
minerd-cpu-miner.o: Mach-O 64-bit object arm64
minerd-scrypt.o: Mach-O 64-bit object arm64
minerd-sha2.o: Mach-O 64-bit object arm64
minerd-util.o: Mach-O 64-bit object arm64
@risner, I believe both your posts are off-topic here. This thread is about SHA-2 specific instructions, not about NEON or AArch64 in general. As for your second post, feel free to open a new ticket; if you do, make sure you provide all the steps you followed and include or link to your config.log.
Apologies. I may be confused, but as far as I can tell the SHA-2 specific instructions are available only on A13 and later and on M1 ARM chips; no other ARM vendor seems to be using these built-in instructions. And they seem to be exposed through "neon" calls?
I'll make another post for the off-topic issue with compiling on M1.
@risner, I agree it can be confusing. My point is that NEON can be used to improve SHA performance even without the newer SHA instructions (as cpuminer already does on 32-bit ARM, and could in theory do on AArch64). From a cursory look, I believe that this is what the link you posted is about.
Thanks. I also found this: https://www.google.com/amp/s/blog.min.io/accelerating-sha256-by-100x-in-golang-on-arm/amp/
It shows a roughly 100-fold improvement (6 MB/s to 615 MB/s) going from unaccelerated code to the 64-bit ARM SHA instructions (sha256h, sha256h2, sha256su0 and sha256su1).
Here is the 64-bit code they implemented: https://github.com/minio/sha256-simd/blob/6de4475307716de15b286880ff321c9547086fdd/sha256block_arm64.s
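To make the distinction between "NEON" and "the SHA instructions" concrete, here is a minimal sketch of how a build could tell at compile time which path is available. The function name sha256_backend and the returned labels are hypothetical and not part of cpuminer. The macros are the ones the compilers themselves define: __ARM_FEATURE_SHA2 or the older __ARM_FEATURE_CRYPTO for the ARMv8 SHA-2 instructions, __SHA__ for the x86 SHA extensions, and __ARM_NEON / __AVX2__ for the plain vector units.

```c
#include <stddef.h>

/* Hypothetical helper (not cpuminer code): names the fastest SHA-256 path
 * that the current compiler flags make available. */
static const char *sha256_backend(void)
{
#if defined(__ARM_FEATURE_SHA2) || defined(__ARM_FEATURE_CRYPTO)
    /* ARMv8 SHA-2 instructions: sha256h, sha256h2, sha256su0, sha256su1 */
    return "armv8-sha2";
#elif defined(__SHA__)
    /* x86 SHA extensions (enabled with -msha or a suitable -march) */
    return "x86-sha";
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Vector unit only: several hashes in parallel, no SHA instructions */
    return "neon";
#elif defined(__AVX2__)
    return "avx2";
#else
    return "generic";
#endif
}
```

This only reflects what the compiler was told to target. A real miner would probably also want a runtime check, since a binary may run on a CPU that differs from the build machine.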
My experience with SHA on Ryzen and Ice Lake is that 8-way parallel hashing (AVX2) is faster than the SHA instructions. I haven't tested against 4-way, because AVX2 is available on most CPUs with SHA, but extrapolating from the 8-way test results suggests that 4-way is significantly slower than SHA.
I also found that using SHA prevents some of the more innovative software optimizations specific to sha256d.
ARM SHA could indeed be faster than NEON 4-way.
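For readers unfamiliar with the "4-way" terminology, the following is a rough, portable sketch of the idea: one SHA-256 compression step applied to four independent blocks at once, with every state and message word stored as four lanes. The names sha256_init_lanes and sha256_transform_lanes are made up for illustration and are not cpuminer's actual routines. A NEON or SSE2 version would replace each inner loop over the lanes with a single vector instruction, and an 8-way AVX2 version would use eight lanes.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4  /* 4-way: four independent hashes per call */

static const uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
};

static const uint32_t IV[8] = {
    0x6a09e667,0xbb67ae85,0x3c6ef372,0xa54ff53a,0x510e527f,0x9b05688c,0x1f83d9ab,0x5be0cd19
};

static uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* Load the standard initial hash value into every lane. */
static void sha256_init_lanes(uint32_t state[8][LANES])
{
    for (int i = 0; i < 8; i++)
        for (int l = 0; l < LANES; l++)
            state[i][l] = IV[i];
}

/* One compression step per lane. block[i][l] is big-endian message word i
 * of lane l (padding already applied by the caller). */
static void sha256_transform_lanes(uint32_t state[8][LANES], uint32_t block[16][LANES])
{
    uint32_t w[64][LANES], s[8][LANES];
    int i, l;

    /* Message schedule, computed lane by lane. */
    memcpy(w, block, 16 * LANES * sizeof(uint32_t));
    for (i = 16; i < 64; i++)
        for (l = 0; l < LANES; l++) {
            uint32_t x = w[i - 15][l], y = w[i - 2][l];
            uint32_t s0 = rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3);
            uint32_t s1 = rotr(y, 17) ^ rotr(y, 19) ^ (y >> 10);
            w[i][l] = w[i - 16][l] + s0 + w[i - 7][l] + s1;
        }

    /* 64 rounds; the loop over l is what a vector unit does in one go. */
    memcpy(s, state, sizeof(s));
    for (i = 0; i < 64; i++)
        for (l = 0; l < LANES; l++) {
            uint32_t a = s[0][l], b = s[1][l], c = s[2][l];
            uint32_t e = s[4][l], f = s[5][l], g = s[6][l];
            uint32_t t1 = s[7][l] + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
                        + ((e & f) ^ (~e & g)) + K[i] + w[i][l];
            uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
                        + ((a & b) ^ (a & c) ^ (b & c));
            s[7][l] = g; s[6][l] = f; s[5][l] = e; s[4][l] = s[3][l] + t1;
            s[3][l] = c; s[2][l] = b; s[1][l] = a; s[0][l] = t1 + t2;
        }

    /* Add the working variables back into the chaining state. */
    for (i = 0; i < 8; i++)
        for (l = 0; l < LANES; l++)
            state[i][l] += s[i][l];
}
```

The dedicated SHA instructions take the opposite approach: they speed up a single hash rather than running several side by side, which is why the two can only be compared by measuring them on the same CPU.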
So there is nothing to gain from that optimization?