zig
zig copied to clipboard
crypto.sha2: Use intrinsics for SHA-256 on x86-64 and AArch64
There's probably plenty of room to optimize these further in the future, but for the moment this gives ~4-5x improvement on x86-64, and ~10x on M1 AArch64 Macs.
These extensions are very new. Most Intel processors prior to 2020 do not support them. AMD has supported them since Ryzen.
P.S. @kubkon I haven't fixed the CPU features for LLVM in stage2 yet, but at least this gives you something to play with until that's sorted 🙂
In future it would be nice if we use inline assembly instead of LLVM intrinsics to enable the self-hosted backends to use these features too.
In future it would be nice if we use inline assembly instead of LLVM intrinsics to enable the self-hosted backends to use these features too.
Good point! Switched to inline asm for this PR.
In general, I think inline assembly can interact very differently with the optimizer versus intrinsics, but in this case it's actually a 20% win for Intel x86-64: Intel is over 1 GB/s now (4x improvement vs. master)
Eugh... Intel x86-64 loves the new inline assembly, but AMD hates it:
intrinsics: 66870 us (1568.08 MB/s)
inline asm: 251760 us (416.50 MB/s)
I'll go digging through the assembly and see if I can figure out why.
Eugh... Intel x86-64 loves the new inline assembly, but AMD hates it:
intrinsics: 66870 us (1568.08 MB/s) inline asm: 251760 us (416.50 MB/s)
I'll go digging through the assembly and see if I can figure out why.
What about arm64? Comparable, or ahead, or behind? If the latter, that's bad timing as I wanted to start testing it out in zld repo.
What about arm64? Comparable, or ahead, or behind? If the latter, that's bad timing as I wanted to start testing it out in zld repo.
arm64 on my M1 mac is about the same (<5% delta)
What about arm64? Comparable, or ahead, or behind? If the latter, that's bad timing as I wanted to start testing it out in zld repo.
arm64 on my M1 mac is about the same (<5% delta)
Awesome! Thanks for confirming!
Is there any trick to building any of this locally? On my M1Pro I am getting:
error: <inline asm>:2:1: instruction requires: sha2
sha256h.4s v2, v1, v3
not sure if helpful but I have a script that benches SHA hashing comparing BoringSSL's implementation to Zig's implementation
This is for an old build of Zig without this PR's changes 0.10.0-dev.2822+b79884eaf
on a Ryzen machine
[SHA256]
zig: 3.292s
boring: 518.223ms
evp: 518.25ms
evp in: 518.228ms
https://github.com/Jarred-Sumner/bun/blob/dea7cb14bdf0446fcb8a0750fe86b2056a7c3be0/src/sha.zig#L173-L230
@kubkon what does zig build-exe --show-builtin
output for you?
@kubkon what does
zig build-exe --show-builtin
output for you?
pub const cpu: std.Target.Cpu = .{
.arch = .aarch64,
.model = &std.Target.aarch64.cpu.apple_a14,
.features = std.Target.aarch64.featureSet(&[_]std.Target.aarch64.Feature{
.aes,
.aggressive_fma,
.alternate_sextload_cvt_f32_pattern,
.altnzcv,
.am,
.arith_bcc_fusion,
.arith_cbz_fusion,
.ccdp,
.ccidx,
.ccpp,
.complxnum,
.contextidr_el2,
.crc,
.crypto,
.disable_latency_sched_heuristic,
.dit,
.dotprod,
.el2vmsa,
.el3,
.flagm,
.fp16fml,
.fp_armv8,
.fptoint,
.fullfp16,
.fuse_address,
.fuse_adrp_add,
.fuse_aes,
.fuse_arith_logic,
.fuse_crypto_eor,
.fuse_csel,
.fuse_literals,
.jsconv,
.lor,
.lse,
.lse2,
.mpam,
.neon,
.nv,
.pan,
.pan_rwv,
.pauth,
.perfmon,
.predres,
.ras,
.rcpc,
.rcpc_immo,
.rdm,
.sb,
.sel2,
.sha2,
.sha3,
.specrestrict,
.ssbs,
.tlb_rmi,
.tracev8_4,
.uaops,
.v8_1a,
.v8_2a,
.v8_3a,
.v8_4a,
.v8a,
.vh,
.zcm,
.zcz,
.zcz_gp,
}),
};
Doesn't this make it impossible to do sha256 at comptime?
Doesn't this make it impossible to do sha256 at comptime?
Good point, I hadn't thought of that. Yeah, it breaks comptime evaluation for SHA-256.
Without https://github.com/ziglang/zig/issues/868 (or a related hack), the only way to fix that is to create a separate comptime method that users call at comptime. That means it's viral, too: Any function that calls SHA-256 must split into a comptime and runtime version, as well as their callers and so on.
For hash functions, comptime-support can indeed be an important thing to have.
That leaves us with either dynamic dispatch (which doesn't match the rest, and the plan to eventually do it), or, as you pointed out, a way to check if the function is called at comptime or not.
Doing this, even with a hack as a starter, sounds like a much better route that duplicating every function to have a comptine variant.
pub fn isComptime() bool {
var t = true;
var a: u8 = 0;
var b: u16 = 0;
const x = if (t) a else b;
return @TypeOf(x) == u8;
}
That's a nice trick!
CMake Error: The following variables are used in this project, but they are set to NOTFOUND. Please set them or make sure they are set and tested correctly in the CMake files: ZSTD
Hmm not sure what's up with the CI failures - I don't understand why this would be happening in this branch but not master branch.
Are you sure this is rebased against master branch? In master branch I see this:
-- The C compiler identification is Clang 15.0.3
-- The CXX compiler identification is Clang 15.0.3
However in this CI run I see this:
-- The C compiler identification is Clang 15.0.0
-- The CXX compiler identification is Clang 15.0.0
Possibly indicating an old tarball.
Are you sure this is rebased against master branch?
Oops, I thought CI merged the PR branch into master automatically, but I see now that this doesn't apply changes to the CI script itself.
Just pushed a proper rebase - should be ready for review/merge assuming it passes.
Re: the CI failures, I think
!isComptime() and comptime std.Target.x86.featureSetHas(builtin.cpu.features, .sha)
has to be changed to
comptime std.Target.x86.featureSetHas(builtin.cpu.features, .sha) and !isComptime()
See #6768 for more details.
Once we get rid of stage1, we can make featureSetHas
inline
and then that comptime
keyword can go away.
Thanks for the review! Yep, #6768 took me by surprise.
Should be all fixed up now :+1:
Doesn't this make it impossible to do sha256 at comptime?
Good point, I hadn't thought of that. Yeah, it breaks comptime evaluation for SHA-256.
Without #868 (or a related hack), the only way to fix that is to create a separate comptime method that users call at comptime. That means it's viral, too: Any function that calls SHA-256 must split into a comptime and runtime version, as well as their callers and so on.
I don't think it's necessary to add another comptime method. I made some changes like below and run
zig test lib/std/std.zig --zig-lib-dir lib
The tests built and passed. The hash functions should already support running at comptile time, no need to add another one, am I right?