stdarch
stdarch copied to clipboard
Modify Zks functions to use LLVM intrinsics
This pull request replaces use of inline assembly into LLVM intrinsics for RISC-V Zks functions. This change would enable optimizations by using LLVM intrinsics. This pull request also adds a target feature attribute on each Zks function.
The Zks functions are now gated under RISC-V Rust target feature:
Use of as
cast: RISC-V Zks instructions does not use upper 32 bits of parameter and return value (we sign extened it as parameter, and truncate from return value). Ref: https://github.com/riscv/riscv-crypto/blob/master/doc/scalar/insns/sm3p0.adoc#Operation
Compile output:
; core_arch::core_arch::riscv_shared::sm3p0
; Function Attrs: inlinehint mustprogress nofree nosync nounwind readnone willreturn
define internal fastcc i32 @_ZN9core_arch9core_arch12riscv_shared5sm3p017h50602641902a2471E(i32 %x) unnamed_addr #0 {
start:
%_3 = zext i32 %x to i64
%_2 = tail call i64 @llvm.riscv.sm3p0.i64(i64 %_3) #5
%0 = trunc i64 %_2 to i32
ret i32 %0
}
000000008000004e <_ZN9core_arch9core_arch12riscv_shared5sm3p017h50602641902a2471E>:
8000004e: 1502 slli a0,a0,0x20
80000050: 9101 srli a0,a0,0x20
80000052: 10851513 sm3p0 a0,a0
80000056: 8082 ret
0000000080000058 <_ZN9core_arch9core_arch12riscv_shared5sm3p117h9f0bc8b014c2d03aE>:
80000058: 1502 slli a0,a0,0x20
8000005a: 9101 srli a0,a0,0x20
8000005c: 10951513 sm3p1 a0,a0
80000060: 8082 ret
Current LLVM does not have llvm.riscv.sm3p0.i32
on 64-bit machine, it would result in LLVM ERROR: unimplemented operand
if we use function llvm.riscv.sm3p0.i32
.
FIXME: Rust requires all functions under #[target_feature]
be called in unsafe blocks, however in this case the Zks functions would be safe. How do deal with this?
r? @Amanieu
(rust-highfive has picked a reviewer for you, use r? to override)
It says:
the feature `aarch64_target_feature` has been stable since 1.61.0 and no longer requires an attribute to enable
Should I remove these stable feature gates as well?
You can probably avoid the zero-extension by sign-extending to isize
instead. On RISC-V a u32
is held in sign-extended for in a 64-bit register so this should be a no-op.
FIXME: Rust requires all functions under #[target_feature] be called in unsafe blocks, however in this case the Zks functions would be safe. How do deal with this?
It's only unsafe if the caller itself doesn't have that target feature enabled. Basically it's unsafe because you need to check that the CPU you are running on actually supports this feature. This is necessary to avoid using undefined instructions which could have unknown behavior.
@Amanieu Thanks for your suggestion! I tried to update code into transmution to i32 then use as
to isize, it emitted as follows in LLVM.
#[inline]
#[target_feature(enable = "zksh")]
pub fn sm3p0(x: u32) -> u32 {
// sign extend parameter to isize
unsafe { sm3p0_isize(transmute::<_, i32>(x) as isize) as u32 }
}
#[target_feature(enable = "zksh")]
pub fn start4(a: u32) -> u32 {
sm3p1(sm3p1(sm3p0(sm3p0(a)))) // test code generation
}
; Function Attrs: mustprogress nofree nosync nounwind readnone willreturn
define internal fastcc i32 @_ZN11hello_world6start417h1004698f151d95b4E(i32 %a) unnamed_addr #2 {
start:
%_3.i = sext i32 %a to i64
%_2.i = tail call i64 @llvm.riscv.sm3p0.i64(i64 %_3.i) #4
%sext = shl i64 %_2.i, 32
%_3.i1 = ashr exact i64 %sext, 32
%_2.i2 = tail call i64 @llvm.riscv.sm3p0.i64(i64 %_3.i1) #4
%sext7 = shl i64 %_2.i2, 32
%_3.i3 = ashr exact i64 %sext7, 32
%_2.i4 = tail call i64 @llvm.riscv.sm3p1.i64(i64 %_3.i3) #4
%sext8 = shl i64 %_2.i4, 32
%_3.i5 = ashr exact i64 %sext8, 32
%_2.i6 = tail call i64 @llvm.riscv.sm3p1.i64(i64 %_3.i5) #4
%0 = trunc i64 %_2.i6 to i32
ret i32 %0
}
The LLVM code includes sign extension from i32 to i64, however the generated code still includes an sext.w
instruction:
0000000080000008 <_ZN11hello_world6start417h1004698f151d95b4E>:
80000008: 01 25 sext.w a0, a0
8000000a: 13 15 85 10 sm3p0 a0, a0
8000000e: 01 25 sext.w a0, a0
80000010: 13 15 85 10 sm3p0 a0, a0
80000014: 01 25 sext.w a0, a0
80000016: 13 15 95 10 sm3p1 a0, a0
8000001a: 01 25 sext.w a0, a0
8000001c: 13 15 95 10 sm3p1 a0, a0
80000020: 82 80 ret
How to hint LLVM that this intrinsic function will ignore the upper 32 bits of input parameter?
It's only unsafe if the caller itself doesn't have that target feature enabled. Basically it's unsafe because you need to check that the CPU you are running on actually supports this feature. This is necessary to avoid using undefined instructions which could have unknown behavior.
Thanks! I'll note that.
You don't need the transmute, you can just do as i32 as isize
. But that probably won't fix the unnecessary sext.
How to hint LLVM that this intrinsic function will ignore the upper 32 bits of input parameter?
I don't think that is possible. This would require LLVM to recognize that the intrinsic only reads the low 32 bits.
We may wait before 32-bit RISC-V Zks intrinsic functions land in LLVM :)
:umbrella: The latest upstream changes (presumably b9cf2d74e0cdd74ee16773a8469c1885beba36a5) made this pull request unmergeable. Please resolve the merge conflicts.
Rust now uses LLVM 15, are these functions available yet?
This PR is now already done as part of #1453. I think this PR can be closed.