stdarch icon indicating copy to clipboard operation
stdarch copied to clipboard

Modify Zks functions to use LLVM intrinsics

Open luojia65 opened this issue 2 years ago • 7 comments

This pull request replaces use of inline assembly into LLVM intrinsics for RISC-V Zks functions. This change would enable optimizations by using LLVM intrinsics. This pull request also adds a target feature attribute on each Zks function.

The Zks functions are now gated under RISC-V Rust target feature: 图片

Use of as cast: RISC-V Zks instructions does not use upper 32 bits of parameter and return value (we sign extened it as parameter, and truncate from return value). Ref: https://github.com/riscv/riscv-crypto/blob/master/doc/scalar/insns/sm3p0.adoc#Operation

Compile output:

; core_arch::core_arch::riscv_shared::sm3p0
; Function Attrs: inlinehint mustprogress nofree nosync nounwind readnone willreturn
define internal fastcc i32 @_ZN9core_arch9core_arch12riscv_shared5sm3p017h50602641902a2471E(i32 %x) unnamed_addr #0 {
start:
  %_3 = zext i32 %x to i64
  %_2 = tail call i64 @llvm.riscv.sm3p0.i64(i64 %_3) #5
  %0 = trunc i64 %_2 to i32
  ret i32 %0
}
000000008000004e <_ZN9core_arch9core_arch12riscv_shared5sm3p017h50602641902a2471E>:
    8000004e:	1502                	slli	a0,a0,0x20
    80000050:	9101                	srli	a0,a0,0x20
    80000052:	10851513          	sm3p0	a0,a0
    80000056:	8082                	ret

0000000080000058 <_ZN9core_arch9core_arch12riscv_shared5sm3p117h9f0bc8b014c2d03aE>:
    80000058:	1502                	slli	a0,a0,0x20
    8000005a:	9101                	srli	a0,a0,0x20
    8000005c:	10951513          	sm3p1	a0,a0
    80000060:	8082                	ret

Current LLVM does not have llvm.riscv.sm3p0.i32 on 64-bit machine, it would result in LLVM ERROR: unimplemented operand if we use function llvm.riscv.sm3p0.i32.

FIXME: Rust requires all functions under #[target_feature] be called in unsafe blocks, however in this case the Zks functions would be safe. How do deal with this?

luojia65 avatar Mar 21 '22 09:03 luojia65

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive avatar Mar 21 '22 09:03 rust-highfive

It says:

the feature `aarch64_target_feature` has been stable since 1.61.0 and no longer requires an attribute to enable

Should I remove these stable feature gates as well?

luojia65 avatar Mar 21 '22 09:03 luojia65

You can probably avoid the zero-extension by sign-extending to isize instead. On RISC-V a u32 is held in sign-extended for in a 64-bit register so this should be a no-op.

FIXME: Rust requires all functions under #[target_feature] be called in unsafe blocks, however in this case the Zks functions would be safe. How do deal with this?

It's only unsafe if the caller itself doesn't have that target feature enabled. Basically it's unsafe because you need to check that the CPU you are running on actually supports this feature. This is necessary to avoid using undefined instructions which could have unknown behavior.

Amanieu avatar Mar 21 '22 19:03 Amanieu

@Amanieu Thanks for your suggestion! I tried to update code into transmution to i32 then use as to isize, it emitted as follows in LLVM.

#[inline]
#[target_feature(enable = "zksh")]
pub fn sm3p0(x: u32) -> u32 {
    // sign extend parameter to isize
    unsafe { sm3p0_isize(transmute::<_, i32>(x) as isize) as u32 }
}
#[target_feature(enable = "zksh")]
pub fn start4(a: u32) -> u32 {
    sm3p1(sm3p1(sm3p0(sm3p0(a)))) // test code generation
}
; Function Attrs: mustprogress nofree nosync nounwind readnone willreturn
define internal fastcc i32 @_ZN11hello_world6start417h1004698f151d95b4E(i32 %a) unnamed_addr #2 {
start:
  %_3.i = sext i32 %a to i64
  %_2.i = tail call i64 @llvm.riscv.sm3p0.i64(i64 %_3.i) #4
  %sext = shl i64 %_2.i, 32
  %_3.i1 = ashr exact i64 %sext, 32
  %_2.i2 = tail call i64 @llvm.riscv.sm3p0.i64(i64 %_3.i1) #4
  %sext7 = shl i64 %_2.i2, 32
  %_3.i3 = ashr exact i64 %sext7, 32
  %_2.i4 = tail call i64 @llvm.riscv.sm3p1.i64(i64 %_3.i3) #4
  %sext8 = shl i64 %_2.i4, 32
  %_3.i5 = ashr exact i64 %sext8, 32
  %_2.i6 = tail call i64 @llvm.riscv.sm3p1.i64(i64 %_3.i5) #4
  %0 = trunc i64 %_2.i6 to i32
  ret i32 %0
}

The LLVM code includes sign extension from i32 to i64, however the generated code still includes an sext.w instruction:

0000000080000008 <_ZN11hello_world6start417h1004698f151d95b4E>:
80000008: 01 25        	sext.w	a0, a0
8000000a: 13 15 85 10  	sm3p0	a0, a0
8000000e: 01 25        	sext.w	a0, a0
80000010: 13 15 85 10  	sm3p0	a0, a0
80000014: 01 25        	sext.w	a0, a0
80000016: 13 15 95 10  	sm3p1	a0, a0
8000001a: 01 25        	sext.w	a0, a0
8000001c: 13 15 95 10  	sm3p1	a0, a0
80000020: 82 80        	ret

How to hint LLVM that this intrinsic function will ignore the upper 32 bits of input parameter?

It's only unsafe if the caller itself doesn't have that target feature enabled. Basically it's unsafe because you need to check that the CPU you are running on actually supports this feature. This is necessary to avoid using undefined instructions which could have unknown behavior.

Thanks! I'll note that.

luojia65 avatar Mar 22 '22 05:03 luojia65

You don't need the transmute, you can just do as i32 as isize. But that probably won't fix the unnecessary sext.

How to hint LLVM that this intrinsic function will ignore the upper 32 bits of input parameter?

I don't think that is possible. This would require LLVM to recognize that the intrinsic only reads the low 32 bits.

Amanieu avatar Mar 22 '22 21:03 Amanieu

We may wait before 32-bit RISC-V Zks intrinsic functions land in LLVM :)

luojia65 avatar Mar 26 '22 08:03 luojia65

:umbrella: The latest upstream changes (presumably b9cf2d74e0cdd74ee16773a8469c1885beba36a5) made this pull request unmergeable. Please resolve the merge conflicts.

bors avatar Sep 13 '22 03:09 bors

Rust now uses LLVM 15, are these functions available yet?

Amanieu avatar Oct 25 '22 16:10 Amanieu

This PR is now already done as part of #1453. I think this PR can be closed.

coastalwhite avatar Sep 16 '23 16:09 coastalwhite