Add support for aarch64 platform intrinsics
Currently this produces:
--> /Users/alex_gaynor/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/arm_shared/crypto.rs:69:5
|
69 | vaeseq_u8_(data, key)
| ^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `llvm.aarch64.crypto.aese` on OS `macos`
|
or
test inputs::encoded::tests::test_input ... error: unsupported operation: can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `macos`
--> /Users/dmnk/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/aarch64/neon/mod.rs:2438:15
|
2438 | transmute(vqtbl1q(transmute(t), transmute(idx)))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `macos`
|
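For context, `llvm.aarch64.neon.tbl1.v16i8` is the one-table-register form of the TBL instruction, which `vqtbl1q` lowers to. A plain-Rust sketch of the semantics a shim would need (hypothetical reference code, not anything from Miri):

```rust
/// Sketch of `llvm.aarch64.neon.tbl1.v16i8` (TBL with one table register):
/// each output byte is `table[idx[i]]` when the index is in range,
/// and 0 otherwise (TBL zeroes out-of-range lanes).
fn tbl1_reference(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        if let Some(&b) = table.get(idx[i] as usize) {
            out[i] = b;
        }
    }
    out
}
```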
I should probably do a version of https://github.com/rust-lang/miri/issues/2057 for aarch64; all my current surveying has been done based on x86-64-v2.
If there's a straightforward script for it (and you promise it won't destroy my computer :D), I'm happy to do a run on my ARM64 laptop.
It involves running the tests for every published crate, so I feel like you're not up for that :)
Also, Miri supports cross-interpretation, so the host doesn't matter; my big x86_64 CPU will do just fine for this.
I was thinking maybe I'd just do the first 500 or 1k or something :-)
But if your setup already works for it, that sounds good!
I hacked up https://github.com/saethlin/crater-at-home a bit to set the target to aarch64-unknown-linux-gnu, and here's a thousand crates (hosted for now in my dev bucket): https://miri-bot-dev.s3.amazonaws.com/aarch64-1000.tar.xz
Missing LLVM intrinsics look like:
716 counts
( 1) 178 (24.9%, 24.9%): llvm.aarch64.neon.uminv.i32.v4i32
( 2) 62 ( 8.7%, 33.5%): llvm.aarch64.neon.uminv.i8.v16i8
( 3) 60 ( 8.4%, 41.9%): llvm.aarch64.neon.uminv.i16.v8i16
( 4) 50 ( 7.0%, 48.9%): llvm.aarch64.neon.tbl1.v16i8
( 5) 40 ( 5.6%, 54.5%): llvm.aarch64.neon.umaxp.v16i8
( 6) 26 ( 3.6%, 58.1%): llvm.aarch64.neon.ushl.v2i64
( 7) 24 ( 3.4%, 61.5%): llvm.aarch64.neon.ushl.v4i32
( 8) 18 ( 2.5%, 64.0%): llvm.aarch64.neon.uaddv.i32.v4i32
( 9) 18 ( 2.5%, 66.5%): llvm.fma.v2f64
( 10) 16 ( 2.2%, 68.7%): llvm.fma.v4f32
( 11) 14 ( 2.0%, 70.7%): llvm.aarch64.neon.sshl.v4i32
( 12) 14 ( 2.0%, 72.6%): llvm.aarch64.neon.uaddlv.i32.v16i8
( 13) 12 ( 1.7%, 74.3%): llvm.aarch64.neon.frintn.v4f32
( 14) 10 ( 1.4%, 75.7%): llvm.aarch64.neon.sshl.v8i16
( 15) 8 ( 1.1%, 76.8%): llvm.aarch64.neon.fcvtns.v4i32.v4f32
( 16) 8 ( 1.1%, 77.9%): llvm.aarch64.neon.ld1x4.v16i8.p0i8
( 17) 8 ( 1.1%, 79.1%): llvm.aarch64.neon.smin.v4i32
( 18) 8 ( 1.1%, 80.2%): llvm.aarch64.neon.smin.v8i16
( 19) 8 ( 1.1%, 81.3%): llvm.aarch64.neon.sqrdmulh.v8i16
( 20) 8 ( 1.1%, 82.4%): llvm.aarch64.neon.sshl.v2i64
( 21) 8 ( 1.1%, 83.5%): llvm.aarch64.neon.umaxv.i8.v16i8
( 22) 8 ( 1.1%, 84.6%): llvm.fptosi.sat.v4i32.v4f32
( 23) 6 ( 0.8%, 85.5%): llvm.aarch64.neon.uaddv.i32.v8i16
( 24) 4 ( 0.6%, 86.0%): llvm.aarch64.neon.abs.v16i8
( 25) 4 ( 0.6%, 86.6%): llvm.aarch64.neon.abs.v4i32
( 26) 4 ( 0.6%, 87.2%): llvm.aarch64.neon.abs.v8i16
( 27) 4 ( 0.6%, 87.7%): llvm.aarch64.neon.fmax.v2f64
( 28) 4 ( 0.6%, 88.3%): llvm.aarch64.neon.fmax.v4f32
( 29) 4 ( 0.6%, 88.8%): llvm.aarch64.neon.fmaxnm.v2f64
( 30) 4 ( 0.6%, 89.4%): llvm.aarch64.neon.fmaxnm.v4f32
( 31) 4 ( 0.6%, 89.9%): llvm.aarch64.neon.fmin.v2f64
( 32) 4 ( 0.6%, 90.5%): llvm.aarch64.neon.fmin.v4f32
( 33) 4 ( 0.6%, 91.1%): llvm.aarch64.neon.fminnm.v2f64
( 34) 4 ( 0.6%, 91.6%): llvm.aarch64.neon.fminnm.v4f32
( 35) 4 ( 0.6%, 92.2%): llvm.aarch64.neon.smax.v16i8
( 36) 4 ( 0.6%, 92.7%): llvm.aarch64.neon.smax.v8i16
( 37) 4 ( 0.6%, 93.3%): llvm.aarch64.neon.smin.v16i8
( 38) 4 ( 0.6%, 93.9%): llvm.aarch64.neon.smull.v4i16
( 39) 4 ( 0.6%, 94.4%): llvm.aarch64.neon.sqadd.v16i8
( 40) 4 ( 0.6%, 95.0%): llvm.aarch64.neon.sqadd.v8i16
( 41) 4 ( 0.6%, 95.5%): llvm.aarch64.neon.sqsub.v16i8
( 42) 4 ( 0.6%, 96.1%): llvm.aarch64.neon.sqsub.v8i16
( 43) 4 ( 0.6%, 96.6%): llvm.aarch64.neon.ushl.v8i16
( 44) 2 ( 0.3%, 96.9%): llvm.aarch64.neon.sqxtn.v4i16
( 45) 2 ( 0.3%, 97.2%): llvm.aarch64.neon.sqxtn.v8i8
( 46) 2 ( 0.3%, 97.5%): llvm.aarch64.neon.umax.v16i8
( 47) 2 ( 0.3%, 97.8%): llvm.aarch64.neon.umax.v4i32
( 48) 2 ( 0.3%, 98.0%): llvm.aarch64.neon.umax.v8i16
( 49) 2 ( 0.3%, 98.3%): llvm.aarch64.neon.umin.v16i8
( 50) 2 ( 0.3%, 98.6%): llvm.aarch64.neon.umin.v4i32
( 51) 2 ( 0.3%, 98.9%): llvm.aarch64.neon.umin.v8i16
( 52) 2 ( 0.3%, 99.2%): llvm.aarch64.neon.uqadd.v16i8
( 53) 2 ( 0.3%, 99.4%): llvm.aarch64.neon.uqadd.v8i16
( 54) 2 ( 0.3%, 99.7%): llvm.aarch64.neon.uqsub.v16i8
( 55) 2 ( 0.3%,100.0%): llvm.aarch64.neon.uqsub.v8i16
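Most of the list reduces to simple lane-wise or horizontal operations. As a rough plain-Rust sketch of the semantics (hypothetical reference code, not Miri's actual shim machinery), the two biggest families above look like:

```rust
/// `llvm.aarch64.neon.uminv.*` (UMINV): unsigned horizontal minimum
/// across all lanes; the top entry above is the 4 x u32 variant.
fn uminv_u32x4(v: [u32; 4]) -> u32 {
    v.into_iter().min().unwrap()
}

/// `llvm.aarch64.neon.ushl.*` (USHL): per-lane shift where the amount is
/// the signed value in the low byte of the shift lane; positive amounts
/// shift left, negative amounts shift right, out-of-range amounts give 0.
fn ushl_u64x2(v: [u64; 2], shift: [i64; 2]) -> [u64; 2] {
    let mut out = [0u64; 2];
    for i in 0..2 {
        let s = shift[i] as i8 as i32;
        out[i] = if (0..64).contains(&s) {
            v[i] << s
        } else if (-63..0).contains(&s) {
            v[i] >> -s
        } else {
            0
        };
    }
    out
}
```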
Nothing about AES. Do I need a particular target-cpu set to have AES intrinsics? Or is there a specific crate you were working on above that I can test with different target CPUs to see which one hits this?
https://github.com/ogxd/gxhash is what I was playing with when I originally ran into this.
https://github.com/RustCrypto/block-ciphers/tree/master/aes uses the same instruction, but goes via inline assembly instead of the intrinsic, for whatever reason.
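For a self-contained reproducer of the intrinsic path from the first error above, something like this (a sketch, assuming an aarch64 target with the aes feature enabled) should hit `llvm.aarch64.crypto.aese` under Miri:

```rust
#[cfg(target_arch = "aarch64")]
use core::arch::aarch64::{uint8x16_t, vaeseq_u8, vaesmcq_u8};

/// One AES round step: AESE (AddRoundKey + SubBytes + ShiftRows)
/// followed by AESMC (MixColumns). `vaeseq_u8` is the wrapper that
/// lowers to the `llvm.aarch64.crypto.aese` call Miri rejects above.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "aes")]
unsafe fn aes_round(state: uint8x16_t, round_key: uint8x16_t) -> uint8x16_t {
    vaesmcq_u8(vaeseq_u8(state, round_key))
}
```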
In any event, thanks for running these numbers!
I'm using an Apple M1, which has a set of baseline capabilities that I'm not sure is guaranteed for all aarch64 chips.
Baseline aarch64 does not have the aes feature, but I think here https://github.com/rust-lang/rust/issues/93889#issuecomment-1172693546 @workingjubilee says that M1 is target-cpu=apple-a14. I'll try that; it has the aes feature.
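To check whether a given target-cpu actually turns the feature on, a quick compile-time probe works (a hypothetical snippet, not from this issue): built for an aarch64 target with `-C target-cpu=apple-a14` (or `-C target-feature=+aes`), it should print `aes: true`.

```rust
// Prints whether the `aes` target feature is enabled at compile time.
fn main() {
    println!("aes: {}", cfg!(target_feature = "aes"));
}
```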