Add support for aarch64 platform intrinsics

Open alex opened this issue 1 year ago • 8 comments

Currently, running code that uses aarch64 intrinsics under Miri produces:

   --> /Users/alex_gaynor/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/arm_shared/crypto.rs:69:5
    |
69  |     vaeseq_u8_(data, key)
    |     ^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `llvm.aarch64.crypto.aese` on OS `macos`
    |

or

test inputs::encoded::tests::test_input ... error: unsupported operation: can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `macos`
    --> /Users/dmnk/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/aarch64/neon/mod.rs:2438:15
     |
2438 |     transmute(vqtbl1q(transmute(t), transmute(idx)))
     |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `macos`
     |

alex avatar Nov 17 '23 15:11 alex
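
For readers unfamiliar with the intrinsic named in the second error: `llvm.aarch64.neon.tbl1.v16i8` backs the `vqtbl1q_*` family, a 16-byte table lookup in which out-of-range indices yield zero. The sketch below is an illustration only, not Miri's implementation. The helper names `tbl1` and `tbl1_scalar` are made up for this example. On an aarch64 target, `tbl1` calls the real `vqtbl1q_u8` intrinsic, which is the kind of call that would be expected to hit the error above under Miri at the time of this issue. On other targets it falls back to the scalar model.

```rust
/// Scalar model of the AArch64 single-register `TBL` lookup (hypothetical
/// helper): each output byte is `table[idx[i]]`, or 0 if the index is >= 16.
fn tbl1_scalar(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = table.get(i as usize).copied().unwrap_or(0);
    }
    out
}

/// On aarch64, use the real intrinsic from `std::arch`.
#[cfg(target_arch = "aarch64")]
fn tbl1(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    use std::arch::aarch64::{vld1q_u8, vqtbl1q_u8, vst1q_u8};
    let mut out = [0u8; 16];
    // SAFETY: assumes a target where NEON is enabled by default (for example
    // aarch64-unknown-linux-gnu or aarch64-apple-darwin); every pointer
    // refers to a full 16-byte array.
    unsafe {
        let t = vld1q_u8(table.as_ptr());
        let i = vld1q_u8(idx.as_ptr());
        vst1q_u8(out.as_mut_ptr(), vqtbl1q_u8(t, i));
    }
    out
}

/// On every other architecture, fall back to the scalar model.
#[cfg(not(target_arch = "aarch64"))]
fn tbl1(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    tbl1_scalar(table, idx)
}

fn main() {
    let table: [u8; 16] = std::array::from_fn(|i| (i as u8) * 10);
    let mut idx = [0u8; 16];
    idx[0] = 3;
    idx[1] = 15;
    idx[2] = 16; // out of range, so the result byte is 0
    let got = tbl1(table, idx);
    assert_eq!(got, tbl1_scalar(table, idx));
    println!("{:?}", got);
}
```

A scalar model like `tbl1_scalar` is roughly the shape an interpreter-side shim for such an intrinsic would take, since the interpreter cannot execute the hardware instruction itself.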

I should probably do a version of https://github.com/rust-lang/miri/issues/2057 for aarch64; all my current surveying is based on x86-64-v2.

saethlin avatar Nov 17 '23 15:11 saethlin

If there's a straightforward script for it (and you promise it won't destroy my computer :D), I'm happy to do a run on my ARM64 laptop.

alex avatar Nov 17 '23 15:11 alex

It involves running the tests for every published crate so I feel like you're not up for that :)

Also, Miri supports cross-interpretation, so the host doesn't matter; my big x86_64 CPU will do just fine for this.

saethlin avatar Nov 17 '23 15:11 saethlin

I was thinking maybe I'd just do the first 500 or 1k or something :-)

But if your setup already works for it, that sounds good!

alex avatar Nov 17 '23 15:11 alex

I hacked up https://github.com/saethlin/crater-at-home a bit to set the target to aarch64-unknown-linux-gnu and here's a thousand crates (hosted for now in my dev bucket): https://miri-bot-dev.s3.amazonaws.com/aarch64-1000.tar.xz

Missing LLVM intrinsics look like:

716 counts
(  1)      178 (24.9%, 24.9%): llvm.aarch64.neon.uminv.i32.v4i32
(  2)       62 ( 8.7%, 33.5%): llvm.aarch64.neon.uminv.i8.v16i8
(  3)       60 ( 8.4%, 41.9%): llvm.aarch64.neon.uminv.i16.v8i16
(  4)       50 ( 7.0%, 48.9%): llvm.aarch64.neon.tbl1.v16i8
(  5)       40 ( 5.6%, 54.5%): llvm.aarch64.neon.umaxp.v16i8
(  6)       26 ( 3.6%, 58.1%): llvm.aarch64.neon.ushl.v2i64
(  7)       24 ( 3.4%, 61.5%): llvm.aarch64.neon.ushl.v4i32
(  8)       18 ( 2.5%, 64.0%): llvm.aarch64.neon.uaddv.i32.v4i32
(  9)       18 ( 2.5%, 66.5%): llvm.fma.v2f64
( 10)       16 ( 2.2%, 68.7%): llvm.fma.v4f32
( 11)       14 ( 2.0%, 70.7%): llvm.aarch64.neon.sshl.v4i32
( 12)       14 ( 2.0%, 72.6%): llvm.aarch64.neon.uaddlv.i32.v16i8
( 13)       12 ( 1.7%, 74.3%): llvm.aarch64.neon.frintn.v4f32
( 14)       10 ( 1.4%, 75.7%): llvm.aarch64.neon.sshl.v8i16
( 15)        8 ( 1.1%, 76.8%): llvm.aarch64.neon.fcvtns.v4i32.v4f32
( 16)        8 ( 1.1%, 77.9%): llvm.aarch64.neon.ld1x4.v16i8.p0i8
( 17)        8 ( 1.1%, 79.1%): llvm.aarch64.neon.smin.v4i32
( 18)        8 ( 1.1%, 80.2%): llvm.aarch64.neon.smin.v8i16
( 19)        8 ( 1.1%, 81.3%): llvm.aarch64.neon.sqrdmulh.v8i16
( 20)        8 ( 1.1%, 82.4%): llvm.aarch64.neon.sshl.v2i64
( 21)        8 ( 1.1%, 83.5%): llvm.aarch64.neon.umaxv.i8.v16i8
( 22)        8 ( 1.1%, 84.6%): llvm.fptosi.sat.v4i32.v4f32
( 23)        6 ( 0.8%, 85.5%): llvm.aarch64.neon.uaddv.i32.v8i16
( 24)        4 ( 0.6%, 86.0%): llvm.aarch64.neon.abs.v16i8
( 25)        4 ( 0.6%, 86.6%): llvm.aarch64.neon.abs.v4i32
( 26)        4 ( 0.6%, 87.2%): llvm.aarch64.neon.abs.v8i16
( 27)        4 ( 0.6%, 87.7%): llvm.aarch64.neon.fmax.v2f64
( 28)        4 ( 0.6%, 88.3%): llvm.aarch64.neon.fmax.v4f32
( 29)        4 ( 0.6%, 88.8%): llvm.aarch64.neon.fmaxnm.v2f64
( 30)        4 ( 0.6%, 89.4%): llvm.aarch64.neon.fmaxnm.v4f32
( 31)        4 ( 0.6%, 89.9%): llvm.aarch64.neon.fmin.v2f64
( 32)        4 ( 0.6%, 90.5%): llvm.aarch64.neon.fmin.v4f32
( 33)        4 ( 0.6%, 91.1%): llvm.aarch64.neon.fminnm.v2f64
( 34)        4 ( 0.6%, 91.6%): llvm.aarch64.neon.fminnm.v4f32
( 35)        4 ( 0.6%, 92.2%): llvm.aarch64.neon.smax.v16i8
( 36)        4 ( 0.6%, 92.7%): llvm.aarch64.neon.smax.v8i16
( 37)        4 ( 0.6%, 93.3%): llvm.aarch64.neon.smin.v16i8
( 38)        4 ( 0.6%, 93.9%): llvm.aarch64.neon.smull.v4i16
( 39)        4 ( 0.6%, 94.4%): llvm.aarch64.neon.sqadd.v16i8
( 40)        4 ( 0.6%, 95.0%): llvm.aarch64.neon.sqadd.v8i16
( 41)        4 ( 0.6%, 95.5%): llvm.aarch64.neon.sqsub.v16i8
( 42)        4 ( 0.6%, 96.1%): llvm.aarch64.neon.sqsub.v8i16
( 43)        4 ( 0.6%, 96.6%): llvm.aarch64.neon.ushl.v8i16
( 44)        2 ( 0.3%, 96.9%): llvm.aarch64.neon.sqxtn.v4i16
( 45)        2 ( 0.3%, 97.2%): llvm.aarch64.neon.sqxtn.v8i8
( 46)        2 ( 0.3%, 97.5%): llvm.aarch64.neon.umax.v16i8
( 47)        2 ( 0.3%, 97.8%): llvm.aarch64.neon.umax.v4i32
( 48)        2 ( 0.3%, 98.0%): llvm.aarch64.neon.umax.v8i16
( 49)        2 ( 0.3%, 98.3%): llvm.aarch64.neon.umin.v16i8
( 50)        2 ( 0.3%, 98.6%): llvm.aarch64.neon.umin.v4i32
( 51)        2 ( 0.3%, 98.9%): llvm.aarch64.neon.umin.v8i16
( 52)        2 ( 0.3%, 99.2%): llvm.aarch64.neon.uqadd.v16i8
( 53)        2 ( 0.3%, 99.4%): llvm.aarch64.neon.uqadd.v8i16
( 54)        2 ( 0.3%, 99.7%): llvm.aarch64.neon.uqsub.v16i8
( 55)        2 ( 0.3%,100.0%): llvm.aarch64.neon.uqsub.v8i16

Nothing about AES. Do I need a particular target-cpu set to have AES intrinsics? Or is there a specific crate you were working on above that I can test with different target CPUs to see which one hits this?

saethlin avatar Nov 17 '23 20:11 saethlin
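
A ranked table like the one in the comment above can be produced from raw Miri output with a small tally. The sketch below is not the tooling used in this thread (that lives in crater-at-home). The function names `intrinsic_name`, `tally`, and `render` are hypothetical, and the input is assumed to be diagnostic lines of the form shown in the original report.

```rust
use std::collections::HashMap;

/// Pull the LLVM intrinsic name out of a Miri diagnostic line of the form
/// "... can't call foreign function `NAME` on OS `...`".
fn intrinsic_name(line: &str) -> Option<&str> {
    let rest = line.split_once("can't call foreign function `")?.1;
    let name = rest.split_once('`')?.0;
    name.starts_with("llvm.").then_some(name)
}

/// Count how often each intrinsic appears, sorted by descending count and
/// then by name so the order is deterministic.
fn tally<'a>(lines: impl IntoIterator<Item = &'a str>) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in lines.into_iter().filter_map(intrinsic_name) {
        *counts.entry(name).or_insert(0) += 1;
    }
    let mut ranked: Vec<_> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked
}

/// Render rank, count, share, and cumulative share, one intrinsic per line.
fn render(ranked: &[(&str, usize)]) -> String {
    let total: usize = ranked.iter().map(|r| r.1).sum();
    let mut out = format!("{total} counts\n");
    let mut running = 0usize;
    for (rank, (name, count)) in ranked.iter().enumerate() {
        running += count;
        let pct = 100.0 * *count as f64 / total as f64;
        let cum = 100.0 * running as f64 / total as f64;
        out.push_str(&format!(
            "({:>3}) {:>8} ({:>4.1}%,{:>5.1}%): {}\n",
            rank + 1, count, pct, cum, name
        ));
    }
    out
}

fn main() {
    let log = [
        "error: unsupported operation: can't call foreign function `llvm.aarch64.neon.uminv.i32.v4i32` on OS `linux`",
        "error: unsupported operation: can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `linux`",
        "error: unsupported operation: can't call foreign function `llvm.aarch64.neon.uminv.i32.v4i32` on OS `linux`",
        "test result: ok",
    ];
    print!("{}", render(&tally(log)));
}
```

The cumulative column is what makes the list useful for prioritization: it shows how many of the observed failures would go away if the first N intrinsics were implemented.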

https://github.com/ogxd/gxhash is what I was playing with when I originally ran into this.

https://github.com/RustCrypto/block-ciphers/tree/master/aes uses the same instruction, but goes via inline assembly instead of the intrinsic, for whatever reason.

alex avatar Nov 17 '23 20:11 alex

In any event, thanks for running these numbers!

I'm using an Apple M1, which will have a set of baseline capabilities that I'm not sure is guaranteed for all aarch64 chips.

alex avatar Nov 17 '23 20:11 alex

Baseline aarch64 does not have the aes feature, but I think here https://github.com/rust-lang/rust/issues/93889#issuecomment-1172693546 @workingjubilee says that M1 is target-cpu=apple-a14. I'll try that; it has the aes feature.

saethlin avatar Nov 17 '23 20:11 saethlin
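
The last few comments rely on the difference between a feature being enabled at compile time (through target-cpu or target-feature) and being detected on the running CPU. The following is a minimal sketch of checking both from Rust, assuming a standard library recent enough to provide `is_aarch64_feature_detected!`. The two function names are hypothetical, and the sketch deliberately reports `false` on non-aarch64 targets.

```rust
/// True only if this binary was compiled for aarch64 with the `aes` target
/// feature enabled, for example through `-C target-cpu=apple-a14` or
/// `-C target-feature=+aes`.
fn aes_enabled_at_compile_time() -> bool {
    cfg!(all(target_arch = "aarch64", target_feature = "aes"))
}

/// True if the aarch64 CPU we are running on reports AES support.
#[cfg(target_arch = "aarch64")]
fn aes_detected_at_runtime() -> bool {
    std::arch::is_aarch64_feature_detected!("aes")
}

/// This sketch only considers aarch64, so other architectures report false.
#[cfg(not(target_arch = "aarch64"))]
fn aes_detected_at_runtime() -> bool {
    false
}

fn main() {
    println!("compile-time aes: {}", aes_enabled_at_compile_time());
    println!("runtime aes:      {}", aes_detected_at_runtime());
}
```

Crates that pick an AES code path through either mechanism would only reach the `llvm.aarch64.crypto.*` intrinsics when the corresponding check succeeds, which is one plausible reason a survey run with the baseline target features would show nothing about AES.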