arm64, apple-m1, LLC generating code passing float arguments in general purpose register
Not sure if this is a bug or if we do things incorrectly.
Compiling bc code with llc to target Apple M1
clang+llvm-15.0.2-arm64-apple-darwin21.0/bin/llc --mtriple=arm64-apple-darwin20.1.0 -march=arm64 -mcpu=apple-m1 --relocation-model=pic -O0 -filetype=obj xxxxxx.bc -o xxxxxx.obj
Generates code that passes floating point argument in general purpose reg x0, for example:
mov x0, #0xe9e2
movk x0, #0xb295, lsl #16
movk x0, #0x710c, lsl #32
movk x0, #0x3fbc, lsl #48 // result: x0 = 3FBC710CB295E9E2 = 0.1111, C++ source passed a 0.1111 double literal
// also sets x1 to second arg which is a pointer
bl 0x12f15bdcc
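To double-check that x0 really holds the bit pattern of the double literal, the mov/movk sequence above can be replayed in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the immediates are copied from the disassembly above.

```python
import struct

# (immediate, shift) pairs taken from the mov/movk sequence in the caller.
chunks = [(0xE9E2, 0), (0xB295, 16), (0x710C, 32), (0x3FBC, 48)]

bits = 0
for imm, shift in chunks:
    # movk replaces one 16-bit lane and keeps the other lanes unchanged.
    bits = (bits & ~(0xFFFF << shift)) | (imm << shift)

# Reinterpret the 64-bit integer as an IEEE 754 double.
value = struct.unpack("<d", struct.pack("<Q", bits))[0]

print(hex(bits), value)
```

The four immediates assemble to 0x3FBC710CB295E9E2, which is the double closest to 0.1111, so the value in x0 is the floating point argument and not a pointer.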
Callee, in another ARM64 executable built with Xcode 13.4:
str d0, [sp, #0x10] // expects first arg in d0
str x0, [sp, #0x8] // second arg in x0, should be the ptr, but contains the double fp value
This eventually crashes of course.
By trial and error I found that adding the flag -mattr=fp-armv8 produces code that passes the double in d0, and everything seems to work.
I found the ARM64 ABI standard: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing One section says:
Floating-point and short vector types are passed in SIMD and Floating-point registers or on the stack; never in general-purpose registers (except when they form part of a small structure that is neither an HFA nor an HVA).
Is the llc code generation wrong? If there are different target variants, should fp-armv8 be default for triple arm64-apple and cpu apple-m1?
Or is there something else we're doing wrong?
The bitcode was produced by LLVM 7.0, btw, if that matters.
@llvm/issue-subscribers-backend-aarch64
Will you please attach a reproducer?
Yes. I made a simpler repro and it seems it does make a difference that the bitcode is compiled with LLVM 7.0. I'll clean it up and attach soon.
Here's a repro. See comments in .sh file.
When I diffed the bc files, I noticed differences in the "attributes" string. I also tested replacing that string in the "old" bc file produced by clang 7. That made llc produce the exact same code. How do those attributes work? Are they defaults, to be overridden or matched to target cpu attributes? We have around 800 bitcode files from 3rd party developers so it's not feasible for us to modify all.
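For comparing many files, a small script can pull the target-related entries out of the attribute groups in textual IR (after llvm-dis). This is a rough sketch, assuming the differing entries are the "target-cpu" and "target-features" strings that clang writes into `attributes #N = { ... }` lines. The function name `target_attributes` and the sample IR are hypothetical and not taken from the repro.

```python
import re

# Matches lines such as:
#   attributes #0 = { noinline "target-cpu"="x" "target-features"="+a,+b" }
ATTR_GROUP = re.compile(r'^attributes\s+#(\d+)\s*=\s*\{(.*)\}\s*$')
KEY_VALUE = re.compile(r'"([^"]+)"="([^"]*)"')

def target_attributes(ir_text):
    """Map each attribute group id to its target-cpu and target-features."""
    groups = {}
    for line in ir_text.splitlines():
        m = ATTR_GROUP.match(line.strip())
        if not m:
            continue
        pairs = dict(KEY_VALUE.findall(m.group(2)))
        features = pairs.get("target-features", "")
        groups[int(m.group(1))] = {
            "target-cpu": pairs.get("target-cpu"),
            "target-features": [f for f in features.split(",") if f],
        }
    return groups

# Hypothetical sample, only to show the shape of the output.
sample = '''
attributes #0 = { noinline nounwind "target-cpu"="generic" "target-features"="+neon" }
attributes #1 = { nounwind }
'''

for group_id, info in target_attributes(sample).items():
    print(group_id, info)
```

Running this over the old and new IR would show which groups carry a feature list and whether an entry such as `+fp-armv8` is present in one but not the other.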
I'm getting an error when trying to use llvm-dis on that bitcode. Any way you can provide the textual IR as a repro?
Sorry, my bad. I compiled to ll, but called them bc.
Here's an updated script compiling to bc.
I also added a step that compiles directly to ll for comparison, and noticed a difference between compiling to ll with clang7 and going clang7 -> bc -> disassemble to ll. The difference is in the frame pointer attributes, which also shows up in the llc output as added frame pointer code. (The x0/d0 argument passing issue is the same either way.)
I guess this is to be expected, and not supported, since bitcode is not portable. (Re: https://github.com/llvm/llvm-project/issues/58443)
clang7 did not support arm64/apple-m1 so it's undefined how a bitcode file from that version compiles with llc to m1?
Sorry, forgot to close this.