llvm-project arm64, apple-m1, LLC generating code passing float arguments in general purpose register

Not sure if this is a bug or if we do things incorrectly.

Compiling bc code with llc to target Apple M1

clang+llvm-15.0.2-arm64-apple-darwin21.0/bin/llc --mtriple=arm64-apple-darwin20.1.0 -march=arm64 -mcpu=apple-m1 --relocation-model=pic -O0 -filetype=obj xxxxxx.bc -o xxxxxx.obj

This generates code that passes a floating-point argument in general-purpose register x0, for example:

mov    x0, #0xe9e2
movk   x0, #0xb295, lsl #16
movk   x0, #0x710c, lsl #32
movk   x0, #0x3fbc, lsl #48       // result: x0 = 3FBC710CB295E9E2 = 0.1111, C++ source passed a 0.1111 double literal
                                                     // also sets x1 to second arg which is a pointer
bl     0x12f15bdcc
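
To double-check that the immediate loaded into x0 really is the double literal, the four 16-bit chunks can be reassembled and reinterpreted as an IEEE 754 double. This is only a small sketch; the helper names `assemble_mov_movk` and `bits_to_double` are made up for illustration.

```python
import struct

def assemble_mov_movk(chunks):
    """Combine 16-bit immediates (lowest halfword first) into one 64-bit value,
    the way a mov followed by movk ... lsl #16/#32/#48 fills a register."""
    bits = 0
    for index, chunk in enumerate(chunks):
        bits |= (chunk & 0xFFFF) << (16 * index)
    return bits

def bits_to_double(bits):
    """Reinterpret a 64-bit integer bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# Immediates taken from the mov/movk sequence in the listing above.
x0 = assemble_mov_movk([0xE9E2, 0xB295, 0x710C, 0x3FBC])
print(hex(x0), bits_to_double(x0))
```

So x0 holds the raw bit pattern of the double, not a pointer, which is consistent with the caller treating the first argument as an integer-class value.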

Callee, in another ARM64 executable built with Xcode 13.4:

str    d0, [sp, #0x10]          // expects first arg in d0
str    x0, [sp, #0x8]            // second arg in x0, should be the ptr, but contains the double fp value

This eventually crashes of course.

By trial and error I found that adding the flag -mattr=fp-armv8 produces code that passes the double in d0, and everything seems to work.

I found the ARM64 ABI standard: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing One section says:

Floating-point and short vector types are passed in SIMD and Floating-point registers or on the stack; never in general-purpose registers (except when they form part of a small structure that is neither an HFA nor an HVA).
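
As a rough illustration of that rule, here is a deliberately simplified model of register assignment for plain scalar arguments. It ignores structs, HFAs/HVAs, vectors and Apple-specific deviations, and the function name `assign_registers` is hypothetical. The point is only that floating-point and integer/pointer arguments draw from two independent register counters.

```python
def assign_registers(arg_kinds):
    """Toy model of AAPCS64 scalar argument passing.

    arg_kinds is a list of "fp" (float/double) or "int" (integers and
    pointers). Returns the register each argument would land in, or
    "stack" once the eight registers of that class are used up.
    """
    next_gpr = 0  # next free general-purpose register (x0..x7)
    next_fpr = 0  # next free SIMD/FP register (d0..d7 for doubles)
    locations = []
    for kind in arg_kinds:
        if kind == "fp":
            if next_fpr < 8:
                locations.append(f"d{next_fpr}")
                next_fpr += 1
            else:
                locations.append("stack")
        elif kind == "int":
            if next_gpr < 8:
                locations.append(f"x{next_gpr}")
                next_gpr += 1
            else:
                locations.append("stack")
        else:
            raise ValueError(f"unknown argument kind: {kind!r}")
    return locations

# The call in question: (double, pointer)
print(assign_registers(["fp", "int"]))
```

Under this model a (double, pointer) call puts the double in d0 and the pointer in x0, which is what the Xcode-built callee expects. The llc output above instead behaves as if both arguments were integer-class (x0, x1).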

Is the llc code generation wrong? If there are different target variants, should fp-armv8 be default for triple arm64-apple and cpu apple-m1?

Or is there something else we're doing wrong?

The bitcode was produced by LLVM 7.0, btw, if that matters.

Oct 13 '22 12:10 FredrikLundkvist

@llvm/issue-subscribers-backend-aarch64

Oct 13 '22 13:10 llvmbot

Will you please attach a reproducer?

Oct 13 '22 19:10 asl

Yes. I made a simpler repro and it seems it does make a difference that the bitcode is compiled with LLVM 7.0. I'll clean it up and attach soon.

Oct 16 '22 09:10 FredrikLundkvist

Here's a repro. See comments in .sh file.

When I diffed the bc files, I noticed differences in the "attributes" string. I also tested replacing that string in the "old" bc file produced by clang 7. That made llc produce the exact same code. How do those attributes work? Are they defaults, to be overridden or matched to target cpu attributes? We have around 800 bitcode files from 3rd party developers so it's not feasible for us to modify all.
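
For anyone wanting to repeat that comparison on many files, a small script along these lines could list which string attributes differ between two disassembled (.ll) files. It is a sketch only: it parses the textual `attributes #N = { ... }` groups with a regex rather than using any LLVM API, looks only at `"key"="value"` string attributes, and the function names are made up for illustration.

```python
import re

# Matches attribute-group lines in textual IR, e.g.
#   attributes #0 = { nounwind "target-cpu"="..." }
ATTR_GROUP = re.compile(r'^attributes #(\d+) = \{(.*)\}\s*$', re.MULTILINE)
STRING_ATTR = re.compile(r'"([^"]+)"="([^"]*)"')

def string_attributes(ir_text):
    """Map attribute-group id -> {key: value} for string attributes."""
    groups = {}
    for match in ATTR_GROUP.finditer(ir_text):
        groups[int(match.group(1))] = dict(STRING_ATTR.findall(match.group(2)))
    return groups

def diff_attributes(old_ir, new_ir):
    """Return {(group, key): (old_value, new_value)} for differing attributes.

    A value of None means the attribute is absent on that side.
    """
    old, new = string_attributes(old_ir), string_attributes(new_ir)
    changes = {}
    for group in sorted(set(old) | set(new)):
        old_attrs, new_attrs = old.get(group, {}), new.get(group, {})
        for key in sorted(set(old_attrs) | set(new_attrs)):
            if old_attrs.get(key) != new_attrs.get(key):
                changes[(group, key)] = (old_attrs.get(key), new_attrs.get(key))
    return changes
```

Calling `diff_attributes(open("old.ll").read(), open("new.ll").read())` (file names are placeholders) returns only the keys that changed, which is easier to scan than a raw text diff of the attribute lines.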

Arkiv.zip

Oct 17 '22 12:10 FredrikLundkvist

I'm getting an error when trying to use llvm-dis on that bitcode. Any way you can provide the textual IR as a repro?

Oct 17 '22 22:10 aemerson

Sorry, my bad. I compiled to ll, but called them bc.

Here's an updated script compiling to bc.

I also added a direct compile to .ll for comparison, and noticed a difference between compiling straight to .ll with clang 7 and going clang 7 -> bc -> disassemble to .ll. The difference is in the frame-pointer attributes, which also shows up in the llc output as added frame-pointer code. (The x0/d0 argument-passing issue is the same either way.)

floatargs.zip

Oct 18 '22 12:10 FredrikLundkvist

I guess this is to be expected, and not supported, since bitcode is not portable. (Re: https://github.com/llvm/llvm-project/issues/58443)

clang7 did not support arm64/apple-m1 so it's undefined how a bitcode file from that version compiles with llc to m1?

Oct 19 '22 15:10 FredrikLundkvist

Sorry, forgot to close this.

Nov 15 '22 09:11 FredrikLundkvist