ir
ir copied to clipboard
Some IR tests still fail on MacOS/AArch64
Hi @dstogov,
I noticed that you had fixed the ir_add_veneer()
assertion failure in https://github.com/dstogov/ir/commit/1671b3de78742b85611f0e84f805141608730dfc. After your fix, there is no more assertion failure, but below IR tests still fail on MacOS/AArch64.
001: Argument Passing [./tests/debug.aarch64/args_001.irt]
002: Argument Passing [./tests/debug.aarch64/args_002.irt]
Simple CALL -O0 [./tests/debug.aarch64/call-O0.irt]
Simple CALL [./tests/debug.aarch64/call.irt]
CALL with parallel argument passing [./tests/debug.aarch64/call2.irt]
Simple CALL with ALLOCA [./tests/debug.aarch64/call_alloca.irt]
Simple CALL with VADDR [./tests/debug.aarch64/call_vaddr.irt]
003: Parameter Loading and argument passing [./tests/debug.aarch64/params_003.irt]
Fib [./tests/debug.aarch64/regset-fib.irt]
Fib2 [./tests/debug.aarch64/regset-fib2.irt]
FibI [./tests/debug.aarch64/regset-fibi.irt]
Simple CALL [./tests/debug.aarch64/tailcall_001.irt]
FibI [./tests/fibi_min.irt]
CTLZ 001: [./tests/run/ctlz_001.irt]
CTPOP 001: [./tests/run/ctpop_001.irt]
CTTZ 001: [./tests/run/cttz_001.irt]
Floating Point number comparison (001: CMP edge cases) [./tests/run/fcmp_001.irt]
Floating Point number comparison (002: CMP+COND edge cases) [./tests/run/fcmp_002.irt]
VA_ARG 001: va_arg(int) [./tests/run/vaarg_001.irt]
VA_ARG 002: va_arg(double) [./tests/run/vaarg_002.irt]
VA_ARG 003: va_arg(float) [./tests/run/vaarg_003.irt]
VA_ARG 004: va_arg() expanded on AArch64 [./tests/run/vaarg_004_aarch64.irt]
The failures are caused by the difference between .exp
and .out
, for example ./tests/debug.aarch64/call.irt
call.out
main:
stp x29, x30, [sp, #-0x10]!
mov x29, sp
adr x0, .L1
movz w1, #0x2a
movz x17, #0x9f54
movk x17, #0x8d06, lsl #16
movk x17, #0x1, lsl #32
blr x17
ldp x29, x30, [sp], #0x10
ret
.rodata
.L1:
.db 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x25, 0x64, 0x21, 0x0a, 0x00, 0x00
hello 1799882064!
exit code = 18
call.exp
main:
stp x29, x30, [sp, #-0x10]!
mov x29, sp
adr x0, .L1
movz w1, #0x2a
bl printf
ldp x29, x30, [sp], #0x10
ret
.rodata
.db 0x1f, 0x20, 0x03, 0xd5
.L1:
.db 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x25, 0x64, 0x21, 0x0a, 0x00, 0x00
hello 42!
exit code = 10
It looks that the issues are still related to calling convention differences between Linux/AArch64 and MacOS/AArch64. Perhaps you are still working on some complete fixes for this. But I'd like to add this issue here to track this.
Thanks for the report. The vararg passing on MacOS/AArch64 is not fixed yet.
However this failure is not related to calling convention and it is not a serious problem. The difference is in usage of blr
instead of bl
that is caused by the larger distance between JIT code buffer and printf
.
Probably we may always generate bl
and relay on additional veneer when necessary. I'll think about this.
I'm wrong. This test uses expectation that is incorrect on MacOS. The third argument must be passed through stack and the result of the execution is incorrect. This is directly related to vararg passing.
@pfustc could you please re-run the tests on MacOS/AArch64 once again
I hope I fixed vararg passing via https://github.com/dstogov/ir/commit/c5bf8003c0f3aeaeef0700e17c44e137627758c8
Tests that verifies assembler code are going to be failed, but the output generated by JIT code must be the same (e.g. hello 42!
)
Thanks for your quick fix. We now have 12 failures, 11 of which are just caused by the diff in assembly code.
-------------------------------
FAILED TESTS
-------------------------------
001: Argument Passing [./tests/debug.aarch64/args_001.irt]
002: Argument Passing [./tests/debug.aarch64/args_002.irt]
Simple CALL -O0 [./tests/debug.aarch64/call-O0.irt]
Simple CALL [./tests/debug.aarch64/call.irt]
CALL with parallel argument passing [./tests/debug.aarch64/call2.irt]
Simple CALL with ALLOCA [./tests/debug.aarch64/call_alloca.irt]
Simple CALL with VADDR [./tests/debug.aarch64/call_vaddr.irt]
003: Parameter Loading and argument passing [./tests/debug.aarch64/params_003.irt]
Fib [./tests/debug.aarch64/regset-fib.irt]
Fib2 [./tests/debug.aarch64/regset-fib2.irt]
FibI [./tests/debug.aarch64/regset-fibi.irt]
Simple CALL [./tests/debug.aarch64/tailcall_001.irt]
-------------------------------
But the last one ./tests/debug.aarch64/tailcall_001.irt
still has output difference.
$ diff tailcall_001.out tailcall_001.exp
2d1
< sub sp, sp, #0x10
4,16c3,4
< movz w16, #0x2a
< str x16, [sp]
< sub sp, sp, #0x10
< adr x0, .L1
< movz w16, #0x2a
< str x16, [sp]
< movz x17, #0x9f54
< movk x17, #0x8857, lsl #16
< movk x17, #0x1, lsl #32
< blr x17
< add sp, sp, #0x10
< add sp, sp, #0x10
< ret
---
> movz w1, #0x2a
> b printf
21a10,12
> hello 42!
>
> exit code = 10
There is no hello 42!
printed in tailcall_001.out
.
Great! Thanks!
According ./tests/debug.aarch64/tailcall_001.irt
- we probably can't use TAILCALL
because prinrf()
accepts stack parameters. Actually, the generated code tries to convert TAILCALL
to regular CALL
, but it seem done improperly.
- the link register is not saved restored
- 42 passed on stack twice
-
add sp, sp, #0x10
is done twice
The other test failures most probably are false positives and may be avoided by separating target apple-aarch64. The other problem is bl/blr mismatch caused by different code layout.
The output for ./tests/debug.aarch64/tailcall_001.irt
should be fixed by https://github.com/dstogov/ir/commit/81fab9a2cbaaff0fdef7c4c93f5ccf4154278a43
I can't test this.
Just checked that there's no other failures, except those caused by the assembly difference. Thanks for it.
The other test failures most probably are false positives and may be avoided by separating target apple-aarch64. The other problem is bl/blr mismatch caused by different code layout.
Is the assembly difference eventually caused by the different behavior of DynASM on Linux and MacOS? I'd like to know if it can be avoided by replacing current IR backend to something else.
Is the assembly difference eventually caused by the different behavior of DynASM on Linux and MacOS? I'd like to know if it can be avoided by replacing current IR backend to something else.
There are two reasons:
- the difference in calling conventions (e.g. vararg passing) and different code generated on purpose. We can't avoid this (because of similar reason we made difference between
x86_64
andWindows-x86_64
targets. Probably, we will have to introduceapple-aarch64
). - the difference in jump distance and usage of
bl
orblr
instruction (this might be avoided by usage of reallocations or veneers).
Oh yes, I understand that. Thanks for your explanation.