JIT assembly optimizer leaves some redundant branches
For AArch64, emit__GUARD_IS_TRUE_POP produces the following assembly code:
...
14: 54000041 b.ne 0x1c <_JIT_ENTRY+0x1c>
18: 14000002 b 0x20 <_JIT_ENTRY+0x20>
1c: 14000000 b 0x1c <_JIT_ENTRY+0x1c>
000000000000001c: R_AARCH64_JUMP26 _JIT_JUMP_TARGET
but it should, ideally, emit this:
14: 54000041 b.ne 0x14 <_JIT_ENTRY+0x14>
000000000000001c: R_AARCH64_CONDBR19 _JIT_JUMP_TARGET
This doesn't seem to be an issue for x86, possibly because of the way Clang is set up for AArch64, or possibly just as an artefact of LLVM's code generation. Either way, rather than relying on LLVM to eliminate these jumps, we can do it in the JIT builder's assembly optimizer. Currently we don't perform jump fusion, but it would be easy enough to add.
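To make the idea concrete, here is a minimal sketch of jump fusion over a list of textual assembly lines. It is not the JIT builder's actual code: the function name `fuse_jumps`, the line-based representation, and the label names are all assumptions made for illustration. It rewrites `b.cond L1; b L2; L1: b T; L2:` as `b.cond T; L2:`, and only when nothing else refers to `L1`.

```python
import re

# Hypothetical, simplified patterns for AArch64 branches and labels.
_COND_BRANCH = re.compile(r"\s*b\.(\w+)\s+(\S+)\s*$")
_BRANCH = re.compile(r"\s*b\s+(\S+)\s*$")
_LABEL = re.compile(r"\s*(\S+):\s*$")


def _uses(lines: list[str], label: str) -> int:
    """Count operand tokens equal to `label` (the definition `label:` is not counted)."""
    return sum(line.replace(",", " ").split().count(label) for line in lines)


def fuse_jumps(lines: list[str]) -> list[str]:
    """Fuse `b.cond L1; b L2; L1: b T; L2:` into `b.cond T; L2:`."""
    out: list[str] = []
    i = 0
    while i < len(lines):
        window = lines[i : i + 5]
        if len(window) == 5:
            cond = _COND_BRANCH.match(window[0])
            skip = _BRANCH.match(window[1])
            l1 = _LABEL.match(window[2])
            jump = _BRANCH.match(window[3])
            l2 = _LABEL.match(window[4])
            if (
                cond and skip and l1 and jump and l2
                and cond[2] == l1[1]          # conditional branch targets L1
                and skip[1] == l2[1]          # unconditional branch skips to L2
                and _uses(lines, l1[1]) == 1  # nothing else jumps to L1
            ):
                indent = window[0][: len(window[0]) - len(window[0].lstrip())]
                out.append(f"{indent}b.{cond[1]} {jump[1]}")
                i += 4  # drop `b L2`, `L1:` and `b T`; keep `L2:` for the next step
                continue
        out.append(lines[i])
        i += 1
    return out
```

The condition does not need to be inverted here, because the original conditional branch is taken in exactly the cases where control ends up at the final target. A real implementation would also need to emit the conditional-branch relocation instead of the unconditional one, and check that the target is within range of a conditional branch.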
Linked PRs
- gh-140800
- gh-142907
- gh-143332
- gh-143352
- gh-143389
The introduction of _remove_unreachable broke the JIT + FT builds on x86-64.
I get this error:
0. Program arguments: clang-21 --target=x86_64-unknown-linux-gnu -c -o /tmp/tmpbw0scowv/_LOAD_ATTR_WITH_HINT_r11.o /tmp/tmpbw0scowv/_LOAD_ATTR_WITH_HINT_r11.s
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
Commenting out the call to _remove_unreachable fixes it.
Alright, I narrowed it down: some stencils have more than one entry point. For example, if PyStackRef_CLOSE is outlined, a stencil may have two entry points (_JIT_ENTRY and PyStackRef_CLOSE).
The problem is that we don't treat `call PyStackRef_CLOSE` as a jump to a live block, so the outlined function's blocks look unreachable and this optimization breaks.
A possible fix is to just treat all function entries as live.
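A rough sketch of that fix, assuming a simplified block graph: the `Block` class, its `is_function_entry` flag, and `live_blocks` are hypothetical names for illustration, not the optimizer's real data structures. The reachability walk is seeded with every function entry rather than only _JIT_ENTRY, so an outlined function stays live even though `call` is not modelled as an edge.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical basic block: a label plus its jump/fallthrough successors."""
    label: str
    successors: list[str] = field(default_factory=list)
    is_function_entry: bool = False


def live_blocks(blocks: dict[str, Block]) -> set[str]:
    """Return labels reachable from any function entry, not just _JIT_ENTRY."""
    worklist = [b.label for b in blocks.values() if b.is_function_entry]
    live: set[str] = set()
    while worklist:
        label = worklist.pop()
        if label in live or label not in blocks:
            continue  # already visited, or an external symbol
        live.add(label)
        worklist.extend(blocks[label].successors)
    return live


# Usage: _JIT_ENTRY calls PyStackRef_CLOSE, but calls are not successor edges.
blocks = {
    b.label: b
    for b in [
        Block("_JIT_ENTRY", [".L1"], is_function_entry=True),
        Block(".L1"),
        Block("PyStackRef_CLOSE", [".L2"], is_function_entry=True),
        Block(".L2"),
        Block(".Ldead"),
    ]
}
print(sorted(live_blocks(blocks)))
```

This is conservative: an outlined function that is never actually called would also be kept, but that only costs a little code size and cannot remove code that is still needed.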