Add condition to prevent partial application of paired AArch64 linker instruction sequence optimizations to the same symbol.

Open smithp35 opened this issue 7 months ago • 0 comments

In https://github.com/llvm/llvm-project/issues/138254 a case has emerged where a function contains two ADRP/LDR GOT accesses to the same symbol. One of these sequences can be optimised; the other cannot. The function also has a branch destination between the ADRP and the LDR of the optimisable sequence. As a result, the unoptimised ADRP can be combined with the optimised LDR (rewritten to an ADD in this case), leading to incorrect results.

The assembly (derived from C code compiled with GCC 9.3):

foo:
.LFB1:
        .cfi_startproc
        cmp     x0, 0
        bge     .L8
        adrp    x2, :got:b // Optimised to adrp x2, b
.L9:
        ldr     x2, [x2, :got_lo12:b] // Optimised to add x2, x2, :lo12:b
        add     x0, x2, x1
        b       bar
        .p2align 2,,3
.L8:
        stp     x29, x30, [sp, -32]!
        .cfi_def_cfa_offset 32
        .cfi_offset 29, -32
        .cfi_offset 30, -24
        adrp    x2, :got:b // Can't be optimised because the add below is in the way,
        add     x0, x0, 1 // although in theory it could be out of range of b.
        ldr     x3, [x2, :got_lo12:b]
        add     x0, x3, x0
        mov     x29, sp
        stp     x2, x1, [sp, 16]
        bl      bar
        ldp     x2, x1, [sp, 16]
        ldp     x29, x30, [sp], 32
        .cfi_restore 30
        .cfi_restore 29
        .cfi_def_cfa_offset 0
        b .L9
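
The failure can be checked with a little address arithmetic. The Python sketch below is an illustration only: the addresses of b and of its GOT entry are made up, the helper names are hypothetical, and instruction encoding is ignored. It models just the value that ends up in x2 at .L9 on each path.

```python
PAGE_MASK = 0xFFF  # ADRP works on 4 KiB pages


def page(addr):
    """Page base of addr, i.e. the value ADRP would materialise."""
    return addr & ~PAGE_MASK


def lo12(addr):
    """Low 12 bits of addr, i.e. what :lo12: / :got_lo12: resolves to."""
    return addr & PAGE_MASK


# Hypothetical layout, chosen only for illustration.
B_ADDR = 0x412345         # address of b
GOT_B = 0x430010          # address of the GOT entry that holds &b
MEMORY = {GOT_B: B_ADDR}  # contents of the GOT entry


def original_sequence():
    x2 = page(GOT_B)                 # adrp x2, :got:b
    return MEMORY[x2 + lo12(GOT_B)]  # ldr  x2, [x2, :got_lo12:b]


def fall_through_path():
    x2 = page(B_ADDR)                # adrp x2, b            (optimised)
    return x2 + lo12(B_ADDR)         # add  x2, x2, :lo12:b  (optimised)


def path_via_L8():
    x2 = page(GOT_B)                 # adrp x2, :got:b (left unoptimised at .L8)
    # ... b .L9 ...
    return x2 + lo12(B_ADDR)         # add  x2, x2, :lo12:b  (optimised at .L9)


print(hex(original_sequence()), hex(fall_through_path()), hex(path_via_L8()))
```

With these assumed addresses the original and fully optimised sequences both yield the address of b, while the path through .L8 yields the GOT page plus the low 12 bits of b, which is neither the address of b nor the address of its GOT entry.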

At the moment, according to our existing condition list [1], the linker is permitted to make this transformation. We need to find some words to ensure that the linker only applies the optimisation for b if all of the ADRP/LDR GOT-relative accesses to b can be optimised.
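
One way to picture the proposed all-or-nothing condition is the Python sketch below. It is not taken from any linker: the `GotAccess` record, the `pair_ok` flag (standing in for "this pair meets the existing conditions in [1]") and the function name are all hypothetical, and the scope over which accesses are collected is left open, since the wording has not been decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GotAccess:
    """One ADRP/LDR GOT-relative pair (hypothetical record)."""
    symbol: str
    pair_ok: bool  # True if this pair alone passes the existing conditions


def symbols_to_optimise(accesses):
    """Return the symbols for which every GOT access pair can be optimised."""
    blocked = {a.symbol for a in accesses if not a.pair_ok}
    return {a.symbol for a in accesses} - blocked


# The two accesses to b from the example, plus an unrelated symbol c.
accesses = [
    GotAccess("b", pair_ok=True),   # pair split across the .L9 label
    GotAccess("b", pair_ok=False),  # pair at .L8, blocked by the add
    GotAccess("c", pair_ok=True),
]
print(symbols_to_optimise(accesses))
```

Under this sketch b is excluded because one of its pairs cannot be optimised, so neither access to b would be rewritten.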

The ADRP, ADD (non-GOT-relative) transformation to NOP, ADR is safe to apply partially: even if an unoptimised ADRP is executed and there is then a jump to the middle of an optimised sequence, the ADR overwrites the register with the correct value.
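
The same kind of arithmetic shows why this case is benign. The Python sketch below models only the scenario described above (an unoptimised ADRP followed by a jump onto the ADR of a rewritten sequence); the symbol address and the helper names are assumptions made for illustration.

```python
PAGE_MASK = 0xFFF
SYM = 0x412345  # hypothetical address of the symbol


def adrp_sym():
    """adrp x0, sym: page base of the symbol."""
    return SYM & ~PAGE_MASK


def adr_sym(_old_x0):
    """adr x0, sym: writes the full address and ignores the old x0."""
    return SYM


def fully_optimised_path():
    x0 = 0                # nop leaves x0 untouched (value irrelevant)
    return adr_sym(x0)    # adr x0, sym


def mixed_path():
    x0 = adrp_sym()       # adrp from a sequence that was left alone
    # ... branch into the middle of a rewritten sequence ...
    return adr_sym(x0)    # adr x0, sym replaces whatever ADRP produced


print(hex(fully_optimised_path()), hex(mixed_path()))
```

Because ADR does not read its destination register, both paths end with the address of the symbol in x0, unlike the GOT case where the rewritten ADD depends on the preceding ADRP.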

[1] https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst#579relocation-optimization

smithp35, May 06 '25 09:05