go.arm64 liblink, cmd/7l: floating point immediates are loaded through indirection

Jan 12 '15 18:01 4ad

I guess you really mean using the FMOV (scalar, immediate) instruction to load FP immediates?

Because liblink already correctly rewrites FP immediate loads into memory loads (of the $f32.xxxx and $f64.xxxx symbols) in the progedit function.
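Choosing between FMOV and a memory load depends on which values the FMOV (scalar, immediate) form can encode at all. Below is a minimal Go sketch. It assumes the standard AArch64 8-bit floating-point immediate: a sign bit, a 3-bit exponent and a 4-bit fraction, giving ±(1 + n/16) × 2^e with n in 0..15 and e in -3..4. `fmovImmEncodable` is a hypothetical helper, not something in liblink.

```go
package main

import (
	"fmt"
	"math"
)

// fmovImmEncodable reports whether f can be materialized with a single
// FMOV (scalar, immediate). Assumes the AArch64 imm8 format:
// ±(1 + frac/16) × 2^exp, frac in [0, 15], exp in [-3, 4].
// Zero, NaN and infinities are therefore never encodable this way.
func fmovImmEncodable(f float64) bool {
	a := math.Abs(f)
	for exp := -3; exp <= 4; exp++ {
		for frac := 0; frac < 16; frac++ {
			if a == math.Ldexp(1+float64(frac)/16, exp) {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, f := range []float64{1.0, 0.125, -2.5, 31.0, 0.1, 12342432.0} {
		fmt.Printf("%v: %v\n", f, fmovImmEncodable(f))
	}
}
```

Under that encoding, a constant like the 12342432.0 in the test program further down is not encodable, so it has to come from memory.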

Jan 12 '15 22:01 minux

Yes, but there's nothing in memory there.

Jan 12 '15 22:01 4ad

Interesting. I think liblink should generate the data when reading the object file.

See: https://github.com/4ad/go/blob/dev.arm64/src/liblink/objfile.c#L793

liblink/obj[568].c synthesize the constant during assembly, but I think that's redundant and wastes space.

Jan 12 '15 22:01 minux

I'm confused as to what's going on here, but I think it's working.

Compiling this stupid program with GOARCH=arm64 go build:

package main

func main() {
    f()
}

func f() float64 {
    return 12342432.0
}

Results in an executable with a main.main that looks like this per aarch64-linux-gnu-objdump (it has f inlined into it, unsurprisingly):

0000000000010c00 <main.main>:
   10c00:       580000c1        ldr     x1, 10c18 <main.main+0x18>
   10c04:       fd400027        ldr     d7, [x1]
   10c08:       580000c1        ldr     x1, 10c20 <main.main+0x20>
   10c0c:       fd400027        ldr     d7, [x1]
   10c10:       d65f03c0        ret
   10c14:       14000000        b       10c14 <main.main+0x14>
   10c18:       00011018        .inst   0x00011018 ; undefined
   10c1c:       00000000        .inst   0x00000000 ; undefined
   10c20:       00011020        .inst   0x00011020 ; undefined
        ...

So this isn't loading the float from rodata directly, it's loading an address from which to load the float. Twice! And from addresses two bytes apart? But it looks like it's getting the right value in the end. Here's the .rodata section:

mwhudson@narsil:go-test-cases$ aarch64-linux-gnu-objdump -j .rodata -s fpimmed 

fpimmed:     file format elf64-littleaarch64

Contents of section .rodata:
 11000 01000000 00000000 01010104 01010104  ................
 11010 01000000 00000000 00000000 00000000  ................
 11020 00000000 948a6741                    ......gA

The value at 11020 does represent the correct float:

>>> [hex(ord(c)) for c in struct.pack('d', 12342432.0)]
['0x0', '0x0', '0x0', '0x0', '0x94', '0x8a', '0x67', '0x41']
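The same check works in Go. This small sketch reinterprets the eight bytes at 0x11020 as a little-endian float64, which is how `ldr d7, [x1]` sees them. The helper name `decodeF64LE` is made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// decodeF64LE interprets 8 little-endian bytes as an IEEE 754 float64.
func decodeF64LE(b []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(b))
}

func main() {
	// The 8 bytes at 0x11020 from the objdump -s output above.
	at11020 := []byte{0x00, 0x00, 0x00, 0x00, 0x94, 0x8a, 0x67, 0x41}
	fmt.Println(decodeF64LE(at11020) == 12342432.0)
}
```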

So my conclusion is that this is weird but not broken.

Jan 13 '15 00:01 mwhudson

On 13 January 2015 at 00:26, Michael Hudson-Doyle wrote:

So this isn't loading the float from rodata directly, it's loading an address from which to load the float. Twice! And from addresses two bytes apart?

It does look wrong to have the two loads, and there doesn't seem to be a need for indirection that way. I think the addresses are 8 bytes apart, though, because it's hex.

Jan 13 '15 00:01 forsyth

Haha, oops, you're right of course about the addresses. So only two bogosities.

Jan 13 '15 01:01 mwhudson

cmd/5g also does it that way: two indirections to get a general FP constant (one that can't use the vmov #imm form).

It depends on where we put the floating-point constants.

If we put them into RODATA, so that there is only one copy of the constant for the whole program, that is the correct instruction sequence to load constants.

On the other hand, if we put the FP constants in constant pools, we can save one of the loads; however, we then don't benefit from data merging by the linker, and the same constant might be stored again and again in the binary.

Ideally, for ARM32, we can afford to save float32s into constant pools, but for ARM64, as addresses are 64-bit anyway, always saving FP constants into constant pools is more appealing. (But that's an optimization we can do later; I just tried, and it's not a few-line change.)
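The trade-off above can be put into rough numbers. This is a back-of-the-envelope model, not a measurement of 5g or 7l output. It assumes each use lives in a different function (so no pool literal is shared), 4-byte instructions, and no alignment padding. The function names are illustrative.

```go
package main

import "fmt"

// rodataBytes models one shared constant in RODATA; each use then needs an
// address literal in its function's pool plus two 4-byte loads
// (first the address, then the value).
func rodataBytes(uses, addrSize, constSize int) int {
	return constSize + uses*(addrSize+2*4)
}

// poolBytes models a private copy of the constant in each function's pool,
// loaded with a single 4-byte instruction.
func poolBytes(uses, constSize int) int {
	return uses * (constSize + 4)
}

func main() {
	const uses = 3
	fmt.Println("arm64 float64: rodata", rodataBytes(uses, 8, 8), "pool", poolBytes(uses, 8))
	fmt.Println("arm   float64: rodata", rodataBytes(uses, 4, 8), "pool", poolBytes(uses, 8))
	fmt.Println("arm   float32: rodata", rodataBytes(uses, 4, 4), "pool", poolBytes(uses, 4))
}
```

In this model, each use costs constSize + 4 bytes in the pool versus addrSize + 8 via RODATA. With 64-bit addresses, even a float64 is cheaper in the pool, in line with the ARM64 remark above. With 32-bit addresses, a float32 is cheaper in the pool and a float64 costs the same per use.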

Jan 13 '15 01:01 minux

We could very easily put the immediates in the pool. However, I am tempted not to do this. Rather, I'd switch to a different relocation scheme. Right now we only do R_ADDR and R_CALLARM64. We could easily add R_ADDRARM64, which would relocate the mov instruction itself rather than the pool literal. This assumes we have enough bits to address the data segment, but I'm sure we do. Even if we don't, I'll just add the static base register back, pointing to the data segment.
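"Enough bits" can be made concrete by checking offsets against the immediate fields of AArch64 loads. The thread does not say which instruction field R_ADDRARM64 would have patched. This sketch therefore just checks two standard forms as an illustration: LDR (literal) with its signed imm19, and 64-bit LDR (unsigned immediate) with its imm12 scaled by 8. The helper names are made up.

```go
package main

import "fmt"

// fitsLdrLiteral reports whether a PC-relative byte offset is reachable by
// AArch64 LDR (literal): a signed 19-bit word offset, i.e. 4-byte aligned
// and within [-1 MiB, +1 MiB).
func fitsLdrLiteral(off int64) bool {
	return off%4 == 0 && off >= -(1<<20) && off < 1<<20
}

// fitsLdrUimm64 reports whether a byte offset from a base register fits the
// unsigned 12-bit immediate of a 64-bit LDR, which is scaled by 8:
// 8-byte aligned and within [0, 32760].
func fitsLdrUimm64(off int64) bool {
	return off%8 == 0 && off >= 0 && off <= 4095*8
}

func main() {
	pc, data := int64(0x10c00), int64(0x11020) // addresses from the objdump output above
	fmt.Println("0x11020 from pc 0x10c00 via LDR (literal):", fitsLdrLiteral(data-pc))
	// Hypothetical: a constant 64 KiB past a static base register.
	fmt.Println("64 KiB past a base register via LDR (imm12):", fitsLdrUimm64(64<<10))
}
```

A PC-relative literal reaches about 1 MiB in either direction, and a scaled imm12 only about 32 KiB past the base. Both ranges are small compared with the distance to the data segment in a large program.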

Jan 29 '15 12:01 4ad

No, we don't have enough bits, not even with a static base register. The end result will take just as much space and be just as efficient. I'm going to move floating-point constants into the pool.

Jan 29 '15 12:01 4ad
