cmd/compile: suboptimal zeroing of multiple registers on amd64

Open fice-t opened this issue 1 month ago • 5 comments

Go version

go version go1.26-devel_e88be8a Sun Nov 23 09:07:32 2025 -0800 linux/amd64

Output of go env in your module/workspace:

Workspace is `go.godbolt.org` on `x86-64 gc (tip)` as above.

What did you do?

[Code+output tested here]

Compile this simplified example:

var someError error

func f(w bool) (int, int, int, error) {
	if w {
		return 0, 0, 0, someError
	}
	return 1, 2, 1, nil
}

What did you see happen?

The output for the 0, 0, 0 return above uses one XORL and two MOVQs:

        XORL    AX, AX
        MOVQ    AX, BX
        MOVQ    AX, CX

What did you expect to see?

All zeroing of registers here should use XORL.

fice-t avatar Nov 25 '25 05:11 fice-t

It's pretty minor: using MOVQ is just as fast, since the register renaming unit can rename all three in one cycle even if there is a dependency. However, because we encode a 64-bit MOVQ, it takes 3 bytes versus 2 bytes for a 32-bit op, due to the 1-byte REX prefix, so we should still change it.
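As a rough illustration of the size argument, the sketch below sums the encoded lengths of the reported sequence and of an all-XORL alternative. The byte encodings are hand-assembled for this example rather than taken from the compiler's output, and the names `insn`, `seqSize`, `current`, and `proposed` are hypothetical.

```go
package main

import "fmt"

// insn pairs a Go assembler mnemonic with its x86-64 machine-code bytes.
// The encodings are hand-assembled for this illustration (an assumption),
// not copied from the compiler's output.
type insn struct {
	text string
	code []byte
}

// seqSize returns the total encoded length of a sequence in bytes.
func seqSize(seq []insn) int {
	n := 0
	for _, in := range seq {
		n += len(in.code)
	}
	return n
}

// current is the sequence reported in the issue.
var current = []insn{
	{"XORL AX, AX", []byte{0x31, 0xC0}},
	{"MOVQ AX, BX", []byte{0x48, 0x89, 0xC3}}, // 0x48 is the REX.W prefix
	{"MOVQ AX, CX", []byte{0x48, 0x89, 0xC1}},
}

// proposed zeroes each register with its own XORL, so no REX prefix is needed.
var proposed = []insn{
	{"XORL AX, AX", []byte{0x31, 0xC0}},
	{"XORL BX, BX", []byte{0x31, 0xDB}},
	{"XORL CX, CX", []byte{0x31, 0xC9}},
}

func main() {
	fmt.Println("current: ", seqSize(current), "bytes")
	fmt.Println("proposed:", seqSize(proposed), "bytes")
}
```

Under these encodings the reported sequence takes 8 bytes and the all-XORL one takes 6, so the saving is one byte per replaced MOVQ.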

Jorropo avatar Nov 25 '25 09:11 Jorropo

Have you measured the performance? Or is it just about the binary size? Thanks.

cherrymui avatar Nov 26 '25 22:11 cherrymui

Mainly the encoding size, but I filed this just because I noticed it while looking at assembly output for other issues.

I was also going to open a related issue to use XXXL instead of XXXQ where possible; should that instead be merged into this issue?

For example, the return 1, 2, 1 path above uses MOVQ where it could use MOVL:

        MOVL    $1, AX
        MOVL    $2, BX
        MOVQ    AX, CX  ; <-- could be MOVL as AX is 32-bit
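A minimal sketch of why the narrower move would be safe here, assuming the usual amd64 rule that a 32-bit register write clears the upper 32 bits of the destination. Registers are modeled as plain uint64 values, and the helpers `movq` and `movl` are hypothetical names for this illustration only.

```go
package main

import "fmt"

// movq models a 64-bit register move: all 64 bits are copied.
func movq(src uint64) uint64 { return src }

// movl models a 32-bit register move on amd64: the low 32 bits are copied
// and the upper 32 bits of the destination are cleared.
func movl(src uint64) uint64 { return uint64(uint32(src)) }

func main() {
	// MOVL $1, AX leaves the upper half of AX zero.
	ax := movl(1)
	// Copying AX to CX gives the same result with either width.
	fmt.Println(movq(ax) == movl(ax))

	// A value that needs more than 32 bits is where the two differ.
	wide := uint64(1) << 40
	fmt.Println(movq(wide) == movl(wide))
}
```

The first comparison prints true and the second prints false: the substitution only holds when the source is known to have a zero upper half, as it is right after a MOVL.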

fice-t avatar Nov 30 '25 05:11 fice-t

Using the same issue is fine.

cherrymui avatar Dec 05 '25 20:12 cherrymui