cmd/compile: suboptimal zeroing of multiple registers on amd64
Go version
go version go1.26-devel_e88be8a Sun Nov 23 09:07:32 2025 -0800 linux/amd64
Output of go env in your module/workspace:
No local module; compiled on `go.godbolt.org` with `x86-64 gc (tip)`, the version shown above.
What did you do?
Compile this simplified example:
```go
package p

var someError error

func f(w bool) (int, int, int, error) {
	if w {
		return 0, 0, 0, someError
	}
	return 1, 2, 1, nil
}
```
What did you see happen?
For `return 0, 0, 0, someError`, the compiler zeroes AX with XORL and then copies it into BX and CX with two MOVQs:
```
XORL AX, AX
MOVQ AX, BX
MOVQ AX, CX
```
What did you expect to see?
Each register should be zeroed directly with its own XORL (XORL AX, AX; XORL BX, BX; XORL CX, CX) instead of copying the zero from AX.
Related Issues
- cmd/compile: on `AMD64` slow instruction sequence to Zext a < register size instruction output #76066
- cmd/compile: prefer to cheaply rematerialize than copy registers #24132 (closed)
- cmd/compile: eliminate redundant zeroing after lower pass #47107 (closed)
- cmd/compile: unnecessary zeroing of register on arm #8474 (closed)
- cmd/compile: unnecessary instructions generated #29892 (closed)
- cmd/compile: double zeroing and unnecessary copying/stack use #67957 (closed)
- cmd/compile: omit zeroing of named return value when possible #4750
- cmd/compile: small struct initialization code is suboptimal because of redundant zeroing #59021 (closed)
- cmd/internal/obj: remove MOV $0 -> XOR "optimization" #22325 (closed)
It's pretty minor: MOVQ is just as fast, since the register-renaming unit can handle all three in one cycle even with the dependency on AX.
However, a 64-bit register-to-register MOVQ takes 3 bytes because of its 1-byte REX.W prefix, while the equivalent 32-bit op on these registers takes 2, so we should still change it.
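To make the size difference concrete, here is a small sketch that sums the encoded lengths of both sequences. The byte values are hand-assembled (an assumption on my part, shown in one valid form each); the Go assembler may pick an alternate opcode, but of the same length. The `size` helper is hypothetical, written only for this illustration.

```go
package main

import "fmt"

// Hand-assembled x86-64 encodings, one valid form each.
// 0x48 is the REX.W prefix that 64-bit operations need.
var (
	xorlAXAX = []byte{0x31, 0xC0}       // XORL AX, AX (xor eax, eax)
	xorlBXBX = []byte{0x31, 0xDB}       // XORL BX, BX (xor ebx, ebx)
	xorlCXCX = []byte{0x31, 0xC9}       // XORL CX, CX (xor ecx, ecx)
	movqAXBX = []byte{0x48, 0x89, 0xC3} // MOVQ AX, BX (mov rbx, rax)
	movqAXCX = []byte{0x48, 0x89, 0xC1} // MOVQ AX, CX (mov rcx, rax)
)

// size returns the total encoded length of an instruction sequence.
func size(seq ...[]byte) int {
	n := 0
	for _, insn := range seq {
		n += len(insn)
	}
	return n
}

func main() {
	current := size(xorlAXAX, movqAXBX, movqAXCX)
	expected := size(xorlAXAX, xorlBXBX, xorlCXCX)
	fmt.Printf("XORL + 2 MOVQ: %d bytes\n", current)  // 8 bytes
	fmt.Printf("3 XORL:        %d bytes\n", expected) // 6 bytes
}
```

So the current sequence spends 2 extra bytes per function return site compared with three independent XORLs.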
Have you measured the performance? Or is it just about the binary size? Thanks.
Mainly the encoding size. I filed this only because I noticed it while reading the assembly output for other issues.
I was also going to open a related issue about using 32-bit XXXL instructions instead of XXXQ where possible; should that be folded into this issue instead?
For example, `return 1, 2, 1` above uses a MOVQ where a MOVL would do:
```
MOVL $1, AX
MOVL $2, BX
MOVQ AX, CX // <-- could be MOVL: MOVL $1, AX already cleared the upper 32 bits of AX
```
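The MOVL replacement is safe because on amd64 every 32-bit register write zero-extends into the full 64-bit register. The sketch below models that rule in plain Go; the `regs` type and the `movl32`/`movq64` helpers are hypothetical, written only to illustrate the semantics, not taken from the compiler. Registers start with all bits set so any surviving upper bits would show up.

```go
package main

import "fmt"

// regs models the 64-bit general-purpose registers used in the example.
type regs struct{ AX, BX, CX uint64 }

// movl32 models a 32-bit MOVL: the value is zero-extended, so bits
// 32-63 of the destination register are cleared.
func movl32(dst *uint64, v uint32) { *dst = uint64(v) }

// movq64 models a 64-bit MOVQ register-to-register copy.
func movq64(dst *uint64, src uint64) { *dst = src }

func main() {
	poison := ^uint64(0) // all bits set before the sequence runs
	a := regs{poison, poison, poison}
	b := a

	// Current code: MOVL $1, AX; MOVL $2, BX; MOVQ AX, CX.
	movl32(&a.AX, 1)
	movl32(&a.BX, 2)
	movq64(&a.CX, a.AX)

	// Suggested code: MOVL $1, AX; MOVL $2, BX; MOVL AX, CX.
	movl32(&b.AX, 1)
	movl32(&b.BX, 2)
	movl32(&b.CX, uint32(b.AX))

	fmt.Println(a == b, a.CX) // true 1
}
```

Both sequences leave identical register contents, and the MOVL form drops the REX.W byte from the copy.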
Using the same issue is fine.