go
go copied to clipboard
cmd/compile: "redo growslice calling convention" caused a 14.7% regression in AppendMsgReplicateDecision
We should determine if this is worth fixing, if the benchmark is too noisy (or not really useful), or if there's a real bug here.
https://perf.golang.org/dashboard/?benchmark=AppendMsgReplicateDecision-8&days=90&end=2022-10-26T19%3A24
Definitely not a great benchmark. With a sub-1ns running time, lots of noise. I think we can probably get rid of it. (Anyone know how to do that? Where's the config of what is run?) @dr2chase
That said, it does look like there is something happening here. The inner loop looks like it has an extra bounds check in it. That may be fixable, I'd need to extract a repro from this giant repository first.
I can reproduce with:
package p
import (
"testing"
)
func BenchmarkAppendMsgReplicateDecision(b *testing.B) {
v := ReplicateDecision{}
bts := make([]byte, 0, v.Msgsize())
bts, _ = v.MarshalMsg(bts[0:0])
b.SetBytes(int64(len(bts)))
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
bts, _ = v.MarshalMsg(bts[0:0])
}
}
func (z ReplicateDecision) MarshalMsg(b []byte) (o []byte, err error) {
o = Require(b, z.Msgsize())
// map header, size 0
o = append(o, 0x80)
return
}
type ReplicateDecision struct {
targetsMap map[string]int64
}
func Require(old []byte, extra int) []byte {
l := len(old)
c := cap(old)
r := l + extra
if c >= r {
return old
} else if l == 0 {
return make([]byte, 0, extra)
}
c <<= 1
if c < r {
c = r
}
n := make([]byte, l, c)
copy(n, old)
return n
}
func (z ReplicateDecision) Msgsize() (s int) {
s = 1
return
}
The default bent benchmarks are here: https://cs.opensource.google/go/x/benchmarks/+/3255f073:cmd/bent/configs/benchmarks-50.toml
It needs a decoder ring to tell you what benchmark+package that is, but it's in minio. (Decoder ring, scroll down: https://perf.golang.org/search?q=upload:20221026.20 )
A nanosecond's not great, but this was a regression, right? I try to have a diversity of benchmarks, but also, I am willing to ignore a few regressions if the big picture is okay.
Simple repro:
package p
import (
"testing"
)
func BenchmarkFoo(b *testing.B) {
s := make([]byte, 1)
for i := 0; i < b.N; i++ {
s = append(s[:0], 0)
}
}
The generated code is a bit worse after my CL because the compiler doesn't realize that growslice
definitely makes the returned slice have length 1.
Previously, the new length was computed as 0+1, which makes the length known after the append. But my change reloads the length from the result of growslice
so we don't have to spill/restore the length. But if the length is constant, spill/restore isn't really a worry. This is probably fixable with a special case. More invasive may be teaching the prove pass about the equality of length between the input and output of growslice
.
Change https://go.dev/cl/445875 mentions this issue: cmd/compile: recognize when the result of append has a constant length
FWIW, this did not seem to fix the benchmark: https://perf.golang.org/dashboard/?benchmark=AppendMsgReplicateDecision-8&days=90&end=2022-10-28T15%3A45
The most recent run is for https://go.dev/cl/444715, which was submitted after https://go.dev/cl/445875.
You're right, it fixes my simple repro but not the original repro.
Change https://go.dev/cl/446277 mentions this issue: cmd/compile: add rule for post-decomposed growslice optimization
Now it is better!
https://perf.golang.org/dashboard/?benchmark=AppendMsgReplicateDecision-8&days=200&end=2022-12-21T15%3A45