cmd/compile: optimize append with string concatenation
Consider the following benchmark:
```go
const prefix = "some sufficiently long prefix "

var something = "something"

const suffix = " some sufficiently long suffix"

var sink []byte

func BenchmarkNaive(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = append(sink[:0], prefix+something+suffix...)
	}
}
```
There are a number of string concatenations going on, some from variables and others from constants. This runs in:

```
BenchmarkNaive-24    23387558    45.20 ns/op    80 B/op    1 allocs/op
```
However, the benchmark is semantically equivalent to:
```go
func BenchmarkOptimized(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = append(append(append(sink[:0], prefix...), something...), suffix...)
	}
}
```
which runs in:
```
BenchmarkOptimized-24    191343406    6.218 ns/op    0 B/op    0 allocs/op
```
That is a 100% reduction in allocations, and a dramatic speedup.
The compiler should implicitly perform this transformation for me.
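For reference, the transformation can also be done by hand with a small helper; this is a sketch of mine (the name `appendStrings` is not part of any proposal), not compiler output:

```go
package main

import "fmt"

// appendStrings appends each string to dst in turn, avoiding the
// temporary allocation that a single prefix+s+suffix concatenation
// requires when any operand is a non-constant string.
func appendStrings(dst []byte, parts ...string) []byte {
	for _, p := range parts {
		dst = append(dst, p...)
	}
	return dst
}

func main() {
	const prefix = "some sufficiently long prefix "
	var something = "something"
	const suffix = " some sufficiently long suffix"

	buf := make([]byte, 0, 128)
	buf = appendStrings(buf[:0], prefix, something, suffix)
	fmt.Println(string(buf))
}
```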
@dsnet
If the capacity of `sink` is large enough, then the optimized way is really more performant. Otherwise, it might not be. You use a global `sink`, so the benchmark is not very fair.
[edit]: the two ways are only semantically equivalent if the capacity of `sink` is large enough.
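The caveat can be illustrated by counting reallocations: starting from a nil slice, a single append of the concatenated string allocates exactly once at the final size, while chained appends may have to grow the backing array several times. The helper `countGrowths` below is my own illustration, not code from the benchmark:

```go
package main

import "fmt"

// countGrowths appends each part in turn to an initially nil slice and
// counts how many times append reallocated (i.e. the capacity changed).
func countGrowths(parts ...string) int {
	var dst []byte
	growths := 0
	for _, p := range parts {
		oldCap := cap(dst)
		dst = append(dst, p...)
		if cap(dst) != oldCap {
			growths++
		}
	}
	return growths
}

func main() {
	long := "0123456789012345678901234567890123456789" // 40 bytes
	// One big append: exactly one allocation at the final size.
	fmt.Println("single:", countGrowths(long+long+long))
	// Three chained appends: the slice may grow more than once.
	fmt.Println("chained:", countGrowths(long, long, long))
}
```

The exact number of growths in the chained case is an implementation detail of the runtime's size classes, but it is at least two on current runtimes.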
The purpose of a global `sink` is to work around compiler optimizations that would get rid of the entire `append`. It still demonstrates my point that appending to a slice with sufficient capacity is dramatically faster as a series of individual appends rather than a single one.

Also, whether multiple small appends are faster for empty slices is an implementation detail. The transformation can compute the total size of all the appended strings and call `runtime.growslice` once before the series of appends. Thus, the first-time allocation for append is identical to `BenchmarkNaive`, but it avoids the allocations that come from constructing the concatenated string.
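At the user level, that pre-sizing can be sketched with `slices.Grow` (Go 1.21+) standing in for the internal `runtime.growslice` call; the helper name `appendConcat` is illustrative, not part of any proposal:

```go
package main

import (
	"fmt"
	"slices"
)

// appendConcat computes the total size up front, grows dst once, and
// then appends each part; with the capacity reserved, the individual
// appends themselves never reallocate.
func appendConcat(dst []byte, parts ...string) []byte {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	// One growth, analogous to a single runtime.growslice call.
	dst = slices.Grow(dst, total)
	for _, p := range parts {
		dst = append(dst, p...)
	}
	return dst
}

func main() {
	var sink []byte
	sink = appendConcat(sink[:0], "prefix ", "something", " suffix")
	fmt.Println(string(sink)) // prefix something suffix
}
```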