go icon indicating copy to clipboard operation
go copied to clipboard

cmd/compile: optimize append with string concatenation

Open dsnet opened this issue 2 years ago • 2 comments

Consider the following benchmark:

const prefix = "some sufficiently long prefix "
var something = "something"
const suffix = " some sufficiently long suffix"

var sink []byte

func BenchmarkNaive(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = append(sink[:0], prefix+something+suffix...)
	}
}

There's a number of string concatenations going on, some from variables and others from constants, which runs in:

BenchmarkNaive-24        	23387558	        45.20 ns/op	      80 B/op	       1 allocs/op

However, the benchmark is semantically equivalent to:

func BenchmarkOptimized(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = append(append(append(sink[:0], prefix...), something...), suffix...)
	}
}

which runs in:

BenchmarkOptimized-24    	191343406	         6.218 ns/op	       0 B/op	       0 allocs/op

That is a 100% reduction in allocations, and a dramatic speedup.

The compiler should implicitly perform this transformation for me.

dsnet avatar Oct 26 '22 20:10 dsnet

@dsnet If the capacity of sink is large enough, then the optimized way is really more performant. Othewise, it might not.

You use a global sink, so the benchmark is not very fair.

[edit]: the two ways are only semantically equivalent if the capacity of sink is large enough.

zigo101 avatar Oct 27 '22 14:10 zigo101

The purpose of a global sink is to work around compiler optimizations that gets rid of the entire append. It still demonstrates my point that appending to a slice with sufficient capacity is dramatically faster as individual appends rather than a single one.

Also, whether multiple small appends is faster for empty slices is an implementation detail. The transformation can compute the total size of all the appended strings and call runtime.growslice before calling the series of appends. Thus, the first-time allocation for append is identical to BenchmarkNaive, but avoids the allocations that come from constructing the concatenated string.

dsnet avatar Oct 27 '22 16:10 dsnet