io: ReadAll buffer resizing is inefficient
What version of Go are you using (go version)?
$ go version
go version go1.17.5 linux/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/vagrant/.cache/go-build"
GOENV="/home/vagrant/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/vagrant/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/vagrant"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.5"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/vagrant/src/github.com/cortexproject/cortex/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2696308846=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Called ioutil.ReadAll to read everything from an http response.
What did you expect to see?
Reasonable memory and CPU usage.
What did you see instead?
~60% more memory is allocated when using ioutil.ReadAll compared with bytes.Buffer.ReadFrom.
See https://github.com/prometheus/client_golang/pull/976. #46564 is making the same complaint, although it was solved in a different way.
The underlying reason is that ReadAll() uses append to resize its buffer, which grows by a factor of 1.25 (slightly more in recent Go releases, but asymptotically the same) each time.
One might measure these alternatives on various axes:
- number of bytes allocated and freed
- max size of heap required to read a given size
- amount of copying while resizing
- percentage waste in returned buffer
The current implementation is better on the last one, and much worse on all the others.
Issue is still mistitled. Should be "io: ReadAll buffer resizing is inefficient."
Is the underlying question here whether append should resize by 2x? It feels as if "memory-growing slice-like structures" should potentially behave similarly throughout Go?
bytes.Buffer.grow(int) uses a different algorithm, where it grows to 2*currentCap + the requested n. It looks like this ends up being more efficient in certain cases.
Consistency is a fine thing, but I would expect blogs, books, etc., to advise "never call ReadAll" if it remains as is.
Stretching my imagination, some cases can avoid memory-growing: for instance, when reading the body of an http.Request with ContentLength set, the implementation of ReadAll could use a special interface to obtain the expected size. Or, more simply, there could be a ReadAllWithExpectedSize() helper.
We could maybe make ReadAll forward to Copy with something like this?
func ReadAll(r Reader) ([]byte, error) {
	sw := new(sliceWriter)
	_, err := Copy(sw, r)
	return *sw, err
}

type sliceWriter []byte

func (sw *sliceWriter) Write(b []byte) (int, error) {
	*sw = append(*sw, b...)
	return len(b), nil
}
This would also add support for the WriterTo logic implemented in Copy (so something like a source bytes.Buffer or bufio.Reader should be able to just write directly to the destination buffer).
Alternatively, if forwarding to Copy is undesirable because of the larger temporary buffer[^1], we could just add WriterTo forwarding in the current logic:
func ReadAll(r Reader) ([]byte, error) {
	if wt, ok := r.(WriterTo); ok {
		sw := new(sliceWriter)
		_, err := wt.WriteTo(sw)
		return *sw, err
	}
	// ...existing ReadAll logic...
}
[^1]: ReadAll does not really make any guarantee in this sense, so maybe this is not really a valid concern.
@CAFxX can you clarify how this helps with the resizing of the destination buffer?
@bboreham sure, it boils down to io.Copy being able to optimize the copy by forwarding to io.ReaderFrom on the destination io.Writer, or to io.WriterTo on the source io.Reader.
In this specific case, the source io.Reader implements io.WriterTo, so the copy is optimized - and this completely sidesteps the resizing issue in the "reading the whole response body" case. Incidentally, this is almost the same thing that you did manually by calling bytes.Buffer.ReadFrom.
name         time/op
Original-16  236µs ± 1%
Patched-16   172µs ± 2%

name         alloc/op
Original-16  291kB ± 0%
Patched-16   153kB ± 0%

name         allocs/op
Original-16  78.3 ± 1%
Patched-16   65.0 ± 0%
Benchmark source code
package issue50774

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

func benchmark(b *testing.B, readall func(io.Reader) ([]byte, error)) {
	body := bytes.Repeat([]byte("x"), 1<<16)
	s := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write(body)
	}))
	defer s.Close()
	var buf []byte
	for i := 0; i < b.N; i++ {
		res, err := http.Get(s.URL)
		if err != nil {
			b.Fatal(err)
		}
		buf, err = readall(res.Body)
		if err != nil {
			b.Fatal(err)
		}
		res.Body.Close()
	}
	_ = buf
}

func BenchmarkOriginal(b *testing.B) {
	benchmark(b, io.ReadAll)
}

func BenchmarkPatched(b *testing.B) {
	benchmark(b, ReadAll)
}

func ReadAll(r io.Reader) ([]byte, error) {
	sw := new(sliceWriter)
	_, err := io.Copy(sw, r)
	return *sw, err
}

type sliceWriter []byte

func (sw *sliceWriter) Write(b []byte) (int, error) {
	*sw = append(*sw, b...)
	return len(b), nil
}
Update: just for confirmation, I also tried with bytes.Buffer.ReadFrom, and it turns out my approach is even faster than that:
name               time/op
Original-16        248µs ± 4%
Patched-16         173µs ± 2%
BufferReadFrom-16  215µs ± 3%

name               alloc/op
Original-16        291kB ± 0%
Patched-16         153kB ± 0%
BufferReadFrom-16  266kB ± 0%

name               allocs/op
Original-16        79.0 ± 0%
Patched-16         65.0 ± 0%
BufferReadFrom-16  70.0 ± 0%
func BenchmarkBufferReadFrom(b *testing.B) {
	benchmark(b, func(r io.Reader) ([]byte, error) {
		buf := &bytes.Buffer{}
		_, err := buf.ReadFrom(r)
		if err != nil {
			return nil, err
		}
		return buf.Bytes(), nil
	})
}
I think your approach is taking advantage of being able to read the entire buffer in one go. This is not possible when reading from a socket, as in the problem that spawned this issue.
I pasted your ReadAll into that code here.
Before:
% go test -run xxx -bench . -benchmem ./api
goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/api
BenchmarkClient/4KB-8 28846 35189 ns/op 20543 B/op 73 allocs/op
BenchmarkClient/50KB-8 17594 68666 ns/op 135782 B/op 99 allocs/op
BenchmarkClient/1000KB-8 1515 960378 ns/op 2111125 B/op 593 allocs/op
BenchmarkClient/2000KB-8 894 1343809 ns/op 4212548 B/op 1095 allocs/op
PASS
ok github.com/prometheus/client_golang/api 6.333s
After:
% go test -run xxx -bench . -benchmem ./api
goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/api
BenchmarkClient/4KB-8 30478 36265 ns/op 43158 B/op 72 allocs/op
BenchmarkClient/50KB-8 16678 67617 ns/op 129620 B/op 96 allocs/op
BenchmarkClient/1000KB-8 1338 871225 ns/op 5466073 B/op 604 allocs/op
BenchmarkClient/2000KB-8 724 1783090 ns/op 11197990 B/op 1113 allocs/op
PASS
ok github.com/prometheus/client_golang/api 6.187s
@bboreham My benchmark is using httptest.NewServer to simulate reading the http response body from a real socket. You can also confirm this by inspecting the benchmark CPU profile and noticing how it's spending most of its time in read syscalls deep into the net package.
On the other hand, your benchmark does not seem to make sense as you are instantiating a bytes.Buffer, but not using it. And you're throwing the actual results of ReadAll away...
Sorry, going a bit too fast. I corrected the Prometheus client-library version to return the body, and the benchmark results are the same.
Maybe it's just that your test response is quite small at 64K (if I read that right)?
Yup. And you're right: with significantly larger response sizes (1 MB or more), the larger buffer growth of bytes.Buffer will make bytes.Buffer.ReadFrom faster. I will try in the next few days to see if there's something that can be done.