
io: ReadAll buffer resizing is inefficient

Open bboreham opened this issue 2 years ago • 11 comments

What version of Go are you using (go version)?

$ go version
go version go1.17.5 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/vagrant/.cache/go-build"
GOENV="/home/vagrant/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/vagrant/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/vagrant"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.5"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/vagrant/src/github.com/cortexproject/cortex/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2696308846=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Called ioutil.ReadAll to read everything from an http response.
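
The pattern was roughly the following (the actual call site is in the client_golang PR linked below; url here is just a placeholder):

resp, err := http.Get(url)
if err != nil {
	return nil, err
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body) // allocates noticeably more than the final body size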

What did you expect to see?

Reasonable memory and CPU usage.

What did you see instead?

~60% more memory is allocated when using ioutil.ReadAll compared with bytes.Buffer.ReadFrom.

See https://github.com/prometheus/client_golang/pull/976. #46564 is making the same complaint, although it was solved in a different way.

The underlying reason is that ReadAll uses append to resize its buffer, which grows by a factor of about 1.25 (slightly more in the latest Go, but asymptotically the same) each time.
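
For reference, the current implementation is essentially the following (paraphrased from the io package as of Go 1.17, so Reader and EOF are unqualified); whenever the buffer is full it appends a single byte purely to let the runtime's growth policy pick the new capacity:

func ReadAll(r Reader) ([]byte, error) {
	b := make([]byte, 0, 512)
	for {
		if len(b) == cap(b) {
			// Add more capacity (let append pick how much).
			b = append(b, 0)[:len(b)]
		}
		n, err := r.Read(b[len(b):cap(b)])
		b = b[:len(b)+n]
		if err != nil {
			if err == EOF {
				err = nil
			}
			return b, err
		}
	}
}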

One might measure these alternatives on various axes:

  • number of bytes allocated and freed
  • max size of heap required to read a given size
  • amount of copying while resizing
  • percentage waste in returned buffer

The current implementation is better on the last one, and much worse on all the others.

bboreham avatar Jan 24 '22 15:01 bboreham

Issue is still mistitled. Should be "io: ReadAll buffer resizing is inefficient."

earthboundkid avatar Jan 25 '22 15:01 earthboundkid

Is the underlying question here whether append should resize by 2x? It feels as if "memory-growing slice-like structures" should potentially behave similarly throughout Go?

andig avatar Feb 10 '22 18:02 andig

bytes.Buffer.grow(int) uses a different algorithm: when it reallocates, the new capacity is 2*currentCap + the requested n. It looks like this ends up being more efficient in certain cases.
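
A minimal sketch of that growth rule (simplified; the real bytes.Buffer.grow also tries to slide existing data into spare capacity before reallocating):

func grow(buf []byte, n int) []byte {
	if len(buf)+n <= cap(buf) {
		return buf[:len(buf)+n] // enough room already
	}
	// Allocate 2*cap + n and copy. For large buffers this over-provisions
	// much more aggressively than append's ~1.25x growth.
	newBuf := make([]byte, len(buf)+n, 2*cap(buf)+n)
	copy(newBuf, buf)
	return newBuf
}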

earthboundkid avatar Feb 10 '22 18:02 earthboundkid

Consistency is a fine thing, but I would expect blogs, books, etc., to advise "never call ReadAll" if it remains as is.

Stretching my imagination, some cases could avoid growing the buffer altogether: for instance, when reading the body of an http.Request with a known ContentLength, the implementation of ReadAll could use a special interface to obtain the expected size. Or, more simply, there could be a ReadAllWithExpectedSize() helper.
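
A hedged sketch of what such a helper could look like (ReadAllWithExpectedSize is hypothetical, not an existing API; bytes.MinRead is the real constant from the bytes package):

func ReadAllWithExpectedSize(r io.Reader, expected int) ([]byte, error) {
	if expected < bytes.MinRead {
		expected = bytes.MinRead
	}
	// Pre-size the buffer so the expected case needs no resizing;
	// larger-than-expected bodies still work, just with some growth.
	buf := bytes.NewBuffer(make([]byte, 0, expected))
	_, err := buf.ReadFrom(r)
	return buf.Bytes(), err
}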

bboreham avatar Feb 10 '22 20:02 bboreham

We could maybe make ReadAll forward to Copy with something like this?

func ReadAll(r Reader) ([]byte, error) {
	sw := new(sliceWriter)
	_, err := Copy(sw, r)
	return *sw, err
}

type sliceWriter []byte

func (sw *sliceWriter) Write(b []byte) (int, error) {
	*sw = append(*sw, b...)
	return len(b), nil
}

This would also add support for the WriterTo logic implemented in Copy (so something like a source bytes.Buffer or bufio.Reader should be able to just write directly to the destination buffer).

Alternatively, if forwarding to Copy is undesirable because of the larger temporary buffer[^1], we could just add WriterTo forwarding in the current logic:

func ReadAll(r Reader) ([]byte, error) {
    if wt, ok := r.(WriterTo); ok {
        sw := new(sliceWriter)
        _, err := wt.WriteTo(sw)
        return *sw, err
    }
    // ...existing ReadAll logic...
}

[^1]: ReadAll does not really make any guarantee in this sense, so maybe this is not really a valid concern.

CAFxX avatar May 18 '22 06:05 CAFxX

@CAFxX can you clarify how this helps with the resizing of the destination buffer?

bboreham avatar Aug 05 '22 16:08 bboreham

@bboreham sure, it boils down to io.Copy being able to optimize the copy by forwarding to io.ReaderFrom on the destination io.Writer, or to io.WriterTo on the source io.Reader.

In this specific case, the source io.Reader implements io.WriterTo, so the copy is optimized - and this completely sidesteps the resizing issue in the "reading the whole response body" case. Incidentally, this is almost the same thing that you did manually by calling bytes.Buffer.ReadFrom.
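
A minimal sketch of those checks (simplified from io.copyBuffer in the standard library):

func copySketch(dst io.Writer, src io.Reader) (int64, error) {
	// If the source knows how to write itself out, let it drive the copy.
	if wt, ok := src.(io.WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// If the destination knows how to read for itself, let it drive instead.
	if rf, ok := dst.(io.ReaderFrom); ok {
		return rf.ReadFrom(src)
	}
	// Otherwise fall back to copying through a scratch buffer.
	return io.CopyBuffer(dst, src, make([]byte, 32*1024))
}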

name         time/op
Original-16  236µs ± 1%
Patched-16   172µs ± 2%

name         alloc/op
Original-16  291kB ± 0%
Patched-16   153kB ± 0%

name         allocs/op
Original-16   78.3 ± 1%
Patched-16    65.0 ± 0%

Benchmark source code:

package issue50774

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

func benchmark(b *testing.B, readall func(io.Reader) ([]byte, error)) {
	body := bytes.Repeat([]byte("x"), 1<<16)
	s := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write(body)
	}))
	defer s.Close()

	var buf []byte
	for i := 0; i < b.N; i++ {
		res, err := http.Get(s.URL)
		if err != nil {
			b.Fatal(err)
		}
		buf, err = readall(res.Body)
		if err != nil {
			b.Fatal(err)
		}
		res.Body.Close()
	}
	_ = buf
}

func BenchmarkOriginal(b *testing.B) {
	benchmark(b, io.ReadAll)
}

func BenchmarkPatched(b *testing.B) {
	benchmark(b, ReadAll)
}

func ReadAll(r io.Reader) ([]byte, error) {
	sw := new(sliceWriter)
	_, err := io.Copy(sw, r)
	return *sw, err
}

type sliceWriter []byte

func (sw *sliceWriter) Write(b []byte) (int, error) {
	*sw = append(*sw, b...)
	return len(b), nil
}

Update: just for confirmation, I also tried with bytes.Buffer.ReadFrom, and it turns out my approach is even faster than that:

name               time/op
Original-16        248µs ± 4%
Patched-16         173µs ± 2%
BufferReadFrom-16  215µs ± 3%

name               alloc/op
Original-16        291kB ± 0%
Patched-16         153kB ± 0%
BufferReadFrom-16  266kB ± 0%

name               allocs/op
Original-16         79.0 ± 0%
Patched-16          65.0 ± 0%
BufferReadFrom-16   70.0 ± 0%

func BenchmarkBufferReadFrom(b *testing.B) {
	benchmark(b, func(r io.Reader) ([]byte, error) {
		var buf bytes.Buffer
		_, err := buf.ReadFrom(r)
		if err != nil {
			return nil, err
		}
		return buf.Bytes(), nil
	})
}

CAFxX avatar Aug 06 '22 05:08 CAFxX

I think your approach is taking advantage of being able to read the entire buffer in one go. This is not possible when reading from a socket, as in the problem that spawned this issue.

I pasted your ReadAll into that code here

Before:

% go test -run xxx -bench . -benchmem ./api            
goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/api
BenchmarkClient/4KB-8              28846             35189 ns/op           20543 B/op         73 allocs/op
BenchmarkClient/50KB-8             17594             68666 ns/op          135782 B/op         99 allocs/op
BenchmarkClient/1000KB-8            1515            960378 ns/op         2111125 B/op        593 allocs/op
BenchmarkClient/2000KB-8             894           1343809 ns/op         4212548 B/op       1095 allocs/op
PASS
ok      github.com/prometheus/client_golang/api 6.333s

After:

% go test -run xxx -bench . -benchmem ./api
goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/api
BenchmarkClient/4KB-8              30478             36265 ns/op           43158 B/op         72 allocs/op
BenchmarkClient/50KB-8             16678             67617 ns/op          129620 B/op         96 allocs/op
BenchmarkClient/1000KB-8            1338            871225 ns/op         5466073 B/op        604 allocs/op
BenchmarkClient/2000KB-8             724           1783090 ns/op        11197990 B/op       1113 allocs/op
PASS
ok      github.com/prometheus/client_golang/api 6.187s

bboreham avatar Aug 06 '22 06:08 bboreham

@bboreham My benchmark is using httptest.NewServer to simulate reading the http response body from a real socket. You can also confirm this by inspecting the benchmark CPU profile and noticing how it's spending most of its time in read syscalls deep into the net package.
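
For reference, a profile like that can be collected with the standard benchmark flags and inspected with pprof:

go test -run xxx -bench BenchmarkPatched -benchmem -cpuprofile cpu.out
go tool pprof cpu.out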

[CPU profile screenshot]

On the other hand, your benchmark does not seem to make sense, as you are instantiating a bytes.Buffer but not using it. And you're throwing away the actual results of ReadAll...

CAFxX avatar Aug 06 '22 07:08 CAFxX

Sorry, going a bit too fast. I corrected the Prometheus client-library version to return the body, and the benchmark results are the same.

Maybe it's just that your test response is quite small at 64K (if I read that right)?

bboreham avatar Aug 06 '22 07:08 bboreham

Yup. And you're right: with significantly larger response sizes (1MB or more), the larger buffer growth of bytes.Buffer will make bytes.Buffer.ReadFrom faster. I will try in the next few days to see if there's something that can be done.

CAFxX avatar Aug 06 '22 08:08 CAFxX