go icon indicating copy to clipboard operation
go copied to clipboard

cmd/compile/internal/pgo: Detect sample value position instead of hard-coding

Open brancz opened this issue 2 years ago • 5 comments

What version of Go are you using (go version)?

$ go version
go version go1.20 darwin/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN="/Users/brancz/bin"
GOCACHE="/Users/brancz/Library/Caches/go-build"
GOENV="/Users/brancz/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/brancz/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/brancz"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/brancz/src/github.com/parca-dev/parca/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/rl/8sh8jb5s1c1fl77ztzy4zw6m0000gn/T/go-build3824178585=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I tried out the 1.20 PGO feature using a merged pprof profile downloaded from a Parca server.

What did you expect to see?

Compile successfully, just like when supplying a runtime-generated pprof file.

What did you see instead?

The compiler errored several times with repeatedly printing errors like this:

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/cpu
/usr/local/go/src/internal/cpu/cpu.go:12:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

It turns out the sample position for CPU samples within a profile is hard coded to be index 1 (understandable since that's exactly what a profile looks like when generated by the runtime as it creates CPU nanoseconds on index 0 and CPU samples on index 1: https://github.com/golang/go/blob/de4748c47c67392a57f250714509f590f68ad395/src/cmd/compile/internal/pgo/irgraph.go#L145

Minimal reproducible example:

package main

import (
	"log"
	"os"
	"runtime/pprof"

	"github.com/google/pprof/profile"
)

func main() {
	const filename = "pgotest.prof"

	f, err := os.Create(filename)
	if err != nil {
		log.Fatal("could not create CPU profile: ", err)
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal("could not start CPU profile: ", err)
	}

	hot()

	pprof.StopCPUProfile()

	f.Seek(0, 0)
	p, err := profile.Parse(f)
	if err != nil {
		log.Fatal("could not parse profile: ", err)
	}

	if err := f.Close(); err != nil {
		log.Fatal("could not close profile: ", err)
	}

	p.SampleType = []*profile.ValueType{p.SampleType[1]}
	for _, s := range p.Sample {
		s.Value = s.Value[1:]
	}

	if err := os.Remove(filename); err != nil {
		log.Fatal("could not remove profile: ", err)
	}

	f, err = os.Create(filename)
	if err != nil {
		log.Fatal("could not create CPU profile: ", err)
	}
	defer f.Close()

	if err := p.Write(f); err != nil {
		log.Fatal("could not write profile: ", err)
	}
}

func hot() {
	for i := 0; i < 10_000_000_000; i++ {
	}
}

Compile the program and run it:

go build pgotest.go
./pgotest

Then attempt to compile with PGO:

go build -pgo pgotest.prof pgotest.go
# internal/race
/usr/local/go/src/internal/race/norace.go:43:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/goarch
/usr/local/go/src/internal/goarch/zgoarch_arm64.go:32:7: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/goos
/usr/local/go/src/internal/goos/zgoos_darwin.go:25:7: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/coverage/rtcov
/usr/local/go/src/internal/coverage/rtcov/rtcov.go:31:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/unsafeheader
/usr/local/go/src/internal/unsafeheader/unsafeheader.go:34:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# unicode/utf8
/usr/local/go/src/unicode/utf8/utf8.go:65:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/cpu
/usr/local/go/src/internal/cpu/cpu.go:12:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/itoa
/usr/local/go/src/internal/itoa/itoa.go:18:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# math/bits
/usr/local/go/src/math/bits/bits.go:44:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# internal/goexperiment
/usr/local/go/src/internal/goexperiment/flags.go:57:6: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# sync/atomic
/usr/local/go/src/sync/atomic/type.go:40:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
# unicode
/usr/local/go/src/unicode/casetables.go:13:5: internal compiler error: panic: runtime error: index out of range [1] with length 1

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new

The fix is pretty straight forward, PGO should detect at which position in p.SampleType is .Type == "samples" && .Unit == "count".

Happy to submit a fix.

brancz avatar Feb 03 '23 14:02 brancz

Just to make extra sure, I double checked and indeed when inserting a placeholder on the 0 index it works: https://gist.github.com/brancz/9e56f8a302d6e196f57c4953156595b5

brancz avatar Feb 03 '23 14:02 brancz

Change https://go.dev/cl/465135 mentions this issue: cmd/compile/internal/pgo: fix hard-coded PGO sample data position

gopherbot avatar Feb 03 '23 15:02 gopherbot

cc @cherrymui @aclements

I think we may want to backport this to 1.20. It is a simple fix. Thoughts?

As an aside, I documented this incorrectly at https://go.dev/doc/pgo#alternative-sources, saying we use sample index 0, which is wrong. (For Go CPU profiles it doesn't matter because index 0 ("cpu") is just a scaled version of index 1 ("samples")).

prattmic avatar Feb 03 '23 16:02 prattmic

I'd love to see it as a bug fix! I did indeed go by that document. I think samples count is a good choice though, it's a simpler unit to export from profilers, as it tends to be the raw data (in our case captured from Linux perf_events).

brancz avatar Feb 03 '23 16:02 brancz

Thanks for the issue and the CL!

I think it is safe for backport. We usually don't backport performance stuff, but this is a new feature and it could cause build failure, maybe this is fine.

cherrymui avatar Feb 03 '23 17:02 cherrymui

@gopherbot please backport to 1.20. Profiles with only a single sample type can an ICE.

prattmic avatar Feb 03 '23 21:02 prattmic

Backport issue(s) opened: #58309 (for 1.20).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

gopherbot avatar Feb 03 '23 21:02 gopherbot

Change https://go.dev/cl/466515 mentions this issue: _content/doc/pgo: correct sample index requirements

gopherbot avatar Feb 08 '23 17:02 gopherbot

Change https://go.dev/cl/466438 mentions this issue: [release-branch.go1.20] cmd/compile/internal/pgo: fix hard-coded PGO sample data position

gopherbot avatar Feb 08 '23 18:02 gopherbot

Change https://go.dev/cl/466675 mentions this issue: cmd/compile/internal/pgo: fix hard-coded PGO sample data position

gopherbot avatar Feb 08 '23 20:02 gopherbot

Change https://go.dev/cl/467375 mentions this issue: [release-branch.go1.20] cmd/compile/internal/pgo: fix hard-coded PGO sample data position

gopherbot avatar Feb 10 '23 18:02 gopherbot