proxy.golang.org: Intermittent TLS/Network errors with Google's Module Proxy
Go version
go version go1.23.1 darwin/arm64
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/amir/Library/Caches/go-build'
GOENV='/Users/amir/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/amir/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/amir/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/private/var/tmp/_bazel_amir/3dbd0b78d662a8a6e641b2d6e1f7442e/external/rules_go~~go_sdk~go_sdk'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/private/var/tmp/_bazel_amir/3dbd0b78d662a8a6e641b2d6e1f7442e/external/rules_go~~go_sdk~go_sdk/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/amir/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/private/var/tmp/_bazel_amir/3dbd0b78d662a8a6e641b2d6e1f7442e/execroot/_main/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/l8/54vw83s15sn1h80rkn3xxl1c0000gn/T/go-build1795123535=/tmp/go-build -gno-record-gcc-switches -fno-common'
What did you do?
Environment:
- Bazel with latest rules_go.
- GitHub Actions
Ran bazel build ${some_target}.
Note that I'm not sure whether this problem is limited to Bazel or happens with plain go invocations as well.
What did you see happen?
Intermittently, some modules fail to download from Google's Go Module Proxy:
(22:42:44) ERROR: /home/runner/.bazel/external/gazelle~~go_deps~com_github_aws_aws_sdk_go_v2_service_s3/BUILD.bazel:5:11: @@gazelle~~go_deps~com_github_aws_aws_sdk_go_v2_service_s3//:s3 depends on @@gazelle~~go_deps~com_github_aws_aws_sdk_go_v2_service_internal_checksum//:checksum in repository @@gazelle~~go_deps~com_github_aws_aws_sdk_go_v2_service_internal_checksum which failed to fetch. no such package '@@gazelle~~go_deps~com_github_aws_aws_sdk_go_v2_service_internal_checksum//': gazelle~~go_deps~com_github_aws_aws_sdk_go_v2_service_internal_checksum: fetch_repo: github.com/aws/aws-sdk-go-v2/service/internal/[email protected]: Get "https://proxy.golang.org/github.com/aws/aws-sdk-go-v2/service/internal/checksum/@v/v1.4.1.info": net/http: TLS handshake timeout
What did you expect to see?
No errors.
Related Issues and Documentation
- cmd/go: get panics with "can't find reason for requirement on" #57037 (closed)
- proxy.golang.org: i/o timeout frequently get i/o timeout when using bazel/gazelle #63562 (closed)
- x/tools/gopls: prefer dependencies from the go.mod when populating imports #45209 (closed)
- retrieving external modules on Go1.15 on s390x appears to have checksum and ECDSA verification issues #40949 (closed)
- proxy.golang.org: dial tcp 142.251.33.113:443: i/o timeout #63244 (closed)
- net/http: frequent HTTP2 INTERNAL_ERROR errors during module zip download since 2021-10-06 #51323
- proxy.golang.org: incorrect ZIP for github.com/aws/[email protected] #45517 (closed)
- cmd/go: retry some fetch failures #29345 (closed)
- cmd/go: bazel wrapper disables build cache but not module mode #29850 (closed)
- x/tools/gopls: add a setting not to download modules #39264 (closed)
Just had it happen again:
/home/runner/work/spirl/spirl/spirlctl/BUILD.bazel:7:11: //spirlctl:spirlctl_lib depends on @@gazelle~~go_deps~com_github_spf13_cobra//:cobra in repository @@gazelle~~go_deps~com_github_spf13_cobra which failed to fetch. no such package '@@gazelle~~go_deps~com_github_spf13_cobra//': gazelle~~go_deps~com_github_spf13_cobra: fetch_repo: github.com/spf13/[email protected]: Get "https://proxy.golang.org/github.com/spf13/cobra/@v/v1.8.1.info": dial tcp 142.250.176.17:443: i/o timeout
isn't this more likely to be a network issue in your local network?
This is happening primarily in GitHub Actions. I've also confirmed it's happening to someone using self-hosted AWS runners.
This issue seems similar to https://github.com/golang/go/issues/63562
I've now added
common '--repo_env=GOPROXY=https://goproxy.io,https://proxy.golang.org,direct'
to my bazelrc, which effectively just changes GOPROXY. So far we haven't hit these errors. I'll keep an eye on this and see whether it fixes the problem; if it does, I suspect some load balancer at Google is struggling.
I have not seen this issue happen since we’ve changed GOPROXY.
I'll still keep an eye on it.
We experienced similar errors at Figma with rules_go until we set up a mirror to use as GOPROXY. https://github.com/golang/go/issues/63244
We're hitting this issue today, from AWS EC2 hosted CI runners (not from github actions).
I've switched us over to the following bazelrc incantation:
common '--repo_env=GOPROXY=https://proxy.golang.org|https://goproxy.io|direct'
CC @golang/tools-team.
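For what it's worth, the separator in that GOPROXY list matters: per the go command's GOPROXY documentation (go help goproxy), a comma-separated list only falls back to the next entry after a 404 or 410 response from the proxy, while a pipe-separated list falls back after any error, including dial and TLS handshake timeouts like the ones reported above. So the pipe form should mask transient network failures even without adding a second proxy; something along these lines (untested here) ought to behave similarly:
common '--repo_env=GOPROXY=https://proxy.golang.org|direct'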
We suspect this is a transient network issue. Are folks still experiencing this?
I've switched to using multiple proxies for now, so I don't think I'll be able to know for certain.
I just ran into this locally again when using only proxy.golang.org.
loading failure: com.google.devtools.build.lib.rules.repository.RepositoryFunction$AlreadyReportedRepositoryAccessException: gazelle~~go_deps~com_github_aws_aws_sdk_go_v2_service_internal_presigned_url: fetch_repo: github.com/aws/aws-sdk-go-v2/service/internal/[email protected]: Get "https://proxy.golang.org/github.com/aws/aws-sdk-go-v2/service/internal/presigned-url/@v/v1.12.2.info": dial tcp 142.250.81.241:443: i/o timeout
Thanks @aaomidi. How frequently do you see this failure? If you retry, does it succeed?
If I retry it does succeed. I don't see it that often, but that's partially because of heavy caching on my end. I no longer see this in CI after adding goproxy.io to the list of proxies it can retrieve data from.
I do think this is specifically an issue with how gazelle interacts with the go module proxy, on a subset of the Google LBs.
I frequently see this on CI runners hosted on AWS EC2, at least multiple times per day this past week. It does succeed when retrying, though sometimes it takes more than one retry.
cc: @samthanawalla
Since I don't think we'll be able to resolve network issues between aws and Google frontends, it seems like the most tractable solution to this problem would be to incorporate retries into the Go command (#28194). From skimming that issue, it is not clear whether we want to support this.
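For context, here is a minimal sketch of what client-side retry with backoff around a proxy request could look like. fetchWithRetry and its parameters are illustrative only, not part of the go command or fetch_repo; the module URL is just an example taken from the logs above.

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchWithRetry retries a GET against a module proxy URL, treating any
// transport error (dial timeout, TLS handshake timeout, ...) as retryable.
// Any HTTP response, including 404/410, is returned without retrying.
func fetchWithRetry(url string, attempts int) (*http.Response, error) {
	client := &http.Client{Timeout: 30 * time.Second}
	backoff := time.Second
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		time.Sleep(backoff)
		backoff *= 2
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

func main() {
	resp, err := fetchWithRetry("https://proxy.golang.org/github.com/spf13/cobra/@v/v1.8.1.info", 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("%s: %s\n", resp.Status, body)
}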
This has nothing to do with AWS, for what it's worth. I've had this issue on GitHub Actions, which would be Azure.
I don’t think it’s the network either. This issue is really mainly prominent in Bazel tooling.
@aaomidi got it.
Absent more detailed traces indicating a problem with the go command or module proxy, this doesn't seem actionable from the perspective of the proxy.
I'm not sure that's true, though, especially given that Bazel and Go are both Google projects. Realistically, I think this is only actionable by Google involving the Bazel team in this bug.
Would a repro CI run help with this?
I don’t think it’s the network either. This issue is really mainly prominent in Bazel tooling.
The errors reported in this issue all seem to be network timeouts. The fact that it mainly shows up in Bazel may just mean that Bazel is the main program contacting the Go proxy. After all, if you are using Bazel there isn't much reason for anything else to contact the Go proxy.
And that suggests that there are network problems between wherever people are running Bazel and the Go proxy.
I don't see how fixes to either Bazel or the Go proxy can affect that.
What might conceivably help is to collect the IP addresses that are failing, both the IP address where Bazel is running and the IP address that it is using to contact the module proxy.
Or, yes, if there is a way to reproduce this reliably that would be helpful.
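In case it helps, here is a rough repro sketch that could be run from an affected runner (untested in this exact form; the cobra URL is just an example from the logs above). It repeatedly fetches a single .info endpoint, records which proxy IP each attempt dialed, and counts transport failures, which would give both the failing server IPs and a failure rate.

package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	const url = "https://proxy.golang.org/github.com/spf13/cobra/@v/v1.8.1.info"
	// Disable keep-alives so every iteration dials (and TLS-handshakes) fresh.
	client := &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{DisableKeepAlives: true},
	}
	failures := 0
	for i := 1; i <= 200; i++ {
		var dialed string
		trace := &httptrace.ClientTrace{
			// ConnectStart fires even if the TLS handshake later times out,
			// so we still learn which proxy IP was being contacted.
			ConnectStart: func(network, addr string) { dialed = addr },
		}
		ctx := httptrace.WithClientTrace(context.Background(), trace)
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			panic(err)
		}
		start := time.Now()
		resp, err := client.Do(req)
		if err != nil {
			failures++
			fmt.Printf("%3d FAIL ip=%s after %v: %v\n", i, dialed, time.Since(start), err)
		} else {
			resp.Body.Close()
			fmt.Printf("%3d %s ip=%s in %v\n", i, resp.Status, dialed, time.Since(start))
		}
		time.Sleep(time.Second)
	}
	fmt.Printf("transport failures: %d/200\n", failures)
}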
Timed out in state WaitingForInfo. Closing.
(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)