gloo
gloo copied to clipboard
`glooctl check` Unable to cancel request for *exec.roundTripper
Describe the bug
Glooctl check returns intermittent error Unable to cancel request for *exec.roundTripper
To Reproduce Steps to reproduce the behavior:
- kube apply '...'
- curl '....'
- See error
Expected behavior Intermittent issue, hard to track down. Possibly related to timeouts: https://github.com/solo-io/gloo/issues/4204
Additional context Add any other context about the problem here, e.g.
- Gloo Edge version - v1.7.3
- Kubernetes version
Another user is seeing this behavior in 1.7.0 in response to glooctl check
.
Internal Reference: https://solo-io.slack.com/archives/G01MLF5GZPZ/p1621348170006900
Same problem here, with a new install in EKS, in an almost empty cluster.
Client: {"version":"1.7.11"}
Server: {"type":"Ingress","kubernetes":{"containers":[{"Tag":"1.7.11","Name":"discovery","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"ingress","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"}],"namespace":"gloo-system"}}
Kubernetes version is 1.18.
At the end glooctl check
yields No problems detected
Before I was using gloo 1.5, and this problem never arises.
Also having this issue with EKS,
Kubernetes Version: 1.20.4
Client: {"version":"1.8.0"}
Server: {"type":"Gateway","kubernetes":{"containers":[{"Tag":"1.8.0","Name":"discovery","Registry":"quay.io/solo-io"},{"Tag":"1.8.0","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.8.0","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.8.0","Name":"gloo","Registry":"quay.io/solo-io"}],"namespace":"gloo-system"}}
Yet another user is confused by this spam. Internal Reference: https://solo-io.slack.com/archives/G01DET26YP7/p1628091286006600?thread_ts=1628083921.004200&cid=G01DET26YP7
happening on AWS EKS 1.17 as well with gloo 1.7.18
Initial investigation looks like there is excess go routines causing the errors, but why this happens more on a clean install with EKS and not AKS/GKE, i do not know.
....
Checking virtual services... OK
too many goroutines, only loaded 1024
W0818 07:33:32.221952 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.222104 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.222676 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.222765 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.223593 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.223933 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224155 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224382 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224408 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224432 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224453 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224471 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.225099 11539 transport.go:260] Unable to cancel request for *exec.roundTripper
too many goroutines, only loaded 1024
Currently I am using this workaround: glooctl check 2>&1|grep -v roundTripper
Happening on AWS EKS 1.21 with glooctl client 1.8.18 and gloo 1.8.11
Yet another user confused by this issue: https://solo-io.slack.com/archives/C03146E7TEZ/p1643841851366019
Same here:
$ glooctl version
Client: {"version":"1.10.5"}
Server: {"type":"Gateway","enterprise":true,"kubernetes":{"containers":[{"Tag":"1.10.1","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.10.3","Name":"gloo-ee-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.10.3","Name":"extauth-ee","Registry":"quay.io/solo-io"},{"Tag":"1.10.3","Name":"gloo-ee","Registry":"quay.io/solo-io"},{"Tag":"1.10.3","Name":"rate-limit-ee","Registry":"quay.io/solo-io"},{"Tag":"6.2.4","Name":"redis","Registry":"docker.io"}],"namespace":"gloo-system"}}
EKS 1.20
https://github.com/solo-io/gloo/pull/5937 seems to fix (or at least mitigate) this issue, though it's more of a band-aid than a proper fix. A good amount of time was put into investigating, but not a ton of progress has been made. There's some complex caching logic I don't fully understand, and it's likely that the root cause is hidden somewhere within. I've discussed with @sam-heilbron, and we agreed to go forward with this "temporary fix", evaluate the efficacy of it, then re-prioritize putting time into a more permanent fix.
Internal slack thread on the investigation: https://solo-io-corp.slack.com/archives/G01EERAK3KJ/p1645044029625889
Zendesk ticket #599 has been linked to this issue.
dumping the following here in case it's useful for debugging:
glooctl version
Client: {"version":"1.10.12"}
Server: {"type":"Gateway","enterprise":true,"kubernetes":{"containers":[{"Tag":"1.10.9","Name":"extauth-ee","Registry":"quay.io/solo-io"},{"Tag":"1.10.12","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.10.9","Name":"gloo-ee-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.10.9","Name":"gloo-ee","Registry":"quay.io/solo-io"},{"Tag":"1.10.9","Name":"rate-limit-ee","Registry":"quay.io/solo-io"},{"Tag":"6.2.4","Name":"redis","Registry":"docker.io"}],"namespace":"gloo-system"}}
glooctl check
...
Checking secrets... W0425 10:03:46.570858 68612 transport.go:288] Unable to cancel request for *exec.roundTripper
E0425 10:03:46.570916 68612 request.go:1085] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E0425 10:03:46.571034 68612 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)
W0425 10:03:46.571058 68612 transport.go:288] Unable to cancel request for *exec.roundTripper
E0425 10:03:46.571172 68612 request.go:1085] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E0425 10:03:46.571263 68612 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout or context cancellation while reading body)
W0425 10:03:46.571818 68612 transport.go:288] Unable to cancel request for *exec.roundTripper
E0425 10:03:46.571829 68612 request.go:1085] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E0425 10:03:46.571921 68612 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)
W0425 10:03:46.878432 68612 transport.go:288] Unable to cancel request for *exec.roundTripper
...
a wall of roundTripper
logs (unable to cancel), interspersed with:
E0425 10:04:24.423738 68612 request.go:1085] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E0425 10:04:24.423839 68612 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)
bumping your ulimit usually fixes this
We are seeing the same issue when using the windows-amd64 binary.
There is no apparent ulimit
alternative on Windows.
Version: Microsoft Windows [Version 10.0.19045.3208]
This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.
This issue has been closed due to no activity in the last 12 months.