gloo icon indicating copy to clipboard operation
gloo copied to clipboard

`glooctl check` Unable to cancel request for *exec.roundTripper

Open kevin-shelaga opened this issue 3 years ago • 16 comments

Describe the bug Glooctl check returns intermittent error Unable to cancel request for *exec.roundTripper

To Reproduce Steps to reproduce the behavior:

  1. kube apply '...'
  2. curl '....'
  3. See error

Expected behavior Intermittent issue, hard to track down. Possibly related to timeouts: https://github.com/solo-io/gloo/issues/4204

Additional context Add any other context about the problem here, e.g.

  • Gloo Edge version - v1.7.3
  • Kubernetes version

kevin-shelaga avatar May 17 '21 13:05 kevin-shelaga

Another user is seeing this behavior in 1.7.0 in response to glooctl check. Internal Reference: https://solo-io.slack.com/archives/G01MLF5GZPZ/p1621348170006900

jameshbarton avatar May 18 '21 14:05 jameshbarton

Same problem here, with a new install in EKS, in an almost empty cluster.

Client: {"version":"1.7.11"}
Server: {"type":"Ingress","kubernetes":{"containers":[{"Tag":"1.7.11","Name":"discovery","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"ingress","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.7.11","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"}],"namespace":"gloo-system"}}

Kubernetes version is 1.18. At the end glooctl check yields No problems detected Before I was using gloo 1.5, and this problem never arises.

naimetti avatar Jun 26 '21 02:06 naimetti

Also having this issue with EKS, Kubernetes Version: 1.20.4

Client: {"version":"1.8.0"}
Server: {"type":"Gateway","kubernetes":{"containers":[{"Tag":"1.8.0","Name":"discovery","Registry":"quay.io/solo-io"},{"Tag":"1.8.0","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.8.0","Name":"gloo-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.8.0","Name":"gloo","Registry":"quay.io/solo-io"}],"namespace":"gloo-system"}}

jeremyruffell avatar Jul 12 '21 23:07 jeremyruffell

Yet another user is confused by this spam. Internal Reference: https://solo-io.slack.com/archives/G01DET26YP7/p1628091286006600?thread_ts=1628083921.004200&cid=G01DET26YP7

jameshbarton avatar Aug 04 '21 15:08 jameshbarton

happening on AWS EKS 1.17 as well with gloo 1.7.18

sleerssen avatar Aug 17 '21 15:08 sleerssen

Initial investigation looks like there is excess go routines causing the errors, but why this happens more on a clean install with EKS and not AKS/GKE, i do not know.

....
Checking virtual services... OK
too many goroutines, only loaded 1024
W0818 07:33:32.221952   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.222104   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.222676   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.222765   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.223593   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.223933   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224155   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224382   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224408   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224432   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224453   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.224471   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
W0818 07:33:32.225099   11539 transport.go:260] Unable to cancel request for *exec.roundTripper
too many goroutines, only loaded 1024


kevin-shelaga avatar Aug 18 '21 11:08 kevin-shelaga

Currently I am using this workaround: glooctl check 2>&1|grep -v roundTripper

jmunozro avatar Sep 07 '21 16:09 jmunozro

Happening on AWS EKS 1.21 with glooctl client 1.8.18 and gloo 1.8.11

dseravalli avatar Oct 05 '21 14:10 dseravalli

Yet another user confused by this issue: https://solo-io.slack.com/archives/C03146E7TEZ/p1643841851366019

jameshbarton avatar Feb 03 '22 14:02 jameshbarton

Same here:

$ glooctl version

Client: {"version":"1.10.5"}
Server: {"type":"Gateway","enterprise":true,"kubernetes":{"containers":[{"Tag":"1.10.1","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.10.3","Name":"gloo-ee-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.10.3","Name":"extauth-ee","Registry":"quay.io/solo-io"},{"Tag":"1.10.3","Name":"gloo-ee","Registry":"quay.io/solo-io"},{"Tag":"1.10.3","Name":"rate-limit-ee","Registry":"quay.io/solo-io"},{"Tag":"6.2.4","Name":"redis","Registry":"docker.io"}],"namespace":"gloo-system"}}

EKS 1.20

pmichna avatar Feb 16 '22 17:02 pmichna

https://github.com/solo-io/gloo/pull/5937 seems to fix (or at least mitigate) this issue, though it's more of a band-aid than a proper fix. A good amount of time was put into investigating, but not a ton of progress has been made. There's some complex caching logic I don't fully understand, and it's likely that the root cause is hidden somewhere within. I've discussed with @sam-heilbron, and we agreed to go forward with this "temporary fix", evaluate the efficacy of it, then re-prioritize putting time into a more permanent fix.

Internal slack thread on the investigation: https://solo-io-corp.slack.com/archives/G01EERAK3KJ/p1645044029625889

MitchAman avatar Feb 28 '22 15:02 MitchAman

Zendesk ticket #599 has been linked to this issue.

soloio-bot avatar Feb 28 '22 21:02 soloio-bot

dumping the following here in case it's useful for debugging:

glooctl version

Client: {"version":"1.10.12"}
Server: {"type":"Gateway","enterprise":true,"kubernetes":{"containers":[{"Tag":"1.10.9","Name":"extauth-ee","Registry":"quay.io/solo-io"},{"Tag":"1.10.12","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"1.10.9","Name":"gloo-ee-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.10.9","Name":"gloo-ee","Registry":"quay.io/solo-io"},{"Tag":"1.10.9","Name":"rate-limit-ee","Registry":"quay.io/solo-io"},{"Tag":"6.2.4","Name":"redis","Registry":"docker.io"}],"namespace":"gloo-system"}}

glooctl check

...
Checking secrets... W0425 10:03:46.570858   68612 transport.go:288] Unable to cancel request for *exec.roundTripper
E0425 10:03:46.570916   68612 request.go:1085] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E0425 10:03:46.571034   68612 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)
W0425 10:03:46.571058   68612 transport.go:288] Unable to cancel request for *exec.roundTripper
E0425 10:03:46.571172   68612 request.go:1085] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
E0425 10:03:46.571263   68612 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected error when reading response body. Please retry. Original error: net/http: request canceled (Client.Timeout or context cancellation while reading body)
W0425 10:03:46.571818   68612 transport.go:288] Unable to cancel request for *exec.roundTripper
E0425 10:03:46.571829   68612 request.go:1085] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E0425 10:03:46.571921   68612 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)
W0425 10:03:46.878432   68612 transport.go:288] Unable to cancel request for *exec.roundTripper
...

a wall of roundTripper logs (unable to cancel), interspersed with:

E0425 10:04:24.423738   68612 request.go:1085] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
E0425 10:04:24.423839   68612 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)

aaronArinder avatar Apr 25 '22 14:04 aaronArinder

bumping your ulimit usually fixes this

kevin-shelaga avatar Nov 24 '22 15:11 kevin-shelaga

We are seeing the same issue when using the windows-amd64 binary.

There is no apparent ulimit alternative on Windows.

Version: Microsoft Windows [Version 10.0.19045.3208]

alex-hempel avatar Jul 19 '23 00:07 alex-hempel

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.

github-actions[bot] avatar Jan 15 '24 10:01 github-actions[bot]

This issue has been closed due to no activity in the last 12 months.

github-actions[bot] avatar Jul 16 '24 10:07 github-actions[bot]