gloo
Increased timeouts for the proxies check in glooctl
Description
- Added more attempts to make `ProxyEndpointRequest` in `glooctl check` when checking proxies.
- Increased the timeout for getting metrics from proxies in `glooctl check` when checking proxies.
Context
In our environment we run `glooctl check` every 5 minutes, and we fairly often get connection errors like:
* rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp [::1]:51350: connect: connection refused"
...
* timed out trying to connect to localhost during port-forward, errors: 8 errors occurred:
* Get "http://localhost:51666/stats/prometheus": dial tcp [::1]:51666: connect: connection refused
...
About the first error: while debugging `glooctl check` locally, I found that a port-forward sometimes only starts accepting connections more than 1 second after it is started, but the request to Gloo is made right after the port-forward starts. The request is retried 5 times, which is sometimes not enough in our environment. Because `glooctl check` makes this request for every watched namespace (we have many) and creates a new port-forward for each request, the chance of a failure grows.
About the second error: we see it less often, roughly once a day, but it is still annoying. Our proxies expose around 240,000 metrics, so the 30-second timeout is probably not always enough.
Testing steps
I manually tested `glooctl check` in my environment. Everything works fine.
Checklist:
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] I have added tests that prove my fix is effective or that my feature works