kubectl does not retry after TLS handshake timeout
What happened:
One of our three control plane IPs is unresponsive. On my local machine, kubectl sporadically stalls for about 10 seconds but otherwise works fine. This is because the Go standard library divides the 30 second dial timeout across the 3 IPs, so when the dial to the first IP times out, it falls back to the second one.
Further testing shows that if the entire TCP dial times out, then kubectl itself will retry.
However, our build server is behind a firewall. There, the TCP dial succeeds but the TLS handshake times out after 10 seconds. When this happens, kubectl treats it as fatal and does not retry.
What you expected to happen:
kubectl should retry if the TLS handshake times out. (It should start over with a fresh TCP dial.)
How to reproduce it (as minimally and precisely as possible):
I don't know how to reliably force this issue to reproduce.
Anything else we need to know?:
Environment:
- Kubernetes client and server versions (use kubectl version): v1.21.13 (client), v1.22.12 (server)
- Cloud provider or hardware configuration: AWS EKS
- OS (e.g. cat /etc/os-release): macOS 12.5.1
@rittneje: This issue is currently awaiting triage.
SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
I'd see this as more of a feature request to add retries. (I don't think we ever documented that kubectl would retry in this situation.)
@sftim Based on the current behavior, kubectl seems intended to retry when the control plane is unavailable. However, it does not account for how that situation looks when the client is behind a firewall, which is a fairly common setup.