
no connection for cached dial! for eks cluster

Open exinos-git opened this issue 4 months ago • 14 comments




Describe the bug: cannot connect to an EKS cluster after credentials expire and are refreshed; k9s reports "no connection for cached dial!"

To Reproduce (steps to reproduce the behavior):

  1. connect to an EKS cluster
  2. wait for the credentials to expire
  3. log in to AWS again to refresh the credentials
  4. try to connect to the EKS cluster with k9s (see the sketch after these steps)
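For context, a minimal sketch of the reproduction on the CLI, assuming SSO-based AWS auth; the profile, region, and cluster name are placeholders, not details from this report:

    aws sso login --profile my-profile                    # step 3: refresh the expired credentials
    aws eks update-kubeconfig --name my-cluster \
        --region someregion --profile my-profile          # make sure the kubeconfig entry is current
    k9s --context "arn:aws:eks:someregion::cluster/my-cluster"   # step 4: k9s still reports "no connection for cached dial!"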

Historical Documents

    9:00AM INF ✅ Kubernetes connectivity
    9:00AM ERR Fail to load global/context configuration error="the server has asked for the client to provide credentials
    k9s config file "/home/someuser/.config/k9s/config.yaml" load failed:
    Additional property fullScreen is not allowed
    cannot connect to context: arn:aws:eks:someregion::cluster/blahblah
    k8s connection failed for context: arn:aws:eks:someregion::cluster/blahblah"
    9:00AM ERR Load cluster resources - No API server connection
    9:00AM ERR failed to list contexts error="no connection"
    9:00AM WRN Unable to dial discovery API error="no connection to dial"
    9:00AM ERR can't connect to cluster error="the server has asked for the client to provide credentials"
    9:00AM ERR Load cluster resources - No API server connection
    9:00AM WRN Unable to dial discovery API error="no connection to dial"
    9:00AM ERR Context switch failed error="no connection to cached dial"
    9:00AM ERR no connection to cached dial
    ... (the last two lines repeat many times)

Expected behavior: k9s refreshes the connection with the new credentials.

Screenshots

Versions (please complete the following information):

  • OS: WSL2
  • K9s: v0.32.4
  • K8s: 1.27.12

Additional context: the only way I could work around this was by moving the k9s clusters cache directory aside: mv /home/someuser/.local/share/k9s/clusters /home/someuser/.local/share/k9s/clustersbad
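For anyone trying the same workaround, a sketch of the steps, assuming the default Linux XDG layout (k9s info prints the directories your install actually uses, which is safer than hard-coding the paths below):

    k9s info                                                        # show config/data/log locations
    mv ~/.local/share/k9s/clusters ~/.local/share/k9s/clustersbad   # move the cached cluster state aside
    k9s                                                             # start again; the cache is recreated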

exinos-git · Apr 25 '24 13:04

A few weeks ago I also started to get "no connection for cached dial" errors all of a sudden. I've used k9s for more than a year and never had that problem before. In my case I'm connecting to GKE clusters.

If I try to reach the clusters using kubectl it works perfectly, but for some reason I need to do a lot of retries in k9s before it will let me access the clusters. I tried upgrading to the latest k9s version, but the issue persists.

pdfrod · Apr 30 '24 18:04

@pdfrod did you try the workaround I mentioned? mv /home/$USER/.local/share/k9s/clusters /home/$USER/.local/share/k9s/clustersbad

exinos-git · Apr 30 '24 18:04

Just tried it, but it didn't make any difference for me unfortunately.

pdfrod · Apr 30 '24 18:04

Unfortunately I'm also running into this same issue.

After sourcing my new AWS temporary credentials with MFA, if I start k9s I have to wait several seconds for the context to load properly before it starts working. However, sometimes it doesn't load properly and I'm stuck with "no connection to cached dial".
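For reference, a rough sketch of the kind of MFA credential refresh described above; the account ID, MFA device ARN, token code, and use of jq are placeholders and assumptions, not details from this comment:

    creds=$(aws sts get-session-token \
        --serial-number arn:aws:iam::123456789012:mfa/someuser \
        --token-code 123456 --output json)
    export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r .Credentials.AccessKeyId)
    export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r .Credentials.SecretAccessKey)
    export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r .Credentials.SessionToken)
    k9s    # k9s then has to pick up the refreshed credentials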

Version: v0.32.4
Commit:  d3027c8f2916b23606f647f47b434b08fc34bdf8
Date:    2024-03-20T19:16:59Z

cablekevin · May 01 '24 07:05

Having the same issue. In some cases k9s appears to reload itself and the issue resolves itself, but I'm not quite sure how to trigger that. I tried switching between clusters and hitting Ctrl+R.

I even tried to re-authenticate outside of k9s, but the UI eventually seemed to refresh on its own after several seconds. It would be helpful to be able to trigger manually whatever refresh process seemingly happens in the background, either as part of Ctrl+R or with another command.

olivierlacan · May 03 '24 22:05

I ran into this problem today with clusters in both EKS and GKE, and here's how I solved it:

  1. rename the current k9s clusters cache folder to clustersbad with @exinos-git's mv command, or delete it (your choice)
     a. N.B.: if you're on OSX, the default k9s config directory is at ~/Library/Application\ Support/k9s
  2. re-authenticate to your clusters out-of-band and update the kubeconfig (put together in the sketch after these steps)
     a. for EKS: aws eks update-kubeconfig --name {cluster name} --region {cluster region}
     b. for GKE: gcloud container clusters get-credentials {cluster name} --region {cluster region}
  3. run k9s
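Put together, the three steps look roughly like this on Linux; the cluster names, regions, and paths are placeholders (on OSX, substitute the Application Support path from step 1a):

    mv ~/.local/share/k9s/clusters ~/.local/share/k9s/clustersbad              # step 1
    aws eks update-kubeconfig --name my-cluster --region someregion            # step 2a (EKS)
    gcloud container clusters get-credentials my-cluster --region someregion   # step 2b (GKE)
    k9s                                                                        # step 3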

After following these three steps, k9s automatically boots into the last context I connected to.

I believe what happened in my case was that I updated the names of my contexts in my ~/.kube/config file directly, instead of renaming them in k9s, and that screwed up the mappings between my kubeconfig contexts and the cluster configurations in k9s.
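To illustrate the mapping this refers to, here is an assumed (not verified against the k9s source, and possibly version-dependent) layout of the per-context state under the k9s data directory; the cluster and context names are placeholders:

    ~/.local/share/k9s/clusters/
    └── my-cluster/            # keyed by the kubeconfig cluster name
        └── my-context/        # keyed by the kubeconfig context name
            └── config.yaml    # per-context k9s settings

Under that assumption, renaming a context only in ~/.kube/config leaves the old my-context directory in place with nothing pointing at it, which could plausibly break the lookup in the way described above.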

eric-gt · May 07 '24 20:05

Most likely a duplicate of https://github.com/derailed/k9s/issues/2651

wolffberg · May 08 '24 10:05