k9s no connection for cached dial! for eks cluster

Describe the bug cannot connect to EKS cluster after credentials expire and are refreshed get "no connection for cached dial!"

To Reproduce Steps to reproduce the behavior:

connect to an EKS cluster
wait for credentials to expire
login to AWS again to refresh creds
try to connect to EKS cluster with k9s

Historical Documents 1932 9:00AM INF <U+2705> Kubernetes connectivity 1933 9:00AM ERR Fail to load global/context configuration error="the server has asked for the client to provide credentials\nk9s config file "/home/someuser/.config/k9s/config.yaml" load failed:\nAdditional pr 1933 operty fullScreen is not allowed\ncannot connect to context: arn:aws:eks:someregion::cluster/blahblah\nk8s connection failed for context: arn:aws:eks:somregeion::cluster/blahblah" 1934 9:00AM ERR Load cluster resources - No API server connection 1935 9:00AM ERR failed to list contexts error="no connection" 1936 9:00AM WRN Unable to dial discovery API error="no connection to dial" 1937 9:00AM ERR can't connect to cluster error="the server has asked for the client to provide credentials" 1938 9:00AM ERR Load cluster resources - No API server connection 1939 9:00AM WRN Unable to dial discovery API error="no connection to dial" 1940 9:00AM ERR Context switch failed error="no connection to cached dial" 1941 9:00AM ERR no connection to cached dial 1942 9:00AM ERR Context switch failed error="no connection to cached dial" 1943 9:00AM ERR no connection to cached dial 1944 9:00AM ERR Context switch failed error="no connection to cached dial" 1945 9:00AM ERR no connection to cached dial 1946 9:00AM ERR Context switch failed error="no connection to cached dial" 1947 9:00AM ERR no connection to cached dial 1948 9:00AM ERR Context switch failed error="no connection to cached dial" 1949 9:00AM ERR no connection to cached dial 1950 9:00AM ERR Context switch failed error="no connection to cached dial" 1951 9:00AM ERR no connection to cached dial 1952 9:00AM ERR Context switch failed error="no connection to cached dial" 1953 9:00AM ERR no connection to cached dial 1954 9:00AM ERR Context switch failed error="no connection to cached dial" 1955 9:00AM ERR no connection to cached dial 1956 9:00AM ERR Context switch failed error="no connection to cached dial" 1957 9:00AM ERR no connection to cached dial 1958 9:00AM ERR Context switch failed error="no connection to cached dial" 1959 9:00AM ERR no connection to cached dial 1960 9:00AM ERR Context switch failed error="no connection to cached dial" 1961 9:00AM ERR no connection to cached dial 1962 9:00AM ERR Context switch failed error="no connection to cached dial" 1963 9:00AM ERR no connection to cached dial 1964 9:00AM ERR Context switch failed error="no connection to cached dial" 1965 9:00AM ERR no connection to cached dial 1966 9:00AM ERR Context switch failed error="no connection to cached dial" 1967 9:00AM ERR no connection to cached dial 1968 9:00AM ERR Context switch failed error="no connection to cached dial" 1969 9:00AM ERR no connection to cached dial 1970 9:00AM ERR Context switch failed error="no connection to cached dial" 1971 9:00AM ERR no connection to cached dial 1972 9:00AM ERR Context switch failed error="no connection to cached dial" 1973 9:00AM ERR no connection to cached dial 1974 9:00AM ERR Context switch failed error="no connection to cached dial" 1975 9:00AM ERR no connection to cached dial

Expected behavior it refreshes the connection with new creds

Screenshots

Versions (please complete the following information):

OS: [e.g. WSL2]
K9s: [e.g. v0.32.4]
K8s: [e.g. 1.27.12]

Additional context the only way i could work around this was by moving mv /home/someuser/.local/share/k9s/clusters /home/someuser/.local/share/k9s/clustersbad

Apr 25 '24 13:04 exinos-git

A few weeks ago I also started to have "no connection for cached dial" errors all of the sudden. I've used k9s for more than a year and never had that problem before. In my case I'm connecting to GKE clusters.

If I try to reach the clusters using kubectl it works perfectly, but for some reason I need to do a lot of retries in k9 before it will let me access the clusters. I tried upgrading to the latest k9s version, but the issue persists.

Apr 30 '24 18:04 pdfrod

@pdfrod did you try the workaround i mention mv /home/$USER/.local/share/k9s/clusters /home/$USER/.local/share/k9s/clustersbad

Apr 30 '24 18:04 exinos-git

Just tried it, but it didn't make any difference for me unfortunately.

Apr 30 '24 18:04 pdfrod

Unfortunately I'm also running into this same issue.

After sourcing my new AWS temp credentials with MFA if i start k9s i have to wait several seconds for the context to be loaded properly and it starts working. However sometimes it doesn't load properly and I'm stuck with: "no connection to cached dial".

Version: v0.32.4 Commit: d3027c8f2916b23606f647f47b434b08fc34bdf8 Date: 2024-03-20T19:16:59Z

May 01 '24 07:05 cablekevin

Having the same issue, in some cases k9s appears to reload itself and somehow the issue resolves itself but I'm not quite sure how to trigger it. I tried switching between clusters or hitting ctrl + r.

I even tried to re-authenticate outside of k9s but the UI eventually seemed to refresh on its own after several seconds. It might be helpful to be able to trigger whatever refresh process seemingly happens in the background manually either when refreshing with ctrl + r or with another command.

May 03 '24 22:05 olivierlacan

I ran into this problem today with clusters in both EKS and GKE, and here's how I solved it:

rename the current k9s config clusters folder to clustersbad with @exinos-git 's mv command. or delete it. your choice a. N.B.: if you're on OSX, the default K9s config directory is at ~/Library/Application\ Support/k9s
re-authenticate to your clusters out-of-band and update the kubeconfig a. for EKS aws eks update-kubeconfig --name {cluster name} --region {cluster region} b. for GKE gcloud container clusters get-credentials {cluster name} --region {cluster region}
run K9s

After following these three steps, k9s automatically boots into the last context I connected to.

I believe what happened in my case was that I updated the names of my contexts in my ~/.kube/config file directly, instead of renaming them in k9s, and that screwed up the mappings between my kubeconfig contexts and the cluster configurations in k9s

May 07 '24 20:05 eric-gt

Most likely a duplicate of https://github.com/derailed/k9s/issues/2651

May 08 '24 10:05 wolffberg

Most likely a duplicate of #2651

Yes, in my case https://github.com/derailed/k9s/issues/2651 was exactly the problem I was having. Setting a current-context fixed the problem for me, although it would be nice to not have to set one, as I have multiple clusters and I prefer to be explicit about the cluster I'm currently using.

May 15 '24 09:05 pdfrod

Most likely a duplicate of #2651

Yes, in my case #2651 was exactly the problem I was having. Setting a current-context fixed the problem for me, although it would be nice to not have to set one, as I have multiple clusters and I prefer to be explicit about the cluster I'm currently using.

+1

May 16 '24 07:05 syselement

Maybe not related, but: we are using GCP and what has helped me was:

gcloud components install kubectl

May 31 '24 06:05 zolv

Same issue here with GCP and AZ clusters

Jul 15 '24 11:07 anvy2

Same issue here with AZ and Openshift clusters

Jul 22 '24 04:07 heamaral

Most likely a duplicate of #2651

Actually not, since I am having the problem as well and I do have current-context set in my ~/.kube/config. This bug seems to happen in so many different places, with different configurations, and people seem to be "resolving" it in different ways (probably it just disappeared for some random reason too, and people think they resolved it). In my case, I have a bunch of clusters there, Google Cloud and AWS. I tried every "solution" mentioned here and nothing works.

Edit: I even moved my ~/.kube/config to ~/kube.config.bak and re-connected with a single cluster in AWS, so a brand new ~/.kube/config file with only one entry, and it still fails, same error.

Aug 21 '24 11:08 jose-lpa

FWIW, I just ran in to this. I was deleting a cluster while I had k9s open so I could watch the nodes being drained. I didn't exit k9s by the time the cluster was being deleted.

I tried the workarounds listed about, but ultimately I had to switch the current-context from the cluster that was just deleted to a valid context.

Aug 27 '24 21:08 marchchad

k9s k9s copied to clipboard

no connection for cached dial! for eks cluster

k9s
k9s copied to clipboard