
Node fails to gather membership

Open mlima-gh opened this issue 6 months ago • 3 comments

I'm running 1.10.5 with the Cilium CNI behind a MITM proxy in a bare-metal single-node configuration.

I've applied this exact configuration to several other locations without the MITM proxy. Am I missing something? How can I test the certificate added in machine.files.content?
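One basic sanity check on the certificate text before it goes into machine.files.content is to confirm that it parses as PEM and is flagged as a CA certificate. The following is a minimal sketch using only the Python standard library, run on a workstation rather than on the node. The function name count_ca_certs is hypothetical, and the check only validates the PEM data itself; it does not prove that the node trusts the certificate or that the proxy actually presents it.

```python
import ssl


def count_ca_certs(pem_text: str) -> int:
    """Return how many certificates in pem_text load as CA certificates.

    Hypothetical helper: returns 0 when the text is empty or is not
    parseable PEM certificate data.
    """
    # A bare client context with no system trust store loaded,
    # so the statistics reflect only the supplied PEM text.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        ctx.load_verify_locations(cadata=pem_text)
    except (ssl.SSLError, ValueError):
        # SSLError: no certificate found in the data.
        # ValueError: empty certificate data.
        return 0
    # 'x509_ca' counts loaded certificates that carry the CA flag.
    return ctx.cert_store_stats()["x509_ca"]
```

A result of 0 for the proxy certificate would suggest a copy/paste or indentation problem in the PEM text, or a certificate that is not marked as a CA.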

After the bootstrap, the node fails to get membership and does not give kubectl access, so I can't apply the Cilium CNI.

kubectl get all -A gives "Unhandled Error" err="couldn't get current server API group list: Get "https://192.168.1.204:6443/api?timeout=32s": EOF"

talosctl dmesg shows consistent controller failures:

user: warning: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-wz5atf: Get "https://127.0.0.1:7445/api?timeout=32s": EOF"}
user: warning: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server ("Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=pods)") has prevented the request from succeeding"}
user: warning: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeApplyController", "error": "1 error(s) occurred:\n\ttimeout"}
user: warning: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-wz5atf: Get "https://127.0.0.1:7445/api?timeout=32s": EOF"}
user: warning: [talos] kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get "https://192.168.1.204:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0": dial tcp 192.168.1.204:6443: connect: connection refused"}

No members are shown with talosctl get members.

I tried to use an air-gapped installation (https://www.siderolabs.com/blog/air-gapped-kubernetes-with-talos-linux/) as an alternative but failed miserably.

Thanks

mlima-gh avatar Jul 14 '25 11:07 mlima-gh

There is not enough information to help you. From the logs it looks like kube-apiserver doesn't run, but we can't guess why.

We have a Troubleshooting guide.

smira avatar Jul 14 '25 14:07 smira

I get the same errors during bare-metal installation with the MITM proxy configuration (https/http/no proxy env vars). Without the MITM proxy, it works. Some Talos services (in the logs) do not go through this MITM proxy. Result: etcd doesn't work, and neither do many other things.

My no_proxy configuration (covering DNS, NTP, localhost, the subnet, the pod network, the service network, etc.): localhost,127.0.0.0/8,192.168.0.0/16,.mycompany.com,.svc,172.17.0.0/16,172.18.0.0/16,10.1.0.0/16
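To see which destinations a no_proxy list like this one would exempt, a small checker can help. This is a minimal sketch that assumes simple CIDR and domain-suffix matching; real clients differ in how they interpret no_proxy entries, so it is not a statement of how Talos components evaluate the list. The function name bypasses_proxy and the sample hosts in the usage note are illustrative assumptions.

```python
import ipaddress

# The no_proxy value quoted in the comment above.
NO_PROXY = (
    "localhost,127.0.0.0/8,192.168.0.0/16,.mycompany.com,.svc,"
    "172.17.0.0/16,172.18.0.0/16,10.1.0.0/16"
)


def bypasses_proxy(host: str, no_proxy: str = NO_PROXY) -> bool:
    """Return True if host matches an entry in no_proxy (assumed semantics)."""
    entries = [e.strip() for e in no_proxy.split(",") if e.strip()]
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, treat as a hostname

    for entry in entries:
        if ip is not None:
            # IP hosts are compared against CIDR (or single-IP) entries only.
            try:
                if ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                continue  # entry is a domain name, not a network
        else:
            name = host.lower().rstrip(".")
            suffix = entry.lower()
            if suffix.startswith("."):
                # ".svc" matches any name ending in ".svc"
                if name.endswith(suffix):
                    return True
            elif name == suffix or name.endswith("." + suffix):
                return True
    return False
```

With this list, bypasses_proxy("127.0.0.1") and bypasses_proxy("192.168.1.204") return True, while an address outside the listed ranges, such as 10.96.0.1, returns False and would be sent to the proxy under these assumed rules.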

I know that a MITM proxy is an old method (compared to a transparent proxy) and that it poses many problems. But I can't get around it for production applications at the moment...

ktoulliou avatar Aug 08 '25 12:08 ktoulliou

Same here. After creating a new cluster in Omni, making one machine a control plane gives:

[talos] controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-xxxxxx: Get "https://127.0.0.1:7445/api?timeout=32s": EOF"}

peterbosalliandercom avatar Nov 05 '25 15:11 peterbosalliandercom