sidero
context deadline exceeded
Hi everybody. I'm trying to set up a Kubernetes cluster following the Quickstart Guide. My environment is restricted: my test lab is behind a proxy and also uses a custom DNS resolver. talosctl cluster create has an option for the resolver, but I cannot work out how to configure the proxy.
This is what I get when I launch it:
# talosctl cluster create --wait --nameservers "x.x.x.x"
validating CIDR and reserving IPs
generating PKI and tokens
creating network talos-default
creating master nodes
creating worker nodes
renamed talosconfig context "talos-default" -> "talos-default-23"
waiting for API
bootstrap error: 3 error(s) occurred:
rpc error: code = DeadlineExceeded desc = context deadline exceeded
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing failed to do connect handshake, response: \"HTTP/1.1 502 Connection timed out\\r\\nConnection: close\\r\\nContent-Type: text/html\\r\\n\\r\\n<html><body><h1>502 Connection timed out</h1><p><a href='http://cntlm.sf.net/'>Cntlm</a> proxy failed to complete the request.</p></body></html>\""
timeout
#
Taking a look at the Docker logs for the master I see:
2022-08-03T15:26:36.718572189Z time="2022-08-03T15:26:36Z" level=info msg="trying next host" error="failed to do request: Head \"https://ghcr.io/v2/siderolabs/kubelet/manifests/v1.24.2\": dial tcp 140.82.121.34:443: i/o timeout" host=ghcr.io
2022-08-03T15:26:36.719672986Z [talos] 2022/08/03 15:26:36 retrying error: failed to pull image "ghcr.io/siderolabs/kubelet:v1.24.2": failed to resolve reference "ghcr.io/siderolabs/kubelet:v1.24.2": failed to do request: Head "https://ghcr.io/v2/siderolabs/kubelet/manifests/v1.24.2": dial tcp 140.82.121.34:443: i/o timeout
This is caused by the missing proxy settings.
Any hints?
Thanks in advance. Gabriele
Talos needs explicit proxy settings to be set up, e.g. with a config patch.
talosctl cluster create ... -p '[{"op": "add", "path": "/machine/env", "value": {"http_proxy": "....", "https_proxy": "...."}}]'
This is a tricky one, as you might want to disable the HTTP proxy for in-cluster communication.
You can also work around that by using registry mirrors which themselves handle the HTTP proxy, while Talos pulls via the mirror: https://www.talos.dev/v1.1/talos-guides/configuration/pull-through-cache/
Thanks for your answer @smira, but isn't the "-p" flag for exposed ports? With the --config-patch flag instead, the downloads seem to work. Nevertheless, the cluster still does not start.
Yes, it should have been --config-patch. You might also need an env variable no_proxy: 10.5.0.0/24 (if using the default CIDRs with talosctl cluster create).
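A minimal sketch of how the patch discussed above could be assembled in a shell, so that the proxy URL and the no_proxy list are written only once. The helper name proxy_patch and the proxy URL are hypothetical, and using one URL for both http_proxy and https_proxy is an assumption; only the /machine/env path and the 10.5.0.0/24 default come from this thread.

```shell
# Hypothetical helper: prints a JSON patch that sets proxy variables under /machine/env.
#   $1 = proxy URL (assumed here to serve both HTTP and HTTPS traffic)
#   $2 = comma-separated no_proxy list (defaults to the CIDR mentioned above)
proxy_patch() {
  proxy_url=$1
  no_proxy_list=${2:-10.5.0.0/24}
  printf '[{"op": "add", "path": "/machine/env", "value": {"http_proxy": "%s", "https_proxy": "%s", "no_proxy": "%s"}}]\n' \
    "$proxy_url" "$proxy_url" "$no_proxy_list"
}

# Example (not run here): pass the generated patch to cluster creation.
# talosctl cluster create --wait --config-patch "$(proxy_patch http://proxy.example:3128)"
```

Printing the patch first makes it easy to check the JSON before handing it to talosctl.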
Thanks once again, @smira. The proxy part now seems fine.
Sorry @smira, can I ask whether you can figure out what is wrong?
In the logs I see this:
2022-08-08T14:10:31.449290358Z [talos] 2022/08/08 14:10:31 service[etcd](Failed): Failed to run pre stage: failed to pull image "gcr.io/etcd-development/etcd:v3.5.4": 1 error(s) occurred:
2022-08-08T14:10:31.449302761Z failed to pull image "gcr.io/etcd-development/etcd:v3.5.4": context canceled
But this image can be pulled with docker.
What should I debug for this problem? Thanks in advance. Best regards.
These last messages are completely fine: they are printed when bootstrap aborts an image pull, but the pull should continue in the background after the bootstrap.
OK @smira, but in the end the cluster isn't up. It exits with this message:
`waiting for etcd to be healthy: OK
◲ waiting for etcd members to be consistent across nodes: rpc error: code = DeadlineExceeded desc = context deadline exceeded
context deadline exceeded
:~#`
And that image isn't visible via the docker command when I search for it:
`:~# talosctl images
ghcr.io/siderolabs/flannel:v0.18.1
ghcr.io/siderolabs/install-cni:v1.1.0-2-gcb03a5d
docker.io/coredns/coredns:1.9.3
gcr.io/etcd-development/etcd:v3.5.4
k8s.gcr.io/kube-apiserver:v1.24.2
k8s.gcr.io/kube-controller-manager:v1.24.2
k8s.gcr.io/kube-scheduler:v1.24.2
k8s.gcr.io/kube-proxy:v1.24.2
ghcr.io/siderolabs/kubelet:v1.24.2
ghcr.io/siderolabs/installer:v1.1.1
k8s.gcr.io/pause:3.6
:~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ghcr.io/siderolabs/talos v1.1.1 648080035f8c 3 weeks ago 174MB
:~# `
Talos doesn't use docker to pull images; the issue should show up in docker logs talos-default-master-N. It might be timing out, or it could be some other issue.
It looks like it passed the basic bootstrap, so talosctl -n 10.5.0.2 etcd members might help.
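As a rough sketch of that log check, the container output can be piped through a small filter that keeps only the lines resembling the pull errors quoted earlier in this thread. The function name pull_failures is hypothetical, the patterns are taken from the log excerpts above, and the container name depends on the talosctl version (this thread shows both master and controlplane naming).

```shell
# Hypothetical filter: keep only log lines that look like image pull or timeout failures.
# Reads log text on stdin; the patterns come from the errors quoted in this thread.
pull_failures() {
  grep -E 'failed to pull image|i/o timeout|context deadline exceeded'
}

# Example (not run here); adjust the container name to what `docker ps` shows:
# docker logs talos-default-master-1 2>&1 | pull_failures
```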
Hi @smira, sorry for the late feedback. If I query that node as you suggested, I get a connection timed out. What can I do to debug this problem?
# talosctl -n 10.5.0.2 etcd members
error getting members: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing failed to do connect handshake, response: \"HTTP/1.1 502 Connection timed out\\r\\nConnection: close\\r\\nContent-Type: text/html\\r\\n\\r\\n<html><body><h1>502 Connection timed out</h1><p><a href='http://cntlm.sf.net/'>Cntlm</a> proxy failed to complete the request.</p></body></html>\""
#
By the way, I create the cluster from the CLI with these parameters:
#export no_proxy="localhost,127.0.0.1,10.5.0.0/24"; talosctl cluster create --wait --dns-domain "163.162.4.70" --nameservers "163.162.4.70" --config-patch '[{"op": "add", "path": "/machine/env", "value": {"http_proxy": "http://163.162.95.56:3128", "https_proxy": "http://163.162.95.56:3128"}}]'
validating CIDR and reserving IPs
generating PKI and tokens
creating network talos-default
creating master nodes
creating worker nodes
renamed talosconfig context "talos-default" -> "talos-default-64"
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
◱ waiting for etcd members to be consistent across nodes: rpc error: code = DeadlineExceeded desc = context deadline exceeded
context deadline exceeded
#
But in the end it seems to fail. What could be the cause? Thanks in advance.
You probably need to keep no_proxy=... for the talosctl calls as well, as talosctl tries to go via your proxy, and you don't want that. Alternatively, you can unset the http_proxy and https_proxy environment variables.
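A small sketch of the "unset the proxy variables" alternative, limited to a single command so the rest of the shell session keeps its proxy settings. The wrapper name without_proxy is hypothetical, and it assumes an env implementation that supports -u (GNU coreutils and the BSDs do).

```shell
# Hypothetical wrapper: run one command with the proxy variables removed from its
# environment, leaving the calling shell untouched.
# Assumes `env -u` is available (GNU coreutils and BSD env support it).
without_proxy() {
  env -u http_proxy -u https_proxy -u HTTP_PROXY -u HTTPS_PROXY "$@"
}

# Example (not run here):
# without_proxy talosctl -n 10.5.0.2 etcd members
```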
You're right @smira. With the no_proxy env variable set I get:
# talosctl -n 10.5.0.2 etcd members
NODE ID HOSTNAME PEER URLS CLIENT URLS LEARNER
10.5.0.2 c3d3020cf75b8728 talos-default-controlplane-1 https://10.5.0.2:2380 https://10.5.0.2:2379 false
#
Hi all. I've modified the talosctl cluster create command as follows:
...
talosctl cluster create --wait --nameservers "X.Y.X.Y" --config-patch '[{"op": "add", "path": "/machine/env", "value": {"http_proxy": "http://X.X.X.X:xx", "https_proxy": "https://X.X.X.X:xx", "no_proxy": "localhost,127.0.0.1,10.5.0.0/24,0.0.0.0"}}]'
...
and the process seems to be completed. The output is this one:
~# validating CIDR and reserving IPs
generating PKI and tokens
creating network talos-default
creating controlplane nodes
creating worker nodes
renamed talosconfig context "talos-default" -> "talos-default-83"
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: OK
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: OK
waiting for all k8s nodes to report ready: OK
waiting for all control plane components to be ready: OK
waiting for kube-proxy to report ready: OK
waiting for coredns to report ready: OK
waiting for all k8s nodes to report schedulable: OK
merging kubeconfig into "/root/.kube/config"
renamed cluster "talos-default" -> "talos-default-1"
renamed auth info "admin@talos-default" -> "admin@talos-default-1"
renamed context "admin@talos-default" -> "admin@talos-default-1"
PROVISIONER docker
NAME talos-default
NETWORK NAME talos-default
NETWORK CIDR 10.5.0.0/24
NETWORK GATEWAY 10.5.0.1
NETWORK MTU 1500
NODES:
NAME TYPE IP CPU RAM DISK
/talos-default-controlplane-1 controlplane 10.5.0.2 2.00 2.1 GB -
/talos-default-worker-1 worker 10.5.0.3 2.00 2.1 GB -
~#
Obviously, whenever I use talosctl to run a query or command I have to set the no_proxy environment variable. Is there a better way to set this variable without changing how the user works?
It's a question about your environment, not really Talos. You can configure no_proxy to skip private CIDRs, but this depends on the network environment you have.
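One possible way to do that, sketched for a POSIX-style shell profile: append the CIDRs to no_proxy once, without duplicating entries that are already present. The helper name add_no_proxy is hypothetical, and 10.5.0.0/24 is simply the default network reported by talosctl cluster create in this thread; which ranges are safe to bypass depends on the local network.

```shell
# Hypothetical helper for a shell profile: append entries to no_proxy, skipping
# any that are already listed, and export both spellings of the variable.
add_no_proxy() {
  for entry in "$@"; do
    case ",${no_proxy}," in
      *",${entry},"*) ;;                                       # already present
      *) no_proxy="${no_proxy:+${no_proxy},}${entry}" ;;       # append with a comma
    esac
  done
  NO_PROXY=$no_proxy
  export no_proxy NO_PROXY
}

# Example (not run here), e.g. in ~/.profile:
# add_no_proxy localhost 127.0.0.1 10.5.0.0/24
```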
Yes, you're right @smira, I apologize. I was asking for advice rather than support.
Thanks. Gabriele
okay, thanks, I'm going to close this one as we seem to have a solution.