sealos
Question: Adding a master to an existing cluster fails
Current cluster: 2 masters, 1 node
root@master1:~/.sealos/default# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system calico-kube-controllers-6b44b54755-b4fvq 1/1 Running 14 (18m ago) 18h
calico-system calico-node-g9gjl 1/1 Running 1 (33m ago) 18h
calico-system calico-node-nw92w 1/1 Running 2 (32m ago) 18h
calico-system calico-node-v9rcc 1/1 Running 2 (31m ago) 18h
calico-system calico-typha-86c4d6d567-qblhd 1/1 Running 2 (32m ago) 18h
calico-system calico-typha-86c4d6d567-sxrd4 1/1 Running 1 (33m ago) 18h
kube-system coredns-6d4b75cb6d-6kpz7 1/1 Running 2 (32m ago) 18h
kube-system coredns-6d4b75cb6d-ssfrm 1/1 Running 2 (32m ago) 18h
kube-system etcd-master1 1/1 Running 4 (26m ago) 18h
kube-system etcd-master2 1/1 Running 1 (33m ago) 18h
kube-system kube-apiserver-master1 1/1 Running 11 (25m ago) 18h
kube-system kube-apiserver-master2 1/1 Running 1 (33m ago) 18h
kube-system kube-controller-manager-master1 1/1 Running 3 (31m ago) 18h
kube-system kube-controller-manager-master2 1/1 Running 3 (19m ago) 18h
kube-system kube-proxy-4pq8v 1/1 Running 1 (30m ago) 18h
kube-system kube-proxy-4zbpx 1/1 Running 1 (33m ago) 18h
kube-system kube-proxy-qdz2r 1/1 Running 2 (32m ago) 18h
kube-system kube-scheduler-master1 1/1 Running 3 (30m ago) 18h
kube-system kube-scheduler-master2 1/1 Running 3 (19m ago) 18h
kube-system kube-sealos-lvscare-node1 1/1 Running 2 (32m ago) 18h
tigera-operator tigera-operator-d7957f5cc-qmj4n 1/1 Running 9 (19m ago) 18h
I need to add one more master, so I ran sealos add --masters 192.168.56.12
It fails with the following error:
192.168.56.12:22: [download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
192.168.56.12:22: error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
192.168.56.12:22: To see the stack trace of this error execute with --v=5 or higher
2022-09-22T02:24:13 error Applied to cluster error: exec kubeadm join in 192.168.56.12:22 failed failed to execute command(kubeadm join --config=/root/.sealos/default/etc/kubeadm-join-master.yaml -v 0 --ignore-preflight-errors=SystemVerification) on host(192.168.56.12:22): output(W0922 02:24:12.357475 29704 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
[preflight] Running pre-flight checks
[WARNING FileExisting-socat]: socat not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
To see the stack trace of this error execute with --v=5 or higher), error(Process exited with status 1)
2022-09-22T02:24:13 info
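I'm not sure where the problem is.
It looks like the certificates could not be fetched.
Please post your system information, sealos version, and image version, and I'll see how to reproduce it.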
The kubeadm-certs Secret is bound to a token that expires (1 hour by default). Once that time has passed, kubeadm-certs is removed automatically. There are two ways to solve this:
1. Re-upload kubeadm-certs on the first master with kubeadm init phase upload-certs --upload-certs, then join the other master nodes as soon as possible.
2. Set kubeadm-certs to never expire immediately after the first master is installed.
For more information, see https://github.com/dyrnq/kubeadm-vagrant/issues/11. Hope it helps.
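The first option can be sketched as a small shell helper. This is only a sketch, assuming it is run as root on an existing control-plane node with kubectl and kubeadm on the PATH; the function names certs_secret_present and refresh_certs_if_missing are made up for illustration. Note that the re-upload prints a new certificate key, and this thread does not confirm whether sealos add picks that key up on its own.

```shell
#!/bin/sh
# Sketch: re-upload the control-plane certificates only when the
# kubeadm-certs Secret is no longer present in kube-system.
# Function names are hypothetical; the kubectl/kubeadm commands are standard.

# Returns 0 when the kubeadm-certs Secret exists in kube-system.
certs_secret_present() {
  kubectl -n kube-system get secret kubeadm-certs >/dev/null 2>&1
}

# Re-uploads the certificates when the Secret is gone; otherwise does nothing.
refresh_certs_if_missing() {
  if certs_secret_present; then
    echo "kubeadm-certs present"
  else
    echo "kubeadm-certs missing, re-uploading"
    kubeadm init phase upload-certs --upload-certs
  fi
}
```

After sourcing the file, calling refresh_certs_if_missing on the first master and then retrying the join promptly follows the advice above.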
@dyrnq sealos add will generate a new token
kubeadm init phase upload-certs
If the token has expired, the certificates are uploaded again.
The first time, I added master2 immediately and it succeeded. When I waited an hour before adding it, the problem was reproduced.
100.75.75.31:22: error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
100.75.75.31:22: To see the stack trace of this error execute with --v=5 or higher
2022-10-18T11:03:01 error Applied to cluster error: exec kubeadm join in 100.75.75.31:22 failed failed to execute command(kubeadm join --config=/root/.sealos/default/etc/kubeadm-join-master.yaml -v 0 --ignore-preflight-errors=SystemVerification) on host(100.75.75.31:22): output(W1018 11:02:20.407213 13176 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
[preflight] Running pre-flight checks
[WARNING FileExisting-socat]: socat not found in system path
[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
To see the stack trace of this error execute with --v=5 or higher), error(Process exited with status 1)
kubeadm init phase upload-certs --upload-certs: I see that this is run by a sealctl command; it looks like it is invoked via exec.
sealctl token will create the certs. How do I install sealctl? @cuisongliu
@xiao-jay seems we can't reproduce this bug?
After 1 hour, run kubeadm token list and cat ~/.sealos/default/etc/kubeadm-token.yaml @xiao-jay

The bootstrap token described as "Proxy for managing TTL for the kubeadm-certs secret" has a TTL of 1h. It seems that this cannot be changed through parameters.
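To check whether that token is still alive before adding a master, the kubeadm token list output can be filtered by the description quoted above. This is a minimal sketch, assuming kubeadm is on the PATH of a control-plane node; the function names certs_token_row and report_certs_token are hypothetical.

```shell
#!/bin/sh
# Sketch: look for the bootstrap token that controls the lifetime of the
# kubeadm-certs Secret. Function names are hypothetical.

# Prints the matching row of `kubeadm token list`; returns non-zero if absent.
certs_token_row() {
  kubeadm token list | grep -F "Proxy for managing TTL for the kubeadm-certs secret"
}

# Prints a one-line summary of whether the certs token still exists.
report_certs_token() {
  if row=$(certs_token_row); then
    echo "certs token found: $row"
  else
    echo "no certs token found; kubeadm-certs has likely expired"
  fi
}
```

If report_certs_token finds no token, the Secret it guarded has most likely been removed as well, which matches the download-certs error shown earlier in the thread.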