Question: Adding a master to the cluster fails

Open liuziyuan opened this issue 3 years ago • 2 comments

Current cluster environment: 2 masters, 1 node

root@master1:~/.sealos/default# kubectl get pod -A
NAMESPACE         NAME                                       READY   STATUS    RESTARTS       AGE
calico-system     calico-kube-controllers-6b44b54755-b4fvq   1/1     Running   14 (18m ago)   18h
calico-system     calico-node-g9gjl                          1/1     Running   1 (33m ago)    18h
calico-system     calico-node-nw92w                          1/1     Running   2 (32m ago)    18h
calico-system     calico-node-v9rcc                          1/1     Running   2 (31m ago)    18h
calico-system     calico-typha-86c4d6d567-qblhd              1/1     Running   2 (32m ago)    18h
calico-system     calico-typha-86c4d6d567-sxrd4              1/1     Running   1 (33m ago)    18h
kube-system       coredns-6d4b75cb6d-6kpz7                   1/1     Running   2 (32m ago)    18h
kube-system       coredns-6d4b75cb6d-ssfrm                   1/1     Running   2 (32m ago)    18h
kube-system       etcd-master1                               1/1     Running   4 (26m ago)    18h
kube-system       etcd-master2                               1/1     Running   1 (33m ago)    18h
kube-system       kube-apiserver-master1                     1/1     Running   11 (25m ago)   18h
kube-system       kube-apiserver-master2                     1/1     Running   1 (33m ago)    18h
kube-system       kube-controller-manager-master1            1/1     Running   3 (31m ago)    18h
kube-system       kube-controller-manager-master2            1/1     Running   3 (19m ago)    18h
kube-system       kube-proxy-4pq8v                           1/1     Running   1 (30m ago)    18h
kube-system       kube-proxy-4zbpx                           1/1     Running   1 (33m ago)    18h
kube-system       kube-proxy-qdz2r                           1/1     Running   2 (32m ago)    18h
kube-system       kube-scheduler-master1                     1/1     Running   3 (30m ago)    18h
kube-system       kube-scheduler-master2                     1/1     Running   3 (19m ago)    18h
kube-system       kube-sealos-lvscare-node1                  1/1     Running   2 (32m ago)    18h
tigera-operator   tigera-operator-d7957f5cc-qmj4n            1/1     Running   9 (19m ago)    18h

I need to add one more master. Running sealos add --masters 192.168.56.12 fails with the following error:

192.168.56.12:22: [download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
192.168.56.12:22: error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
192.168.56.12:22: To see the stack trace of this error execute with --v=5 or higher
2022-09-22T02:24:13 error Applied to cluster error: exec kubeadm join in 192.168.56.12:22 failed failed to execute command(kubeadm join --config=/root/.sealos/default/etc/kubeadm-join-master.yaml -v 0 --ignore-preflight-errors=SystemVerification) on host(192.168.56.12:22): output(W0922 02:24:12.357475   29704 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
[preflight] Running pre-flight checks
        [WARNING FileExisting-socat]: socat not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
To see the stack trace of this error execute with --v=5 or higher), error(Process exited with status 1)
2022-09-22T02:24:13 info

I'm not sure where the problem is.

liuziyuan avatar Sep 22 '22 02:09 liuziyuan

It looks like the certificates could not be fetched.

fanux avatar Sep 22 '22 08:09 fanux

Please post your system information, sealos version, and image version, and I'll look into how to reproduce it.

fanux avatar Sep 22 '22 08:09 fanux

The kubeadm-certs Secret is bound to a token that expires (default 1 hour). Once that hour has passed, kubeadm-certs is removed automatically. There are two ways to solve this:

1. Re-upload kubeadm-certs on the first master with kubeadm init phase upload-certs --upload-certs, then join the other master nodes as soon as possible.
2. Set kubeadm-certs to never expire right when the first master is installed.

For more information, see https://github.com/dyrnq/kubeadm-vagrant/issues/11. Hope it helps.
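
A rough sketch of method 1, to be run on an existing master. The function name ensure_kubeadm_certs is hypothetical and is not part of sealos or kubeadm; it only wraps the kubectl and kubeadm commands already mentioned in the error message. Whether sealos add then picks up the newly printed certificate key is not established in this thread, so treat this as a manual check, not a confirmed fix.

```shell
# Sketch only: re-upload the control-plane certificates if the kubeadm-certs
# Secret is no longer present in kube-system.
# ensure_kubeadm_certs is a hypothetical helper name (assumption).
ensure_kubeadm_certs() {
  if kubectl -n kube-system get secret kubeadm-certs >/dev/null 2>&1; then
    echo "kubeadm-certs is present"
  else
    echo "kubeadm-certs is missing, uploading again"
    # Uploads the certificates again and prints a new certificate key,
    # which a joining control-plane node needs.
    kubeadm init phase upload-certs --upload-certs
  fi
}
```

Calling ensure_kubeadm_certs shortly before joining the next master keeps the join inside the Secret's lifetime.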

dyrnq avatar Oct 03 '22 06:10 dyrnq

@dyrnq sealos add will generate a new token

fanux avatar Oct 04 '22 02:10 fanux

kubeadm init phase upload-certs

If the token has expired, the certificates are uploaded again.

cuisongliu avatar Oct 11 '22 03:10 cuisongliu

The first time, I added master2 immediately and it succeeded. When I waited an hour before adding it, the problem was reproduced.

100.75.75.31:22: error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
100.75.75.31:22: To see the stack trace of this error execute with --v=5 or higher
2022-10-18T11:03:01 error Applied to cluster error: exec kubeadm join in 100.75.75.31:22 failed failed to execute command(kubeadm join --config=/root/.sealos/default/etc/kubeadm-join-master.yaml -v 0 --ignore-preflight-errors=SystemVerification) on host(100.75.75.31:22): output(W1018 11:02:20.407213   13176 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
[preflight] Running pre-flight checks
        [WARNING FileExisting-socat]: socat not found in system path
        [WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
To see the stack trace of this error execute with --v=5 or higher), error(Process exited with status 1)

xiao-jay avatar Oct 18 '22 11:10 xiao-jay

kubeadm init phase upload-certs --upload-certs: I see this is a sealctl command. It looks like running sealctl token will create the certs. How do I install sealctl? @cuisongliu

xiao-jay avatar Oct 18 '22 11:10 xiao-jay

@xiao-jay seems we can't reproduce this bug?

fanux avatar Oct 19 '22 10:10 fanux

After 1 hour has passed, run kubeadm token list and cat ~/.sealos/default/etc/kubeadm-token.yaml. @xiao-jay

cuisongliu avatar Oct 22 '22 03:10 cuisongliu

The token described as "Proxy for managing TTL for the kubeadm-certs secret" has a TTL of 1h. It seems that this cannot be changed through parameters.
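
To check a running cluster against that 1h window, something like the following could be used. It is only a sketch: secret_age_seconds and ttl_exceeded are hypothetical helper names, the 3600-second TTL is the value reported in this thread, the Secret's creation time is assumed to approximate the upload time, and date -d requires GNU date.

```shell
# Sketch only: estimate how long ago the kubeadm-certs Secret was created.
# secret_age_seconds and ttl_exceeded are hypothetical helpers (assumption).
secret_age_seconds() {
  created="$1"   # RFC 3339 timestamp, e.g. 2022-10-18T10:00:00Z
  now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"   # defaults to the current time
  echo $(( $(date -u -d "$now" +%s) - $(date -u -d "$created" +%s) ))
}

# Succeeds when the age has reached the assumed 1h (3600 s) TTL.
ttl_exceeded() {
  [ "$(secret_age_seconds "$1" "${2:-}")" -ge 3600 ]
}

# Possible usage on a master (prints nothing if the Secret is already gone):
#   created=$(kubectl -n kube-system get secret kubeadm-certs \
#     -o jsonpath='{.metadata.creationTimestamp}')
#   ttl_exceeded "$created" && echo "kubeadm-certs is likely expired"
```

If the kubectl query returns nothing at all, the Secret has already been removed, which matches the error shown earlier in the thread.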

cuisongliu avatar Oct 22 '22 03:10 cuisongliu