
sealos 4.0: first install fails; re-running the install prints no errors but still does not succeed

Open cyl-007 opened this issue 2 years ago • 11 comments

sealos version: 4.0.0-rc1. The first install failed due to a network problem, as shown below (screenshot: 6aae7609cde9a7f982c693b78ae5b94)

After fixing the network problem I ran the install again, as shown below (screenshot: 754d4b64aeda0d18b919ee147af41bd)

Verification shows the installation did not actually succeed.

cyl-007 avatar Jun 27 '22 06:06 cyl-007

@Ficus-f please fix this issue: find where the error occurs; on error the code must return instead of continuing. You can test it with an invalid image address.

fanux avatar Jun 27 '22 10:06 fanux

[root@ip-172-31-39-15 ec2-user]# sealos run labring/kubernetes:v1.24.0 labring/calico:v3.22.1 --masters 172.31.39.15 
2022-06-27 14:18:19 [INFO] start to install app in this cluster
2022-06-27 14:18:19 [INFO] succeeded install app in this cluster: no change apps
2022-06-27 14:18:19 [INFO] start to scale this cluster
2022-06-27 14:18:19 [INFO] succeeded in scaling this cluster: no change nodes
2022-06-27 14:18:19 [INFO] 
      ___           ___           ___           ___       ___           ___
     /\  \         /\  \         /\  \         /\__\     /\  \         /\  \
    /::\  \       /::\  \       /::\  \       /:/  /    /::\  \       /::\  \
   /:/\ \  \     /:/\:\  \     /:/\:\  \     /:/  /    /:/\:\  \     /:/\ \  \
  _\:\~\ \  \   /::\~\:\  \   /::\~\:\  \   /:/  /    /:/  \:\  \   _\:\~\ \  \
 /\ \:\ \ \__\ /:/\:\ \:\__\ /:/\:\ \:\__\ /:/__/    /:/__/ \:\__\ /\ \:\ \ \__\
 \:\ \:\ \/__/ \:\~\:\ \/__/ \/__\:\/:/  / \:\  \    \:\  \ /:/  / \:\ \:\ \/__/
  \:\ \:\__\    \:\ \:\__\        \::/  /   \:\  \    \:\  /:/  /   \:\ \:\__\
   \:\/:/  /     \:\ \/__/        /:/  /     \:\  \    \:\/:/  /     \:\/:/  /
    \::/  /       \:\__\         /:/  /       \:\__\    \::/  /       \::/  /
     \/__/         \/__/         \/__/         \/__/     \/__/         \/__/

                  Website :https://www.sealos.io/
                  Address :github.com/labring/sealos

The first failure generated .sealos/default/Clusterfile, so on the second run sealos detected no diff and reported success directly. This is probably related to an earlier optimization, not just a missing error return.

The logic of run is wrong here: it should compare the command-line arguments against the real cluster, not against .sealos/default/Clusterfile.

Whether a Clusterfile should be generated at all when a run fails is still open for discussion; alternatively, a field should mark it as failed.

fanux avatar Jun 27 '22 14:06 fanux

The current optimization does not block execution; the file is saved regardless of success or failure. I think we could check at the end what the error was and then report it:

type ClusterStatus struct {
	Phase      ClusterPhase       `json:"phase,omitempty"`
	Mounts     []MountImage       `json:"mounts,omitempty"`
	Conditions []ClusterCondition `json:"conditions,omitempty"`
}

Phase is the status of the cluster as a whole.
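A hedged sketch of that "check the error at the end and report it" idea. Only ClusterStatus mirrors the struct quoted above; the phase constants and the summarize helper are hypothetical names, not sealos code.

```go
package main

import "fmt"

// Hypothetical phase values and helper; only ClusterStatus mirrors the
// struct quoted in the comment above.
type ClusterPhase string

const (
	ClusterFailed  ClusterPhase = "ClusterFailed"
	ClusterSuccess ClusterPhase = "ClusterSuccess"
)

type ClusterCondition struct {
	Type    string
	Status  string
	Message string
}

type ClusterStatus struct {
	Phase      ClusterPhase
	Conditions []ClusterCondition
}

// summarize reports the overall phase and, on failure, the most recent
// failing condition, so the error is surfaced at the end of the run.
func summarize(s ClusterStatus) string {
	if s.Phase == ClusterSuccess {
		return "cluster applied successfully"
	}
	for i := len(s.Conditions) - 1; i >= 0; i-- {
		if s.Conditions[i].Status == "False" {
			return fmt.Sprintf("cluster %s: %s", s.Phase, s.Conditions[i].Message)
		}
	}
	return fmt.Sprintf("cluster %s", s.Phase)
}

func main() {
	s := ClusterStatus{
		Phase: ClusterFailed,
		Conditions: []ClusterCondition{
			{Type: "NodeJoin", Status: "False", Message: "kubelet not healthy"},
		},
	}
	fmt.Println(summarize(s)) // cluster ClusterFailed: kubelet not healthy
}
```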

cuisongliu avatar Jun 27 '22 16:06 cuisongliu

I don't think this is optimal yet. The best approach follows the controller coding standard: for each field, compare -> execute -> persist status. Having the pipeline update the corresponding field after each stage would be more reasonable.

fanux avatar Jun 27 '22 17:06 fanux

I don't think this is optimal yet. The best approach follows the controller coding standard: for each field, compare -> execute -> persist status. Having the pipeline update the corresponding field after each stage would be more reasonable.

The state of each package should be defined inside the package itself; otherwise it ends up like 3.0, where a lot of kubernetes-specific setup code was baked into the binary and could not be modified.

dashjay avatar Jun 28 '22 01:06 dashjay

I ran into this too.

vzardlloo avatar Jun 29 '22 17:06 vzardlloo

I ran into this too.

We are fixing this issue now.

fanux avatar Jun 29 '22 17:06 fanux

2022-06-30 15:17:25 [EROR] Applied to cluster error: failed to join node 121.32.254.132:33568 failed to execute command(kubeadm join --config=/var/lib/sealos/data/default/etc/kubeadm-join-node.yaml -v 0) on host(121.32.254.132:33568): output([preflight] Running pre-flight checks
	[WARNING FileExisting-socat]: socat not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0630 15:15:30.070122  609824 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'
error execution phase kubelet-start: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher), error(Process exited with status 1)
2022-06-30 15:17:25 [INFO]
      ___           ___           ___           ___       ___           ___
     /\  \         /\  \         /\  \         /\__\     /\  \         /\  \
    /::\  \       /::\  \       /::\  \       /:/  /    /::\  \       /::\  \
   /:/\ \  \     /:/\:\  \     /:/\:\  \     /:/  /    /:/\:\  \     /:/\ \  \
  _\:\~\ \  \   /::\~\:\  \   /::\~\:\  \   /:/  /    /:/  \:\  \   _\:\~\ \  \
 /\ \:\ \ \__\ /:/\:\ \:\__\ /:/\:\ \:\__\ /:/__/    /:/__/ \:\__\ /\ \:\ \ \__\
 \:\ \:\ \/__/ \:\~\:\ \/__/ \/__\:\/:/  / \:\  \    \:\  \ /:/  / \:\ \:\ \/__/
  \:\ \:\__\    \:\ \:\__\        \::/  /   \:\  \    \:\  /:/  /   \:\ \:\__\
   \:\/:/  /     \:\ \/__/        /:/  /     \:\  \    \:\/:/  /     \:\/:/  /
    \::/  /       \:\__\         /:/  /       \:\__\    \::/  /       \::/  /
     \/__/         \/__/         \/__/         \/__/     \/__/         \/__/

                  Website :https://www.sealos.io/
                  Address :github.com/labring/sealos

root@RK05-FRP-A001:~#
root@RK05-FRP-A001:~#
root@RK05-FRP-A001:~#
root@RK05-FRP-A001:~# ./sealos delete --nodes 121.32.254.131 kube^C
root@RK05-FRP-A001:~# kubectl  get pod -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-64897985d-8rbqw                 0/1     Pending   0          15m
coredns-64897985d-cvjdp                 0/1     Pending   0          15m
etcd-rk05-frp-a001                      1/1     Running   0          15m
kube-apiserver-rk05-frp-a001            1/1     Running   0          15m
kube-controller-manager-rk05-frp-a001   1/1     Running   3          15m
kube-proxy-4tdst                        1/1     Running   0          15m
kube-proxy-bgwh4                        1/1     Running   0          15m
kube-proxy-rvvvw                        1/1     Running   0          15m
kube-scheduler-rk05-frp-a001            1/1     Running   3          15m

If a node fails to join, the later steps are never executed either, and the ipvs rules are not added.

cuisongliu avatar Jun 30 '22 07:06 cuisongliu

Get current cluster:

type xxx interface {
	GetCurrentCluster() (*v2.Cluster, error)
}

fanux avatar Jun 30 '22 13:06 fanux

https://github.com/kubernetes/client-go/blob/master/examples/out-of-cluster-client-configuration/main.go

fanux avatar Jun 30 '22 13:06 fanux

https://github.com/kubernetes/client-go/blob/master/examples/out-of-cluster-client-configuration/main.go

You could use:

https://github.com/labring/sealos/blob/7146cfe47ad591fa7ef26704b3b84b6ef4c0139b/pkg/client-go/kubernetes/idempotency.go#L53

https://github.com/labring/sealos/blob/7146cfe47ad591fa7ef26704b3b84b6ef4c0139b/pkg/runtime/utils.go#L108

You can merge the node-operation logic there.
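A simplified sketch of the GetCurrentCluster idea: derive the topology from the live node list (which client-go's clientset.CoreV1().Nodes().List(...) would supply, as in the linked example) instead of trusting the saved Clusterfile. Node and Cluster here are stand-ins for the real corev1 and v2 types.

```go
package main

import "fmt"

// Simplified stand-ins for corev1.Node and sealos' v2.Cluster.
type Node struct {
	Name   string
	Labels map[string]string
}

type Cluster struct {
	Masters []string
	Workers []string
}

// BuildCurrentCluster derives the topology from the live node list, which
// client-go's clientset.CoreV1().Nodes().List(...) would supply.
func BuildCurrentCluster(nodes []Node) *Cluster {
	c := &Cluster{}
	for _, n := range nodes {
		// control-plane nodes carry this well-known label
		if _, ok := n.Labels["node-role.kubernetes.io/control-plane"]; ok {
			c.Masters = append(c.Masters, n.Name)
		} else {
			c.Workers = append(c.Workers, n.Name)
		}
	}
	return c
}

func main() {
	nodes := []Node{
		{Name: "master-0", Labels: map[string]string{"node-role.kubernetes.io/control-plane": ""}},
		{Name: "node-0", Labels: map[string]string{}},
	}
	c := BuildCurrentCluster(nodes)
	fmt.Println(c.Masters, c.Workers) // [master-0] [node-0]
}
```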

cuisongliu avatar Jun 30 '22 14:06 cuisongliu

@cuisongliu has this been fixed? Version 4.1.0-rc2 seems to still have this problem.

My machine had docker installed by default, and sealos reported that docker must be removed:

2022-08-22T11:13:18 info Executing pipeline RunConfig in CreateProcessor.
2022-08-22T11:13:18 info Executing pipeline MountRootfs in CreateProcessor.
/usr/bin/docker
 ERROR [2022-08-22 11:13:33] >> The machine docker is not clean. Please clean docker the system.
2022-08-22T11:13:33 error Applied to cluster error: exit status 1
2022-08-22T11:13:33 info

After uninstalling docker and running the install again, it seems no install was actually performed:

root@VM-16-26-debian:~# sealos run labring/kubernetes:v1.24.0 labring/calico:v3.22.1 --masters 10.10.16.26
2022-08-22T11:16:29 info sync new version copy pki config: /var/lib/sealos/data/default/pki /root/.sealos/default/pki
2022-08-22T11:16:29 info sync new version copy etc config: /var/lib/sealos/data/default/etc /root/.sealos/default/etc
2022-08-22T11:16:29 info start to install app in this cluster
2022-08-22T11:16:29 info succeeded install app in this cluster: no change apps
2022-08-22T11:16:29 info start to scale this cluster
2022-08-22T11:16:29 info succeeded in scaling this cluster: no change nodes
2022-08-22T11:16:29 info
      ___           ___           ___           ___       ___           ___
     /\  \         /\  \         /\  \         /\__\     /\  \         /\  \
    /::\  \       /::\  \       /::\  \       /:/  /    /::\  \       /::\  \
   /:/\ \  \     /:/\:\  \     /:/\:\  \     /:/  /    /:/\:\  \     /:/\ \  \
  _\:\~\ \  \   /::\~\:\  \   /::\~\:\  \   /:/  /    /:/  \:\  \   _\:\~\ \  \
 /\ \:\ \ \__\ /:/\:\ \:\__\ /:/\:\ \:\__\ /:/__/    /:/__/ \:\__\ /\ \:\ \ \__\
 \:\ \:\ \/__/ \:\~\:\ \/__/ \/__\:\/:/  / \:\  \    \:\  \ /:/  / \:\ \:\ \/__/
  \:\ \:\__\    \:\ \:\__\        \::/  /   \:\  \    \:\  /:/  /   \:\ \:\__\
   \:\/:/  /     \:\ \/__/        /:/  /     \:\  \    \:\/:/  /     \:\/:/  /
    \::/  /       \:\__\         /:/  /       \:\__\    \::/  /       \::/  /
     \/__/         \/__/         \/__/         \/__/     \/__/         \/__/

                  Website :https://www.sealos.io/
                  Address :github.com/labring/sealos

At the moment the only workaround is to delete /root/.sealos first.

ysicing avatar Aug 22 '22 03:08 ysicing

Just use the run --force flag.

fanux avatar Aug 24 '22 06:08 fanux

It looks like this bug is still present: if the first install hits any error, the second install reports success by default, but nothing was actually installed.

Deapou avatar Mar 19 '23 13:03 Deapou

(screenshot)

enzetan avatar Apr 15 '23 17:04 enzetan

All kinds of problems... it just doesn't work, whatever I try.

jinnery avatar Jan 04 '24 05:01 jinnery
