An error occurred while attempting an offline installation
What version of KubeKey has the issue?
kk version: &version.Info{Major:"3", Minor:"0", GitVersion:"v3.0.7", GitCommit:"e755baf67198d565689d7207378174f429b508ba", GitTreeState:"clean", BuildDate:"2023-01-18T01:57:24Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
What is your os environment?
Ubuntu 22.04
KubeKey config file
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: hhjmk-kube
spec:
  hosts:
  - {name: master, address: 10.111.0.1, internalAddress: 10.111.0.1, user: root, privateKeyPath: "/root/pri-key"}
  - {name: harbor, address: 10.111.0.100, internalAddress: 10.111.0.100, user: root, privateKeyPath: "/root/pri-key"}
  - {name: node1, address: 10.111.0.2, internalAddress: 10.111.0.2, user: root, privateKeyPath: "/root/pri-key"}
  roleGroups:
    etcd:
    - master
    control-plane:
    - master
    worker:
    - node1
    registry:
    - harbor
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.21.14
    clusterName: hhjmk.kube
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    type: harbor
    auths:
      "dockerhub.kubekey.local":
        username: admin
        password: [I delete this]
    privateRegistry: "dockerhub.kubekey.local"
    namespaceOverride: "kubesphereio"
    #privateRegistry: ""
    #namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.3.2
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  zone: ""
  local_registry: ""
  namespace_override: ""
  # dev_tag: ""
  etcd:
    monitoring: false
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    # apiserver:
    #   resources: {}
    # controllerManager:
    #   resources: {}
    redis:
      enabled: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      # type: external
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - resourceName: "nvidia.com/gpu"
        resourceType: "GPU"
        default: true
    es:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
  alerting:
    enabled: false
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:
    enabled: false
    # operator:
    #   resources: {}
    # webhook:
    #   resources: {}
  devops:
    enabled: false
    # resources: {}
    jenkinsMemoryLim: 8Gi
    jenkinsMemoryReq: 4Gi
    jenkinsVolumeSize: 8Gi
  events:
    enabled: false
    # operator:
    #   resources: {}
    # exporter:
    #   resources: {}
    # ruler:
    #   enabled: true
    #   replicas: 2
    #   resources: {}
  logging:
    enabled: false
    logsidecar:
      enabled: true
      replicas: 2
      # resources: {}
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    node_exporter:
      port: 9100
      # resources: {}
    # kube_rbac_proxy:
    #   resources: {}
    # kube_state_metrics:
    #   resources: {}
    # prometheus:
    #   replicas: 1
    #   volumeSize: 20Gi
    #   resources: {}
    #   operator:
    #     resources: {}
    # alertmanager:
    #   replicas: 1
    #   resources: {}
    # notification_manager:
    #   resources: {}
    #   operator:
    #     resources: {}
    #   proxy:
    #     resources: {}
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: false
    ippool:
      type: none
    topology:
      type: none
  openpitrix:
    store:
      enabled: false
  servicemesh:
    enabled: false
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
            - ""
        service:
          cloudhubNodePort: "30000"
          cloudhubQuicNodePort: "30001"
          cloudhubHttpsNodePort: "30002"
          cloudstreamNodePort: "30003"
          tunnelNodePort: "30004"
        # resources: {}
        # hostNetWork: false
      iptables-manager:
        enabled: true
        mode: "external"
        # resources: {}
      # edgeService:
      #   resources: {}
  terminal:
    timeout: 600
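A clear and concise description of what happened.
I ran into an error while attempting an offline installation. This is my third attempt at reinstalling, and a new error has appeared. The earlier error seems to have been caused by trying to delete files on a read-only file system: I noticed that the ISO file was mounted read-only. The final error output was: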
This is a simple check of your environment.
Before installation, ensure that your machines meet all requirements specified at
https://github.com/kubesphere/kubekey#requirements-and-recommendations
Continue this installation? [yes/no]: yes
05:26:04 UTC success: [LocalHost]
05:26:04 UTC [UnArchiveArtifactModule] Check the KubeKey artifact md5 value
05:26:39 UTC success: [LocalHost]
05:26:39 UTC [UnArchiveArtifactModule] UnArchive the KubeKey artifact
05:26:39 UTC skipped: [LocalHost]
05:26:39 UTC [UnArchiveArtifactModule] Create the KubeKey artifact Md5 file
05:26:39 UTC skipped: [LocalHost]
05:26:39 UTC [RepositoryModule] Get OS release
05:26:39 UTC success: [master]
05:26:39 UTC success: [harbor]
05:26:39 UTC success: [node1]
05:26:39 UTC [RepositoryModule] Sync repository iso file to all nodes
05:26:39 UTC message: [master]
reset tmp dir failed: reset tmp dir failed: Failed to exec command: sudo -E /bin/bash -c "if [ -d /tmp/kubekey ]; then rm -rf /tmp/kubekey ;fi && mkdir -m 777 -p /tmp/kubekey"
When I ran this command manually, I saw many "cannot remove" messages. The command I ran was:
sudo -E /bin/bash -c "if [ -d /tmp/kubekey ]; then rm -rf /tmp/kubekey ;fi && mkdir -m 777 -p /tmp/kubekey"
While checking the mount points, I noticed:
/tmp/kubekey/ubuntu-22.04-amd64.iso (deleted) on /tmp/kubekey/iso type iso9660 (ro,relatime,nojoliet,check=s,map=n,blocksize=2048,iocharset=utf8)
In the end I resolved this error with:
sudo umount /tmp/kubekey/iso
sudo rm -rf /tmp/kubekey
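The manual fix above (unmount first, then delete) could be wrapped in a small helper so that a stale ISO mount is always released before /tmp/kubekey is removed. This is only a sketch: the function names `mounts_under` and `cleanup_kubekey_tmp` are hypothetical and not part of KubeKey, and it assumes a Linux host with /proc/mounts and mount points that contain no spaces.

```shell
#!/bin/sh
# Print mount points at or below directory $1, deepest first.
# Reads /proc/mounts-style lines on stdin (field 2 is the mount point).
mounts_under() {
  mu_dir="${1%/}"
  awk -v d="$mu_dir" '$2 == d || index($2, d "/") == 1 { print $2 }' \
    | LC_ALL=C sort -r
}

# Unmount everything under the KubeKey temp dir, then remove the directory.
# Hypothetical helper; the default path matches the one in the log above.
cleanup_kubekey_tmp() {
  ck_dir="${1:-/tmp/kubekey}"
  for mp in $(mounts_under "$ck_dir" < /proc/mounts); do
    sudo umount "$mp" || return 1
  done
  sudo rm -rf "$ck_dir"
}
```

Running `cleanup_kubekey_tmp` on each affected node before retrying should have the same effect as the two commands above.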
Relevant log output
This is a simple check of your environment.
Before installation, ensure that your machines meet all requirements specified at
https://github.com/kubesphere/kubekey#requirements-and-recommendations
Continue this installation? [yes/no]: yes
13:51:45 UTC success: [LocalHost]
13:51:45 UTC [UnArchiveArtifactModule] Check the KubeKey artifact md5 value
13:52:25 UTC success: [LocalHost]
13:52:25 UTC [UnArchiveArtifactModule] UnArchive the KubeKey artifact
13:52:25 UTC skipped: [LocalHost]
13:52:25 UTC [UnArchiveArtifactModule] Create the KubeKey artifact Md5 file
13:52:25 UTC skipped: [LocalHost]
13:52:25 UTC [RepositoryModule] Get OS release
13:52:25 UTC success: [master]
13:52:25 UTC success: [harbor]
13:52:25 UTC success: [node1]
13:52:25 UTC [RepositoryModule] Sync repository iso file to all nodes
13:52:30 UTC success: [node1]
13:52:30 UTC success: [master]
13:52:30 UTC success: [harbor]
13:52:30 UTC [RepositoryModule] Mount iso file
13:52:30 UTC success: [node1]
13:52:30 UTC success: [master]
13:52:30 UTC success: [harbor]
13:52:30 UTC [RepositoryModule] New repository client
13:52:30 UTC success: [node1]
13:52:30 UTC success: [master]
13:52:30 UTC success: [harbor]
13:52:30 UTC [RepositoryModule] Backup original repository
13:52:30 UTC message: [master]
backup repository failed: Failed to exec command: sudo -E /bin/bash -c "mv /etc/apt/sources.list /etc/apt/sources.list.kubekey.bak"
mv: cannot stat '/etc/apt/sources.list': No such file or directory: Process exited with status 1
13:52:31 UTC failed: [master]
13:52:31 UTC success: [node1]
13:52:31 UTC success: [harbor]
13:52:31 UTC rollback: [harbor]
13:52:31 UTC rollback: [master]
13:52:31 UTC rollback: [node1]
error: Pipeline[CreateClusterPipeline] execute failed: Module[RepositoryModule] exec failed:
failed: [master] [BackupOriginalRepository] exec failed after 1 retires: backup repository failed: Failed to exec command: sudo -E /bin/bash -c "mv /etc/apt/sources.list /etc/apt/sources.list.kubekey.bak"
mv: cannot stat '/etc/apt/sources.list': No such file or directory: Process exited with status 1
root@kube-offline-ctl:~#
Additional information
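One thing to add: I checked /etc/apt/sources.list on the nodes and it indeed does not exist, possibly because an earlier installation failed unexpectedly.
Without the logs from the earlier runs, it is hard to tell what went wrong. Judging from the code, one of these steps probably failed and the mounted file was then not unmounted correctly. https://github.com/kubesphere/kubekey/blob/63d81438f50f88d9115aa677c7749a9d59e481f6/cmd/kk/pkg/bootstrap/os/module.go#L276-L340
Thank you very much for the code snippet. Seeing the InstallPackage keyword reminded me of another symptom. When --with-packages is added, the node running the script creates a local package repository, and the other nodes download and install packages from it. I noticed that apt hung in the final stage, but I could not find the specific cause. It looked like this (I installed a package manually on a test server to reproduce the output seen at the time):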
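If the file is missing because an earlier run moved it to the backup name shown in the log (sources.list.kubekey.bak) and never restored it, a check like the following could put it back before retrying. This is a sketch under that assumption: the function name `restore_sources_list` is hypothetical, and against the real /etc/apt it needs to run as root.

```shell
#!/bin/sh
# Restore sources.list from the backup name seen in the KubeKey log,
# but only when the original is missing. Prints what it found or did.
restore_sources_list() {
  apt_dir="${1:-/etc/apt}"
  if [ -e "$apt_dir/sources.list" ]; then
    echo "present"
  elif [ -e "$apt_dir/sources.list.kubekey.bak" ]; then
    mv "$apt_dir/sources.list.kubekey.bak" "$apt_dir/sources.list" \
      && echo "restored"
  else
    # Neither file exists: nothing to restore from.
    echo "missing"
    return 1
  fi
}
```

If it prints `missing`, neither file exists and sources.list has to be recreated by hand.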
Setting up xxx ...
Processing triggers for xxx ...
Scanning processes...
Scanning linux images...
Running kernel seems to be up-to-date.
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
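After the last line appeared, the shell prompt root@kube:~# did not come back; the process simply hung there. I kept waiting, but nothing happened until the SSH session dropped. Because I had run the command inside screen, I reattached after waiting a day and it was still stuck at the same place. If you terminate the process with Ctrl+C, the next run fails with the error caused by the mount not being released properly.
The workaround is to manually clean up the files and unmount the directory, then install without the --with-packages parameter.
I encountered the same problem: when I use --with-packages, it gets stuck.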
Preparing to unpack .../ipvsadm_1.31-1build2_amd64.deb ...
Unpacking ipvsadm (1:1.31-1build2) ...........]
Setting up ipvsadm (1:1.31-1build2) ..........]
Setting up ebtables (2.0.11-4build2) .........]
Setting up libipset13:amd64 (7.15-1build1) ...]
Setting up ipset (7.15-1build1) ...######.....]
Processing triggers for man-db (2.12.0-4build2) ...
Processing triggers for libc-bin (2.39-0ubuntu8.4) ...
Scanning processes...
Scanning linux images...
Pending kernel upgrade!
Running kernel version:
6.8.0-57-generic
Diagnostics:
The currently running kernel version is not the
expected kernel version 6.8.0-58-generic.
Restarting the system to load the new kernel will
not be handled automatically, so you should
consider rebooting.
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor
(qemu) binaries on this host.
A temporary workaround is available. If you no longer wish to wait, terminate the process with Ctrl+C, manually unmount the volume and clean up the leftovers as I described above, and then run it again without the --with-packages parameter.
Oh, tks.
I think this may be related to 'Pending kernel upgrade', but I can't verify it: after following your steps the program ran normally, and when I used the --with-packages parameter again, everything went smoothly.
solution: https://askubuntu.com/questions/1349884/how-to-disable-pending-kernel-upgrade-message
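The "Scanning processes..." and "Pending kernel upgrade!" lines in the output above come from needrestart, which apt invokes after installing packages. One way to apply the linked suggestion is a needrestart drop-in that turns off the kernel hint. This is an unverified sketch: the file name `90-kubekey-offline.conf` and the function name are hypothetical, the `kernelhints` setting is assumed to behave as described in the needrestart configuration comments, and nobody in this thread has confirmed that it prevents the hang.

```shell
#!/bin/sh
# Write a needrestart drop-in that disables the pending-kernel hint.
# $1 is the drop-in directory (defaults to the usual Ubuntu location).
write_needrestart_dropin() {
  conf_dir="${1:-/etc/needrestart/conf.d}"
  mkdir -p "$conf_dir" || return 1
  # Quoted heredoc delimiter keeps the shell from expanding $nrconf.
  cat > "$conf_dir/90-kubekey-offline.conf" <<'EOF'
# Assumption: a negative value suppresses the kernel upgrade hint.
$nrconf{kernelhints} = -1;
EOF
}
```

Writing to the default directory requires root, and the drop-in would have to exist on every node before running kk with --with-packages.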