[Help] Task "Update onecloud user admin password to admin@123" keeps retrying
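OS: Ubuntu 22.04
Private cloud deployment of onecloud v3.11
High-availability cluster deployed on 3 physical machines. The deployment hits the error below and the task keeps retrying: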
TASK [primary-master-node/setup_cloud : Update onecloud user admin password to admin@123] ***
FAILED - RETRYING: [10.64.25.8]: Update onecloud user admin password to admin@123 (3 retries left).
FAILED - RETRYING: [10.64.25.8]: Update onecloud user admin password to admin@123 (2 retries left).
FAILED - RETRYING: [10.64.25.8]: Update onecloud user admin password to admin@123 (1 retries left).
fatal: [10.64.25.8]: FAILED! => {"attempts": 3, "changed": true, "cmd": "source ~/.onecloud_rcadmin\n/opt/yunion/bin/climc user-update --password admin@123 --enabled --allow-web-console admin\n", "delta": "0:00:00.677680", "end": "2024-06-05 17:51:04.370133", "msg": "non-zero return code", "rc": 1, "start": "2024-06-05 17:51:03.692453", "stderr": "[warning 240605 17:51:04 mcclient.(*Client).unmarshalV3Token(mcclient.go:288)] No service catalog avaiable\n[warning 240605 17:51:04 mcclient.(Client).NewSession(mcclient.go:450)] Missing service catalog in token\n{"error":{"class":"ForbiddenError","code":403,"details":"ListItems: listItemQueryFiltersRaw: {\"error\":{\"class\":\"ForbiddenError\",\"code\":403,\"data\":{\"fields\":[\"identity\",\"users\",\"list\",\"domain\",\"project\",\"domain\"],\"id\":\"not enough privilege to do %!s(MISSING):%!s(MISSING):%!s(MISSING) (require:%!s(MISSING),allow:%!s(MISSING),query:%!s(MISSING))\"},\"details\":\"not enough privilege to do identity:users:list (require:domain,allow:project,query:domain)\"}}","request":{"headers":{"User-Agent":"yunioncloud-go/201708","X-Auth-Token":""},"method":"GET","url":"https://10.64.25.236:30500/v3/users?name=admin"}}}", "stderr_lines": ["[warning 240605 17:51:04 mcclient.(*Client).unmarshalV3Token(mcclient.go:288)] No service catalog avaiable", "[warning 240605 17:51:04 mcclient.(Client).NewSession(mcclient.go:450)] Missing service catalog in token", "{"error":{"class":"ForbiddenError","code":403,"details":"ListItems: listItemQueryFiltersRaw: {\"error\":{\"class\":\"ForbiddenError\",\"code\":403,\"data\":{\"fields\":[\"identity\",\"users\",\"list\",\"domain\",\"project\",\"domain\"],\"id\":\"not enough privilege to do %!s(MISSING):%!s(MISSING):%!s(MISSING) (require:%!s(MISSING),allow:%!s(MISSING),query:%!s(MISSING))\"},\"details\":\"not enough privilege to do identity:users:list 
(require:domain,allow:project,query:domain)\"}}","request":{"headers":{"User-Agent":"yunioncloud-go/201708","X-Auth-Token":""},"method":"GET","url":"https://10.64.25.236:30500/v3/users?name=admin"}}}"], "stdout": "", "stdout_lines": []}
TASK [primary-master-node/setup_cloud : Create onecloud web login user admin] ***
Open a new SSH session and run the following commands to check whether all pods are healthy:
# as root
source ~/.bashrc
kubectl get pods -o wide -A
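With many pods in the list, it can help to filter the output down to the ones that are not healthy. The following is a minimal sketch, not part of ocboot: `list_unhealthy_pods` is a hypothetical helper name, and it assumes the usual `kubectl get pods -A` column order (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# list_unhealthy_pods (hypothetical helper): reads `kubectl get pods -A`
# output on stdin and prints every pod that is either not in the Running
# state or is Running with fewer ready containers than expected.
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE ...
list_unhealthy_pods() {
    awk '
        NR == 1 { next }                 # skip the header row
        $4 == "Completed" { next }       # finished jobs are not a problem
        {
            split($3, ready, "/")        # READY looks like "2/3"
            if ($4 != "Running" || ready[1] != ready[2])
                print $1 "/" $2 ": " $4 " (" $3 ")"
        }
    '
}
```

Usage, as root after `source ~/.bashrc`: `kubectl get pods -A | list_unhealthy_pods`. An empty result suggests every pod is Running and fully ready.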
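On Ubuntu, etcd sometimes misbehaves; we are working on improving this. For now, you can reboot the operating system and rerun the ocboot deployment to see whether it gets past this step.
I followed the suggestion above. The step TASK [primary-master-node/setup_cloud : Create onecloud web login user admin] *** now passes and the deployment ran to completion, but the following pods are still in the Init state.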
PLAY RECAP *********************************************************************
10.64.25.x : ok=124 changed=21 unreachable=0 failed=0 skipped=76 rescued=0 ignored=0
10.64.25.x : ok=163 changed=40 unreachable=0 failed=0 skipped=61 rescued=0 ignored=1
10.64.25.x : ok=121 changed=20 unreachable=0 failed=0 skipped=73 rescued=0 ignored=0
Initialized successfully! Web page: https://10.64.25.x/ User: admin Password: admin@123
Logging in to the web UI returns 404 not found.
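Wait until all pods are in the Running state, then try again.
My guess is that some error is blocking them. When yesterday's deployment finished, the pods were already in the Init state shown in the screenshot above.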
Check whether the keystone pods' logs contain any error messages.
The log of default-keystone-58464f569b-59b9w is attached here; the other keystone pod is still in Init.
keystone.log
Some additional information:
kubectl get deployments -A
NAMESPACE            NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
kube-system          calico-kube-controllers    1/1     1            1           2d2h
kube-system          coredns                    2/2     2            2           2d2h
kube-system          metrics-server             1/1     1            1           2d2h
local-path-storage   local-path-provisioner     1/1     1            1           2d2h
onecloud             default-keystone           1/1     1            1           2d1h
onecloud             default-region             0/1     1            0           30h
onecloud             default-victoria-metrics   1/1     1            1           30h
onecloud             onecloud-operator          1/1     1            1           2d2h
kubectl get pods -n onecloud
NAME                                        READY   STATUS                  RESTARTS   AGE
default-etcd-77crdfvz27                     1/1     Running                 0          30h
default-etcd-hcv26d5ml6                     1/1     Running                 0          30h
default-etcd-vklbctctmk                     1/1     Running                 0          29h
default-keystone-58464f569b-59b9w           1/1     Running                 1          2d1h
default-keystone-7464657644-vn9zd           0/1     Init:ErrImagePull       0          3m29s
default-region-7b68578b99-hcnj8             0/1     Init:ImagePullBackOff   0          29h
default-region-876ff7c5c-fxnn7              0/1     Init:ImagePullBackOff   0          2m16s
default-telegraf-cfgbx                      0/1     Init:CrashLoopBackOff   367        30h
default-telegraf-tspl6                      0/1     Init:CrashLoopBackOff   364        30h
default-telegraf-zxbdz                      0/1     Init:CrashLoopBackOff   364        30h
default-victoria-metrics-679fb79fdc-6vzq2   1/1     Running                 0          2m44s
onecloud-operator-f9644ff86-j7vkh           1/1     Running                 0          29h
kubectl describe pod/default-region-876ff7c5c-fxnn7 -n onecloud
Name: default-region-876ff7c5c-fxnn7
Namespace: onecloud
Priority: 0
Node: cpu-a1203-node010/10.64.25.10
Start Time: Fri, 07 Jun 2024 17:34:07 +0800
Labels: app=region
app.kubernetes.io/component=region
app.kubernetes.io/instance=onecloud-cluster-njzx
app.kubernetes.io/managed-by=onecloud-operator
app.kubernetes.io/name=onecloud-cluster
pod-template-hash=876ff7c5c
Annotations: cni.projectcalico.org/podIP: 10.40.56.136/32
cni.projectcalico.org/podIPs: 10.40.56.136/32
kubectl.kubernetes.io/restartedAt: 2024-06-07T17:34:07+08:00
onecloud.yunion.io/last-applied-configuration:
{"volumes":[{"name":"certs","secret":{"secretName":"default-certs","items":[{"key":"ca.crt","path":"ca.crt"},{"key":"service.crt","path":"...
Status: Pending
IP: 10.40.56.136
Controlled By: ReplicaSet/default-region-876ff7c5c
Init Containers:
init:
Container ID:
Image: registry.cn-beijing.aliyuncs.com/yunion/region:v3.1.5-20200514.1
Image ID:
Port:
Image: registry.cn-beijing.aliyuncs.com/yunion/region:v3.1.5-20200514.1
Image ID:
Port: 30888/TCP
Host Port: 0/TCP
Command:
/opt/yunion/bin/region
--config
/etc/yunion/region.conf
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 42667m
memory: 135158Mi
Requests:
cpu: 10m
memory: 10Mi
Readiness: http-get https://:30888/ping delay=30s timeout=5s period=15s #success=3 #failure=3
Environment:
Normal   Scheduled  2m48s                 default-scheduler            Successfully assigned onecloud/default-region-876ff7c5c-fxnn7 to sym206-cpu-a1203-node010
Normal   Pulling    82s (x4 over 2m48s)   kubelet, cpu-a1203-node010   Pulling image "registry.cn-beijing.aliyuncs.com/yunion/region:v3.1.5-20200514.1"
Warning  Failed     81s (x4 over 2m47s)   kubelet, cpu-a1203-node010   Failed to pull image "registry.cn-beijing.aliyuncs.com/yunion/region:v3.1.5-20200514.1": rpc error: code = Unknown desc = Error response from daemon: manifest for registry.cn-beijing.aliyuncs.com/yunion/region:v3.1.5-20200514.1 not found: manifest unknown: manifest unknown
Warning  Failed     81s (x4 over 2m47s)   kubelet, cpu-a1203-node010   Error: ErrImagePull
Warning  Failed     54s (x6 over 2m46s)   kubelet, cpu-a1203-node010   Error: ImagePullBackOff
Normal   BackOff    39s (x7 over 2m46s)   kubelet, cpu-a1203-node010   Back-off pulling image "registry.cn-beijing.aliyuncs.com/yunion/region:v3.1.5-20200514.1"
The image pull is what fails: Failed to pull image "registry.cn-beijing.aliyuncs.com/yunion/region:v3.1.5-20200514.1"
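When several pods are stuck in ErrImagePull or ImagePullBackOff, it may be quicker to pull the failing image names out of the events than to read each description by eye. A rough sketch follows; `failed_images` is a hypothetical helper name, and it relies only on the `Failed to pull image "..."` wording seen in the events above.

```shell
#!/bin/sh
# failed_images (hypothetical helper): reads `kubectl describe pod` output
# on stdin and prints each distinct image named in a
# 'Failed to pull image "..."' event, one per line.
failed_images() {
    sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}
```

Usage: `kubectl describe pod/default-region-876ff7c5c-fxnn7 -n onecloud | failed_images`.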
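Is the target version of your upgrade a standard release (e.g. v3.11.4) or a daily development build (e.g. v3.1.5-20200514.1, i.e. a tag that carries a date)?
Standard release registry: registry.cn-beijing.aliyuncs.com/yunion/region
Development build registry: registry.cn-beijing.aliyuncs.com/yunionio/region (note the extra "io")
The image v3.1.5-20200514.1 exists only in the development registry. The correct reference is registry.cn-beijing.aliyuncs.com/yunionio/region:v3.1.5-20200514.1
When upgrading from a standard release (e.g. v3.11.3, v3.11.4) to a development build, you need to specify the registry: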
python3 ocboot.py upgrade --image-repository registry.cn-beijing.aliyuncs.com/yunionio <其他参数>
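When upgrading from a daily development build to a standard release, you do not need to specify the registry, because standard release tags are synchronized across the development and daily build registries.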
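The registry rule described above can be sketched as a small shell function. This is only an illustration of the rule as stated in this thread, not ocboot behavior: `repo_for_tag` is a hypothetical name, and treating every tag of the form vX.Y.Z-YYYYMMDD.N as a development build is an assumption based on the single example tag given here.

```shell
#!/bin/sh
# repo_for_tag (hypothetical helper): prints the image repository to use
# for a given version tag.
# Assumption: a tag with a date suffix (e.g. v3.1.5-20200514.1) is a daily
# development build found only under "yunionio"; any other tag
# (e.g. v3.11.4) is a standard release under "yunion".
repo_for_tag() {
    case "$1" in
        v*-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*)
            echo "registry.cn-beijing.aliyuncs.com/yunionio" ;;
        *)
            echo "registry.cn-beijing.aliyuncs.com/yunion" ;;
    esac
}
```

For example, `repo_for_tag v3.1.5-20200514.1` prints the yunionio repository, which is the value passed to `--image-repository` in the upgrade command above.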
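Thanks for the reply. The high-availability cluster is now deployed; the cause was indeed the upgrade I had run. A few pods in the cluster are still in an abnormal state: host and telegraf.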
kubectl get pods -A -n onecloud | grep "Crash"
onecloud   default-host-9qpvd       2/3   CrashLoopBackOff        953    3d15h
onecloud   default-telegraf-8tm7l   0/1   Init:CrashLoopBackOff   1033   3d15h
onecloud   default-telegraf-9p28n   0/1   Init:CrashLoopBackOff   1032   3d15h
onecloud   default-telegraf-f5jn7   0/1   Init:CrashLoopBackOff   1033
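I ran into the admin authentication problem during deployment. After rebooting and redeploying, the error still occurred, but Ansible ran to completion and the cluster login information was printed. However, kubectl get pods -n onecloud shows several pods in a crash state, and deleting them so that they are recreated does not bring them up either:
default-telegraf-8tm7l   0/1   Init:CrashLoopBackOff   1112   3d22h
default-telegraf-9p28n   0/1   Init:CrashLoopBackOff   1111   3d22h
default-telegraf-f5jn7   0/1   Init:Error              1112   3d22h
default-host-mpg4d       2/3   Running                 16     23m
default-host-twkt7       1/3   CrashLoopBackOff        24     51m
default-host-v9xw7       1/3   CrashLoopBackOff        24     53m
I checked the log: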
[error 2024-06-11 08:27:35 hostinfo.(*SHostInfo).prepareEnv(hostinfo.go:380)] tuned-adm profile virtual-host fail: exec: "tuned-adm": executable file not found in $PATH
[error 2024-06-11 08:27:35 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:171)] no block device avaiable
[info 2024-06-11 08:27:35 hostinfo.(*SHostInfo).prepareEnv(hostinfo.go:411)] I/O Scheduler switch to none
[info 2024-06-11 08:27:35 hostinfo.(*SHostInfo).getKubeReservedMemMb(hostinfo.go:1572)] Kubelet memory threshold subtracted: 100MB
[info 2024-06-11 08:27:35 hostinfo.(*SHostInfo).Init(hostinfo.go:196)] Start detectHostInfo
[info 2024-06-11 08:27:35 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:885)] KVM API VERSION 12
[info 2024-06-11 08:27:35 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:890)] KVM CAP MAX VCPUS: 1024
[info 2024-06-11 08:27:35 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:898)] KVM CAP NR VCPUS: 710
[info 2024-06-11 08:27:35 sysutils.detectNestSupport(kvm.go:146)] Host is support kvm nest ...
[info 2024-06-11 08:27:35 sysutils.detectNestSupport(kvm.go:151)] Host kvm nest is enabled ...
[error 2024-06-11 08:27:35 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:768)] exit status 1
[info 2024-06-11 08:27:35 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:778)] DetectOsDist
[error 2024-06-11 08:27:35 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:780)] Failed to detect distribution info
[warning 2024-06-11 08:27:36 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:799)] system_service.SetOpenvswitchName to openvswitch-switch
[info 2024-06-11 08:27:36 hostinfo.(*SHostInfo).detectQemuVersion(hostinfo.go:852)] Detect qemu version is 4.2.0
[info 2024-06-11 08:27:36 hostinfo.(*SHostInfo).detectOvsVersion(hostinfo.go:993)] Detect OVS version is 2.12.4
[info 2024-06-11 08:27:36 hostinfo.(*SHostInfo).detectOvsKOVersion(hostinfo.go:1010)] kernel module openvswitch vermagic: 5.15.0-107-generic SMP mod_unload modversions
[info 2024-06-11 08:27:36 hostinfo.(*SHostInfo).Init(hostinfo.go:205)] Start parseConfig
[info 2024-06-11 08:27:36 hostinfo.NewNIC(hostinfohelper.go:241)] IP 10.64.25.9/br0/bond0
[info 2024-06-11 08:27:36 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br0 already has ip 10.64.25.9
[info 2024-06-11 08:27:36 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-06-11 08:27:36 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond0", Bridge:"br0", Ip:"10.64.25.9", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0xc00220d860), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0xc001cf4720)}
[info 2024-06-11 08:27:36 hostinfo.(*SHostInfo).setupOvnChassis(hostinfo.go:223)] Start setting up ovn chassis
[error 2024-06-11 08:27:36 auth.(*authManager).startRefreshRevokeTokens(auth.go:193)] refreshRevokeTokens: No valid admin token credential
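The first error in the log above is a missing tuned-adm executable. A quick way to check a node for missing executables like this is sketched below; `missing_cmds` is a hypothetical helper name, and which commands the host service actually needs beyond tuned-adm is not stated in this thread.

```shell
#!/bin/sh
# missing_cmds (hypothetical helper): prints each of the given command
# names that cannot be found in $PATH, one per line.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

Usage: `missing_cmds tuned-adm` prints `tuned-adm` if it is not installed on the node and prints nothing otherwise.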
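Following other community documentation, I installed tuned with: sudo apt install tuned tuned-utils tuned-utils-systemtap
After that the tuned error is gone, but the authentication error still occurs: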
[error 2024-06-11 08:41:11 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:171)] no block device avaiable
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).prepareEnv(hostinfo.go:411)] I/O Scheduler switch to none
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).getKubeReservedMemMb(hostinfo.go:1572)] Kubelet memory threshold subtracted: 100MB
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).Init(hostinfo.go:196)] Start detectHostInfo
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:885)] KVM API VERSION 12
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:890)] KVM CAP MAX VCPUS: 1024
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:898)] KVM CAP NR VCPUS: 710
[info 2024-06-11 08:41:11 sysutils.detectNestSupport(kvm.go:146)] Host is support kvm nest ...
[info 2024-06-11 08:41:11 sysutils.detectNestSupport(kvm.go:151)] Host kvm nest is enabled ...
[error 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:768)] exit status 1
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:778)] DetectOsDist
[error 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:780)] Failed to detect distribution info
[warning 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:799)] system_service.SetOpenvswitchName to openvswitch-switch
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectQemuVersion(hostinfo.go:852)] Detect qemu version is 4.2.0
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectOvsVersion(hostinfo.go:993)] Detect OVS version is 2.12.4
[info 2024-06-11 08:41:11 hostinfo.(*SHostInfo).detectOvsKOVersion(hostinfo.go:1010)] kernel module openvswitch vermagic: 5.15.0-107-generic SMP mod_unload modversions
[info 2024-06-11 08:41:12 hostinfo.(*SHostInfo).Init(hostinfo.go:205)] Start parseConfig
[info 2024-06-11 08:41:12 hostinfo.NewNIC(hostinfohelper.go:241)] IP 10.64.25.8/br0/bond0
[info 2024-06-11 08:41:12 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br0 already has ip 10.64.25.8
[info 2024-06-11 08:41:12 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-06-11 08:41:12 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond0", Bridge:"br0", Ip:"10.64.25.8", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0xc001313770), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0xc00166e2d0)}
[info 2024-06-11 08:41:12 hostinfo.(*SHostInfo).setupOvnChassis(hostinfo.go:223)] Start setting up ovn chassis
[error 2024-06-11 08:41:12 auth.(*authManager).startRefreshRevokeTokens(auth.go:193)] refreshRevokeTokens: No valid admin token credential
[info 2024-06-11 08:42:12 ovnutils.configBridgeMtu.func1(ovnutils.go:42)] set brvpc MTU to 1500 success!
[error 2024-06-11 08:42:12 auth.(*authManager).authAdmin(auth.go:269)] Admin auth failed: {"error":{"class":"TimeoutError","code":504,"details":"request process timeout","request":{"body":"{"auth":{"context":{"source":"srv"},"identity":{"methods":["password"],"password":{"user":{"domain":...ult"},"name":"system"}}}}","headers":{"Content-Length":"238","Content-Type":"application/json","User-Agent":"yunioncloud-go/201708","X-Yunion-Parent-Id":"","X-Yunion-Peer-Service-Name":"host","X-Yunion-Remote-Addr":"default-keystone:30357","X-Yunion-Span-Id":"0","X-Yunion-Span-Name":"","X-Yunion-Strace-Debug":"true","X-Yunion-Strace-Id":"c8f3f4ea"},"method":"POST","url":"https://default-keystone:30357/v3/auth/tokens"}}}
goroutine 1 [running]:
runtime/debug.Stack()
/usr/lib/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
If you do not provide feedback for more than 37 days, we will close the issue and you can either reopen it or submit a new issue.
