cloudpods
[Help] v3.11.2: compute node offline, error: Host instance init error: Setup OVN Chassis: normalize db host: dns lookup (default-ovn-north) failed
I. Version: v3.11.2, deployed in high-availability (HA) mode.
II. One compute node is offline.
1. Pod status:
[root@master1 ~]# kubectl get pods -n onecloud -owide -w | grep node10
default-host-deployer-h7zff 0/1 CrashLoopBackOff 251 19h 172.16.1.234 node10 <none> <none>
default-host-health-t9vc5 0/1 CrashLoopBackOff 251 19h 172.16.1.234 node10 <none> <none>
default-host-image-gbgn5 0/1 CrashLoopBackOff 251 19h 172.16.1.234 node10 <none> <none>
default-host-xvztn 1/3 CrashLoopBackOff 494 19h 172.16.1.234 node10 <none> <none>
default-telegraf-f52sw 0/1 Init:CrashLoopBackOff 228 19h 172.16.1.234 node10 <none> <none>
2. Log of the host container:
[root@master1 ~]# kubectl logs default-host-xvztn -n onecloud -c host
[info 240520 02:46:56 procutils.WaitZombieLoop(zombie_others.go:36)] My pid is not 1 and no need to wait zombies
[info 240520 02:46:56 options.parseOptions(options.go:334)] Use configuration file: /etc/yunion/host.conf
[info 240520 02:46:56 options.parseOptions(options.go:357)] Set log level to "info"
[info 2024-05-20 02:46:56 options.parseOptions(options.go:334)] Use configuration file: /etc/yunion/common/common.conf
[info 2024-05-20 02:46:56 options.parseOptions(options.go:357)] Set log level to "info"
[info 2024-05-20 02:46:56 hostman.(*SHostService).InitService(host_services.go:64)] exec socket path: /var/run/onecloud/exec.sock
[info 2024-05-20 02:46:56 app.InitApp(app.go:32)] RequestWorkerCount: 8
[info 2024-05-20 02:46:56 appsrv.NewApplication(appsrv.go:121)] App hostId: 4bhtR-oqKZELSL1qp4GCmt0ZpOM= (host,node10,172.16.1.234)
2024/05/20 02:46:56 Allow hosts []
[info 2024-05-20 02:46:56 appsrv.(*Application).SetDefaultTimeout(appsrv.go:137)] adjust application default timeout to 60.000000 seconds
[info 2024-05-20 02:46:56 hostinfo.DetectCpuInfo(hostinfohelper.go:78)] cpuinfo freq 2700
[info 2024-05-20 02:46:56 hostinfo.NewHostInfo(hostinfo.go:2446)] CPU Model Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz Microcode 0x2006e05
[info 2024-05-20 02:46:56 hostinfo.NewHostInfo(hostinfo.go:2466)] Get kubelet container image Fs: /opt/docker, eviction config: {"evictionHard":{"imagefs.available":{"Signal":"imagefs.available","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}},"memory.available":{"Signal":"memory.available","Operator":"LessThan","Value":{"Quantity":"100Mi","Percentage":0}},"nodefs.available":{"Signal":"nodefs.available","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}},"nodefs.inodesFree":{"Signal":"nodefs.inodesFree","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}}}}
[error 2024-05-20 02:46:59 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:171)] no block device avaiable
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).prepareEnv(hostinfo.go:411)] I/O Scheduler switch to none
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).getKubeReservedMemMb(hostinfo.go:1572)] Kubelet memory threshold subtracted: 100MB
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).Init(hostinfo.go:196)] Start detectHostInfo
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:885)] KVM API VERSION 12
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:890)] KVM CAP MAX VCPUS: 288
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:898)] KVM CAP NR VCPUS: 240
[info 2024-05-20 02:46:59 sysutils.detectNestSupport(kvm.go:146)] Host is support kvm nest ...
[info 2024-05-20 02:46:59 sysutils.detectNestSupport(kvm.go:151)] Host kvm nest is enabled ...
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:778)] DetectOsDist CentOS Linux 7.9.2009
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectQemuVersion(hostinfo.go:852)] Detect qemu version is 4.2.0
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOvsVersion(hostinfo.go:993)] Detect OVS version is 2.12.4
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOvsKOVersion(hostinfo.go:1010)] kernel module openvswitch vermagic: 5.4.130-1.yn20230805.el7.x86_64 SMP mod_unload modversions
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).Init(hostinfo.go:205)] Start parseConfig
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:241)] IP 172.16.1.234/br0/bond1
[info 2024-05-20 02:46:59 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br0 already has ip 172.16.1.234
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:241)] IP 10.0.1.234/br1/bond0
[info 2024-05-20 02:46:59 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br1 already has ip 10.0.1.234
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-05-20 02:46:59 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond1", Bridge:"br0", Ip:"172.16.1.234", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0xc00151ec60), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0xc00151f5f0)}
[info 2024-05-20 02:46:59 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond0", Bridge:"br1", Ip:"10.0.1.234", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0xc0016e5590), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0xc0016e5ec0)}
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).setupOvnChassis(hostinfo.go:223)] Start setting up ovn chassis
goroutine 1 [running]:
runtime/debug.Stack()
/usr/lib/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
/usr/lib/go/src/runtime/debug/stack.go:16 +0x19
yunion.io/x/onecloud/pkg/util/ovnutils.InitOvn.func1()
/root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:125 +0x3b
panic({0x2c24140, 0xc000b9e810})
/usr/lib/go/src/runtime/panic.go:838 +0x207
yunion.io/x/onecloud/pkg/util/ovnutils.mustPrepOvsdbConfig({{0xc0016b9b40, 0x1b}, {0xc0016b7fa8, 0x5}, {0x0, 0x0}, {0xc0016b7f80, 0xa}, 0x5dc, {0xc0016b7fd0, ...}, ...})
/root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:93 +0x645
yunion.io/x/onecloud/pkg/util/ovnutils.InitOvn({{0xc0016b9b40, 0x1b}, {0xc0016b7fa8, 0x5}, {0x0, 0x0}, {0xc0016b7f80, 0xa}, 0x5dc, {0xc0016b7fd0, ...}, ...})
/root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:130 +0xb8
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*OvnHelper).Init(...)
/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostovn.go:41
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).setupOvnChassis(0xc000e82000?)
/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:225 +0xb8
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).Init(0x5674ad0?)
/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:210 +0xdc
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0xc000010160?)
/root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:80 +0x6f
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000e108)
/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe4
yunion.io/x/onecloud/pkg/hostman.StartService(...)
/root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:163
main.main()
/root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x10a
goroutine 1 [running]:
runtime/debug.Stack()
/usr/lib/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
/usr/lib/go/src/runtime/debug/stack.go:16 +0x19
yunion.io/x/log.Fatalf({0x30fd118, 0x1c}, {0xc0016dfea8, 0x1, 0x1})
/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/log/log.go:138 +0x32
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0xc000010160?)
/root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:81 +0xb4
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000e108)
/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe4
yunion.io/x/onecloud/pkg/hostman.StartService(...)
/root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:163
main.main()
/root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x10a
[fatal 2024-05-20 02:46:59 hostman.(*SHostService).RunService(host_services.go:81)] Host instance init error: Setup OVN Chassis: normalize db host: dns lookup (default-ovn-north) failed: lookup default-ovn-north on 10.96.0.10:53: no such host
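The fatal line shows that the host agent looks up the bare name `default-ovn-north` against the cluster DNS at 10.96.0.10:53. To narrow down whether the service record is missing or only the short-name search is failing, a minimal Go sketch (Go being what the host agent uses, per the stack trace) can query the same DNS server for the bare name and for progressively qualified names. Assumptions, not taken from the logs: the OVN north service lives in the `onecloud` namespace (inferred from the `kubectl -n onecloud` command) and the cluster domain is `cluster.local`; `candidateNames` and `newResolver` are hypothetical helpers for this check only.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// candidateNames lists names to try: the bare service name the host agent
// looked up, then progressively qualified forms, ending with an absolute name
// (trailing dot) that bypasses the resolv.conf search list.
// Assumption: standard Kubernetes "<svc>.<ns>.svc.<domain>" naming.
func candidateNames(service, namespace, clusterDomain string) []string {
	return []string{
		service,
		fmt.Sprintf("%s.%s", service, namespace),
		fmt.Sprintf("%s.%s.svc", service, namespace),
		fmt.Sprintf("%s.%s.svc.%s.", service, namespace, clusterDomain),
	}
}

// newResolver returns a pure-Go resolver that sends every query to dnsAddr,
// here the cluster DNS address reported in the fatal log line.
func newResolver(dnsAddr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, network, dnsAddr)
		},
	}
}

func main() {
	// Assumed namespace and cluster domain; adjust to the actual cluster.
	r := newResolver("10.96.0.10:53")
	for _, name := range candidateNames("default-ovn-north", "onecloud", "cluster.local") {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		addrs, err := r.LookupHost(ctx, name)
		cancel()
		if err != nil {
			fmt.Printf("%-50s FAIL: %v\n", name, err)
			continue
		}
		fmt.Printf("%-50s OK: %v\n", name, addrs)
	}
}
```

If only the fully qualified name resolves, the search domains seen by the host pod are the likely suspect; if none of the names resolve, the service record or the cluster DNS itself is the more likely place to look.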
3. ifconfig output on the compute node:
[root@node10 ~]# ifconfig
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
ether 9c:74:1a:c1:89:46 txqueuelen 1000 (Ethernet)
RX packets 5903 bytes 868402 (848.0 KiB)
RX errors 0 dropped 6 overruns 0 frame 0
TX packets 43 bytes 2870 (2.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
ether 04:42:1a:cb:4b:6a txqueuelen 1000 (Ethernet)
RX packets 20225 bytes 5402646 (5.1 MiB)
RX errors 0 dropped 6 overruns 0 frame 0
TX packets 11854 bytes 1268500 (1.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.1.234 netmask 255.255.255.0 broadcast 172.16.1.255
inet6 fe80::642:1aff:fecb:4b6a prefixlen 64 scopeid 0x20<link>
ether 04:42:1a:cb:4b:6a txqueuelen 1000 (Ethernet)
RX packets 14997 bytes 4428205 (4.2 MiB)
RX errors 0 dropped 249 overruns 0 frame 0
TX packets 10986 bytes 1159598 (1.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
br1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.1.234 netmask 255.255.255.0 broadcast 10.0.1.255
inet6 fe80::9e74:1aff:fec1:8946 prefixlen 64 scopeid 0x20<link>
ether 9c:74:1a:c1:89:46 txqueuelen 1000 (Ethernet)
RX packets 5361 bytes 724815 (707.8 KiB)
RX errors 0 dropped 289 overruns 0 frame 0
TX packets 19 bytes 1282 (1.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eno1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
ether 04:42:1a:cb:4b:6a txqueuelen 1000 (Ethernet)
RX packets 2969 bytes 178698 (174.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xd3620000-d363ffff
eno2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
ether 04:42:1a:cb:4b:6a txqueuelen 1000 (Ethernet)
RX packets 17265 bytes 5224750 (4.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 11868 bytes 1272200 (1.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xd3600000-d361ffff
enp28s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
ether 9c:74:1a:c1:89:46 txqueuelen 1000 (Ethernet)
RX packets 1332 bytes 330415 (322.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 22 bytes 1428 (1.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp28s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
ether 9c:74:1a:c1:89:46 txqueuelen 1000 (Ethernet)
RX packets 4571 bytes 537987 (525.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 21 bytes 1442 (1.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
genev_sys_6081: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65000
inet6 fe80::ec61:f8ff:fe76:a380 prefixlen 64 scopeid 0x20<link>
ether ee:61:f8:76:a3:80 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 13 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 306 bytes 18416 (17.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 306 bytes 18416 (17.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
4. Network settings in host.conf:
ovn_encap_ip: 10.0.1.234
networks:
- bond1/br0/172.16.1.234
- bond0/br1/10.0.1.234
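As a quick consistency check of the settings above, a small Go sketch can parse the `networks` entries and report which bridge carries `ovn_encap_ip`. It assumes every entry has exactly the `<interface>/<bridge>/<ip>` form seen in this post (matching the `Inter`, `Bridge` and `Ip` fields in the log); other entry forms are not handled, and `parseNetwork` and `encapBridge` are hypothetical helpers, not part of cloudpods.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hostNetwork mirrors one "networks" entry as written in this host.conf:
// <interface>/<bridge>/<ip>.
type hostNetwork struct {
	Iface, Bridge string
	IP            net.IP
}

// parseNetwork splits one entry; it rejects anything that is not three fields
// ending in a valid IP address.
func parseNetwork(entry string) (hostNetwork, error) {
	parts := strings.Split(entry, "/")
	if len(parts) != 3 {
		return hostNetwork{}, fmt.Errorf("unexpected entry %q", entry)
	}
	ip := net.ParseIP(parts[2])
	if ip == nil {
		return hostNetwork{}, fmt.Errorf("invalid ip in %q", entry)
	}
	return hostNetwork{Iface: parts[0], Bridge: parts[1], IP: ip}, nil
}

// encapBridge returns the bridge whose IP equals encapIP, or "" if none does.
func encapBridge(encapIP string, entries []string) (string, error) {
	want := net.ParseIP(encapIP)
	if want == nil {
		return "", fmt.Errorf("invalid ovn_encap_ip %q", encapIP)
	}
	for _, e := range entries {
		n, err := parseNetwork(e)
		if err != nil {
			return "", err
		}
		if n.IP.Equal(want) {
			return n.Bridge, nil
		}
	}
	return "", nil
}

func main() {
	networks := []string{"bond1/br0/172.16.1.234", "bond0/br1/10.0.1.234"}
	br, err := encapBridge("10.0.1.234", networks)
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case br == "":
		fmt.Println("ovn_encap_ip is not on any configured bridge")
	default:
		fmt.Println("ovn_encap_ip is on bridge:", br)
	}
}
```

With the values from this post it prints `ovn_encap_ip is on bridge: br1`, which matches the address ifconfig shows on br1, so the encap IP setting itself looks consistent with the interfaces.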
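Nothing was changed; the error appeared right after this compute node was rebooted.
Could someone suggest how to troubleshoot this and where to look? Thanks!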