[Help] v3.11.2: compute node offline, reports: Host instance init error: Setup OVN Chassis: normalize db host: dns lookup (default-ovn-north) failed

Open chenjacken opened this issue 1 year ago • 0 comments

I. Version: v3.11.2, deployed in high-availability mode.

II. One compute node is offline

1. Pod information:

[root@master1 ~]# kubectl get pods -n onecloud -owide -w | grep node10
default-host-deployer-h7zff                          0/1     CrashLoopBackOff        251        19h   172.16.1.234    node10     <none>           <none>
default-host-health-t9vc5                            0/1     CrashLoopBackOff        251        19h   172.16.1.234    node10     <none>           <none>
default-host-image-gbgn5                             0/1     CrashLoopBackOff        251        19h   172.16.1.234    node10     <none>           <none>
default-host-xvztn                                   1/3     CrashLoopBackOff        494        19h   172.16.1.234    node10     <none>           <none>
default-telegraf-f52sw                               0/1     Init:CrashLoopBackOff   228        19h   172.16.1.234    node10     <none>           <none>
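
To make the listing above easier to compare across nodes, here is a minimal sketch that parses `kubectl get pods -owide` rows and reports pods whose ready count is below the container total. The column order (NAME, READY, STATUS, RESTARTS, AGE, IP, NODE) is assumed from the output shown here, and `PodRow` and `parse_pod_rows` are hypothetical names used only for illustration. The sample rows are copied from the listing above.

```python
from typing import List, NamedTuple


class PodRow(NamedTuple):
    name: str
    ready: int      # containers currently ready
    total: int      # containers declared in the pod
    status: str
    restarts: int
    node: str


def parse_pod_rows(text: str) -> List[PodRow]:
    """Parse whitespace-separated rows of `kubectl get pods -owide` output."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        # Skip blank lines and the header row (its READY column has no "/").
        if len(cols) < 7 or "/" not in cols[1]:
            continue
        ready, total = cols[1].split("/")
        rows.append(
            PodRow(cols[0], int(ready), int(total), cols[2], int(cols[3]), cols[6])
        )
    return rows


# Three of the rows from the listing above.
SAMPLE = """\
default-host-deployer-h7zff   0/1   CrashLoopBackOff        251   19h   172.16.1.234   node10   <none>   <none>
default-host-xvztn            1/3   CrashLoopBackOff        494   19h   172.16.1.234   node10   <none>   <none>
default-telegraf-f52sw        0/1   Init:CrashLoopBackOff   228   19h   172.16.1.234   node10   <none>   <none>
"""

if __name__ == "__main__":
    for pod in parse_pod_rows(SAMPLE):
        if pod.ready < pod.total:
            print(f"{pod.name}: {pod.ready}/{pod.total} ready, "
                  f"{pod.status}, {pod.restarts} restarts")
```

In this listing every pod on node10 is below its ready total, which suggests a node-wide cause rather than a fault in a single service.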

2. host logs:

[root@master1 ~]# kubectl logs default-host-xvztn -n onecloud -c host
[info 240520 02:46:56 procutils.WaitZombieLoop(zombie_others.go:36)] My pid is not 1 and no need to wait zombies
[info 240520 02:46:56 options.parseOptions(options.go:334)] Use configuration file: /etc/yunion/host.conf
[info 240520 02:46:56 options.parseOptions(options.go:357)] Set log level to "info"
[info 2024-05-20 02:46:56 options.parseOptions(options.go:334)] Use configuration file: /etc/yunion/common/common.conf
[info 2024-05-20 02:46:56 options.parseOptions(options.go:357)] Set log level to "info"
[info 2024-05-20 02:46:56 hostman.(*SHostService).InitService(host_services.go:64)] exec socket path: /var/run/onecloud/exec.sock
[info 2024-05-20 02:46:56 app.InitApp(app.go:32)] RequestWorkerCount: 8
[info 2024-05-20 02:46:56 appsrv.NewApplication(appsrv.go:121)] App hostId: 4bhtR-oqKZELSL1qp4GCmt0ZpOM= (host,node10,172.16.1.234)
2024/05/20 02:46:56 Allow hosts []
[info 2024-05-20 02:46:56 appsrv.(*Application).SetDefaultTimeout(appsrv.go:137)] adjust application default timeout to 60.000000 seconds
[info 2024-05-20 02:46:56 hostinfo.DetectCpuInfo(hostinfohelper.go:78)] cpuinfo freq 2700
[info 2024-05-20 02:46:56 hostinfo.NewHostInfo(hostinfo.go:2446)] CPU Model Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz Microcode 0x2006e05
[info 2024-05-20 02:46:56 hostinfo.NewHostInfo(hostinfo.go:2466)] Get kubelet container image Fs: /opt/docker, eviction config: {"evictionHard":{"imagefs.available":{"Signal":"imagefs.available","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}},"memory.available":{"Signal":"memory.available","Operator":"LessThan","Value":{"Quantity":"100Mi","Percentage":0}},"nodefs.available":{"Signal":"nodefs.available","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}},"nodefs.inodesFree":{"Signal":"nodefs.inodesFree","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}}}}
[error 2024-05-20 02:46:59 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:171)] no block device avaiable
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).prepareEnv(hostinfo.go:411)] I/O Scheduler switch to none
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).getKubeReservedMemMb(hostinfo.go:1572)] Kubelet memory threshold subtracted: 100MB
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).Init(hostinfo.go:196)] Start detectHostInfo
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:885)] KVM API VERSION 12
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:890)] KVM CAP MAX VCPUS: 288
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:898)] KVM CAP NR VCPUS: 240
[info 2024-05-20 02:46:59 sysutils.detectNestSupport(kvm.go:146)] Host is support kvm nest ...
[info 2024-05-20 02:46:59 sysutils.detectNestSupport(kvm.go:151)] Host kvm nest is enabled ...
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:778)] DetectOsDist CentOS Linux 7.9.2009
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectQemuVersion(hostinfo.go:852)] Detect qemu version is 4.2.0
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOvsVersion(hostinfo.go:993)] Detect OVS version is 2.12.4
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOvsKOVersion(hostinfo.go:1010)] kernel module openvswitch vermagic:       5.4.130-1.yn20230805.el7.x86_64 SMP mod_unload modversions 
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).Init(hostinfo.go:205)] Start parseConfig
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:241)] IP 172.16.1.234/br0/bond1
[info 2024-05-20 02:46:59 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br0 already has ip 172.16.1.234
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:241)] IP 10.0.1.234/br1/bond0
[info 2024-05-20 02:46:59 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br1 already has ip 10.0.1.234
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-05-20 02:46:59 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond1", Bridge:"br0", Ip:"172.16.1.234", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0xc00151ec60), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0xc00151f5f0)}
[info 2024-05-20 02:46:59 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond0", Bridge:"br1", Ip:"10.0.1.234", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0xc0016e5590), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0xc0016e5ec0)}
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).setupOvnChassis(hostinfo.go:223)] Start setting up ovn chassis
goroutine 1 [running]:
runtime/debug.Stack()
        /usr/lib/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/lib/go/src/runtime/debug/stack.go:16 +0x19
yunion.io/x/onecloud/pkg/util/ovnutils.InitOvn.func1()
        /root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:125 +0x3b
panic({0x2c24140, 0xc000b9e810})
        /usr/lib/go/src/runtime/panic.go:838 +0x207
yunion.io/x/onecloud/pkg/util/ovnutils.mustPrepOvsdbConfig({{0xc0016b9b40, 0x1b}, {0xc0016b7fa8, 0x5}, {0x0, 0x0}, {0xc0016b7f80, 0xa}, 0x5dc, {0xc0016b7fd0, ...}, ...})
        /root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:93 +0x645
yunion.io/x/onecloud/pkg/util/ovnutils.InitOvn({{0xc0016b9b40, 0x1b}, {0xc0016b7fa8, 0x5}, {0x0, 0x0}, {0xc0016b7f80, 0xa}, 0x5dc, {0xc0016b7fd0, ...}, ...})
        /root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:130 +0xb8
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*OvnHelper).Init(...)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostovn.go:41
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).setupOvnChassis(0xc000e82000?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:225 +0xb8
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).Init(0x5674ad0?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:210 +0xdc
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0xc000010160?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:80 +0x6f
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000e108)
        /root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe4
yunion.io/x/onecloud/pkg/hostman.StartService(...)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:163
main.main()
        /root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x10a
goroutine 1 [running]:
runtime/debug.Stack()
        /usr/lib/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/lib/go/src/runtime/debug/stack.go:16 +0x19
yunion.io/x/log.Fatalf({0x30fd118, 0x1c}, {0xc0016dfea8, 0x1, 0x1})
        /root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/log/log.go:138 +0x32
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0xc000010160?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:81 +0xb4
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000e108)
        /root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe4
yunion.io/x/onecloud/pkg/hostman.StartService(...)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:163
main.main()
        /root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x10a
[fatal 2024-05-20 02:46:59 hostman.(*SHostService).RunService(host_services.go:81)] Host instance init error: Setup OVN Chassis: normalize db host: dns lookup (default-ovn-north) failed: lookup default-ovn-north on 10.96.0.10:53: no such host
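
The fatal line says the short name `default-ovn-north` could not be resolved by the DNS server at 10.96.0.10. As a rough way to narrow this down, the sketch below extracts the failing name and server from that line and builds the longer names a Kubernetes service is usually reachable under. The namespace `onecloud` comes from the `kubectl` commands above; the cluster domain `cluster.local` is an assumption (the common Kubernetes default), and `parse_lookup_failure`, `candidate_names` and `try_resolve` are hypothetical helper names, not part of cloudpods.

```python
import re
import socket
from typing import Callable, Dict, List, Optional, Tuple

# Matches the tail of the Go resolver error quoted in the fatal log line.
LOOKUP_RE = re.compile(
    r"lookup (?P<name>\S+) on (?P<server>[\d.]+):(?P<port>\d+): no such host"
)

FATAL_LINE = (
    "Host instance init error: Setup OVN Chassis: normalize db host: "
    "dns lookup (default-ovn-north) failed: "
    "lookup default-ovn-north on 10.96.0.10:53: no such host"
)


def parse_lookup_failure(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (name, dns_server, port) from a 'no such host' error, or None."""
    m = LOOKUP_RE.search(line)
    if m is None:
        return None
    return m.group("name"), m.group("server"), int(m.group("port"))


def candidate_names(short: str, namespace: str = "onecloud",
                    cluster_domain: str = "cluster.local") -> List[str]:
    """Short name plus progressively more qualified service names.

    cluster_domain is an assumed default; adjust it to the real cluster.
    """
    return [
        short,
        f"{short}.{namespace}",
        f"{short}.{namespace}.svc",
        f"{short}.{namespace}.svc.{cluster_domain}",
    ]


def try_resolve(names: List[str],
                resolver: Callable[[str], str] = socket.gethostbyname
                ) -> Dict[str, Optional[str]]:
    """Resolve each name; map it to an address, or None if lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


if __name__ == "__main__":
    parsed = parse_lookup_failure(FATAL_LINE)
    if parsed:
        name, server, port = parsed
        print(f"failed name: {name}, DNS server: {server}:{port}")
        for cand, addr in try_resolve(candidate_names(name)).items():
            print(f"{cand} -> {addr or 'no such host'}")
```

If the fully qualified name resolves from inside the host container but the short name does not, the DNS search domains in that container's `/etc/resolv.conf` would be the first thing to check. If none of the names resolve, it may be worth confirming that the `default-ovn-north` service exists in the `onecloud` namespace.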

3. ifconfig output on the compute node:

[root@node10 ~]# ifconfig 
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        ether 9c:74:1a:c1:89:46  txqueuelen 1000  (Ethernet)
        RX packets 5903  bytes 868402 (848.0 KiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 43  bytes 2870 (2.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        ether 04:42:1a:cb:4b:6a  txqueuelen 1000  (Ethernet)
        RX packets 20225  bytes 5402646 (5.1 MiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 11854  bytes 1268500 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.1.234  netmask 255.255.255.0  broadcast 172.16.1.255
        inet6 fe80::642:1aff:fecb:4b6a  prefixlen 64  scopeid 0x20<link>
        ether 04:42:1a:cb:4b:6a  txqueuelen 1000  (Ethernet)
        RX packets 14997  bytes 4428205 (4.2 MiB)
        RX errors 0  dropped 249  overruns 0  frame 0
        TX packets 10986  bytes 1159598 (1.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.1.234  netmask 255.255.255.0  broadcast 10.0.1.255
        inet6 fe80::9e74:1aff:fec1:8946  prefixlen 64  scopeid 0x20<link>
        ether 9c:74:1a:c1:89:46  txqueuelen 1000  (Ethernet)
        RX packets 5361  bytes 724815 (707.8 KiB)
        RX errors 0  dropped 289  overruns 0  frame 0
        TX packets 19  bytes 1282 (1.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eno1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 04:42:1a:cb:4b:6a  txqueuelen 1000  (Ethernet)
        RX packets 2969  bytes 178698 (174.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xd3620000-d363ffff  

eno2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 04:42:1a:cb:4b:6a  txqueuelen 1000  (Ethernet)
        RX packets 17265  bytes 5224750 (4.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11868  bytes 1272200 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xd3600000-d361ffff  

enp28s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 9c:74:1a:c1:89:46  txqueuelen 1000  (Ethernet)
        RX packets 1332  bytes 330415 (322.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 22  bytes 1428 (1.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp28s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 9c:74:1a:c1:89:46  txqueuelen 1000  (Ethernet)
        RX packets 4571  bytes 537987 (525.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 21  bytes 1442 (1.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

genev_sys_6081: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        inet6 fe80::ec61:f8ff:fe76:a380  prefixlen 64  scopeid 0x20<link>
        ether ee:61:f8:76:a3:80  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 13 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 306  bytes 18416 (17.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 306  bytes 18416 (17.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

4. Network settings in host.conf:

ovn_encap_ip: 10.0.1.234
networks:
- bond1/br0/172.16.1.234
- bond0/br1/10.0.1.234
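
As a sanity check on these settings, the sketch below splits each `networks` entry and looks up which one carries `ovn_encap_ip`. Reading each entry as `interface/bridge/ip` is inferred from the host log above (`Inter:"bond1", Bridge:"br0", Ip:"172.16.1.234"`), not taken from the cloudpods documentation. `HostNetwork`, `parse_networks` and `encap_network` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class HostNetwork(NamedTuple):
    interface: str
    bridge: str
    ip: str


def parse_networks(entries: List[str]) -> List[HostNetwork]:
    """Split 'interface/bridge/ip' strings (assumed format) into fields."""
    nets = []
    for entry in entries:
        parts = entry.split("/")
        if len(parts) != 3:
            raise ValueError(f"expected interface/bridge/ip, got {entry!r}")
        nets.append(HostNetwork(*parts))
    return nets


def encap_network(encap_ip: str, nets: List[HostNetwork]) -> Optional[HostNetwork]:
    """Return the configured network whose IP equals ovn_encap_ip, if any."""
    for net in nets:
        if net.ip == encap_ip:
            return net
    return None


if __name__ == "__main__":
    # Values copied from the host.conf excerpt above.
    nets = parse_networks(["bond1/br0/172.16.1.234", "bond0/br1/10.0.1.234"])
    print(encap_network("10.0.1.234", nets))
```

With the values above, `ovn_encap_ip` matches the `bond0/br1` entry, and the `ifconfig` output shows 10.0.1.234 on `br1`. On this reading the addresses in the configuration agree with the running interfaces, which points back at name resolution rather than at these settings.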

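Nothing was changed in the configuration; the error appeared right after this compute node was rebooted.

Any suggestions on how to troubleshoot and fix this would be appreciated. Thanks!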

chenjacken, May 20 '24 02:05