[BUG] VPC network: host fails to start correctly when a VLAN tag is specified for the service port

Open zhfish opened this issue 1 year ago • 6 comments

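What happened: The host fails to start, and the host network is definitely offline.
The host's IP is visible in the layer-2 network on the web page.
The NICs are aggregated into a bond, and vnet is a VLAN interface on bond0.
OVS has already created br0, and vnet has been added to it correctly.
One question: when using a VLAN, should OVS bind bond0 and specify the VLAN, or bind the VLAN interface directly?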

[root@gpu-4 ~]# nmcli dev status
vnet        vlan      connected     vlan-vnet
bond0       bond      connected     bond0
enp44s0f0   ethernet  connected     bond-slave-enp44s0f0
enp45s0f0   ethernet  connected     bond-slave-enp45s0f0
enp193s0f0  ethernet  disconnected  --
enp193s0f1  ethernet  unavailable   --
enp193s0f2  ethernet  unavailable   --
enp193s0f3  ethernet  unavailable   --
enp44s0f1   ethernet  unavailable   --
enp45s0f1   ethernet  unavailable   --
lo          loopback  unmanaged     --
[warning 2024-03-22 16:33:35 hostinfo.(*SHostInfo).isVirtualFunction(hostinfo.go:1650)] failed get nic enp45s0f1 phys_port_name: read /sys/class/net/enp45s0f1/phys_port_name: operation not supported
[info 2024-03-22 16:33:35 hostinfo.(*SHostInfo).doSendPhysicalNicInfo(hostinfo.go:1730)] upload physical nic: enp45s0f1(0c:42:a1:ec:b3:ab)
[info 2024-03-22 16:33:35 hostinfo.(*SHostInfo).doUploadNicInfoInternal(hostinfo.go:1747)] Upload NIC br: if:enp45s0f1
[info 2024-03-22 16:33:35 hostinfo.(*SHostInfo).doUploadNicInfoInternal(hostinfo.go:1747)] Upload NIC br:br0 if:vnet
[error 2024-03-22 16:33:35 hostinfo.(*SHostInfo).onFail(hostinfo.go:1105)] register failed: initHostNetworks: uploadNetworkInfo: doSyncNicInfo vnet: doUploadNicInfoInternal: modules.Hosts.PerformAction add-netif: {"error":{"class":"BadRequestError","code":400,"details":"addNetif: {\"error\":{\"class\":\"BadRequestError\",\"code\":400,\"data\":{\"fields\":[{}],\"id\":\"%!v(MISSING)\"},\"details\":\"hh.Attach2Network: net.GetFreeIP: getFreeIP: {\\\"error\\\":{\\\"class\\\":\\\"InsufficientResourceError\\\",\\\"code\\\":400,\\\"data\\\":{\\\"id\\\":\\\"Out of IP address\\\"},\\\"details\\\":\\\"Out of IP address\\\"}}\"}}","request":{"body":"{\"host\":{\"bridge\":\"br0\",\"interface\":\"vnet\",\"ip_addr\":\"10.106.75.4\",\"link_up\":true,\"mac\":\"08:c0:eb:3b...id\":875,\"wire\":\"bcast0\"}}","headers":{"Content-Length":"173","Content-Type":"application/json","User-Agent":"yunioncloud-go/201708","X-Auth-Token":"*","X-Yunion-Parent-Id":"","X-Yunion-Peer-Service-Name":"host","X-Yunion-Remote-Addr":"default-region:30888","X-Yunion-Span-Id":"0","X-Yunion-Span-Name":"","X-Yunion-Strace-Debug":"true","X-Yunion-Strace-Id":"2b2367c7"},"method":"POST","url":"https://default-region:30888/hosts/0d632ec4-2347-4fd6-8f6b-ea22314d131c/add-netif"}}}
panic: exit immediately for retry...

goroutine 1 [running]:
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).onFail(0xc0002e8580, {0x36244a0?, 0xc001f64138?})
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:1108 +0x44a
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).register(0xc0002e8580)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:1082 +0xfb
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).StartRegister(0xc001043550?, 0xc000c32300?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:1049 +0x32
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0xc00022e398?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:107 +0x2df
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc0002c0150)
        /root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe4
yunion.io/x/onecloud/pkg/hostman.StartService(...)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:163
main.main()
        /root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x10a
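
Read from the inside out, the error above is a BadRequestError wrapping a second BadRequestError, which in turn wraps an InsufficientResourceError whose message is "Out of IP address". The Python sketch below shows one way to unwrap such nested `details` strings. The sample payload is a hand-built, abbreviated stand-in for the log line (the original is truncated), and `innermost_error` is a hypothetical helper, not part of Cloudpods.

```python
import json


def innermost_error(payload: dict) -> dict:
    """Follow JSON documents embedded in "details" strings down to the deepest error."""
    err = payload["error"]
    details = err.get("details", "")
    if isinstance(details, str):
        start = details.find("{")
        if start != -1:
            try:
                nested = json.loads(details[start:])
            except json.JSONDecodeError:
                return err  # the text after "{" is not JSON: stop here
            if isinstance(nested, dict) and "error" in nested:
                return innermost_error(nested)
    return err


# Abbreviated sample with the same three layers as the log line (assumption:
# the "request" and "data" fields are omitted because they are not needed here).
inner = {"error": {"class": "InsufficientResourceError", "code": 400,
                   "details": "Out of IP address"}}
middle = {"error": {"class": "BadRequestError", "code": 400,
                    "details": "hh.Attach2Network: net.GetFreeIP: getFreeIP: "
                               + json.dumps(inner)}}
outer = {"error": {"class": "BadRequestError", "code": 400,
                   "details": "addNetif: " + json.dumps(middle)}}

root = innermost_error(outer)
print(f"{root['class']}: {root['details']}")
```

The root cause reported by the region service is therefore the failed free-IP allocation for the network, not the add-netif call itself.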

Environment:

  • OS (e.g. cat /etc/os-release):
NAME="OpenCloudOS"
VERSION="8.8"
ID="opencloudos"
ID_LIKE="rhel fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:oc8"
PRETTY_NAME="OpenCloudOS 8.8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:opencloudos:opencloudos:8"
HOME_URL="https://www.opencloudos.org/"
BUG_REPORT_URL="https://bugs.opencloudos.tech/"
  • Kernel (e.g. uname -a):
Linux gpu-4.cloud 5.4.119-20.0009.29 #1 SMP Mon Aug 14 20:03:28 CST 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Host: (e.g. dmidecode | egrep -i 'manufacturer|product' |sort -u)
        idProduct: 0xffb0
        Manufacturer:
        Manufacturer: ACBEL
        Manufacturer: Advanced Micro Devices, Inc.
        Manufacturer: Micron Technology
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Module Manufacturer ID: Bank 1, Hex 0x2C
        Module Product ID: Unknown
        Product Name: 65MA32
        Product Name: X7840A0
  • Service Version (e.g. kubectl exec -n onecloud $(kubectl get pods -n onecloud | grep climc | awk '{print $1}') -- climc version-list): Could not retrieve it; after a while, all containers had exited.
The connection to the server 10.106.75.4:6443 was refused - did you specify the right host or port?
error: expected 'exec (POD | TYPE/NAME) COMMAND [ARG1] [ARG2] ... [ARGN]'.
POD or TYPE/NAME and COMMAND are required arguments for the exec command
See 'kubectl exec -h' for help and examples
[root@gpu-4 ~]# climc

Following on from #19608, I reinstalled the system from scratch and configured only one IP, to rule out interference from multiple IPs.

zhfish avatar Mar 22 '24 18:03 zhfish

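@zhfish Please refer to https://www.cloudpods.org/docs/guides/onpremise/network/examples

Your configuration looks like this scenario? image

vnet is a VLAN interface used for the host's own communication, while the virtual machine can use the other VLANs on bond0. Is that right?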

swordqiu avatar Mar 23 '24 02:03 swordqiu

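Roughly, but with some differences. Original plan: management port eth0, with no VLAN configured, using the default VLAN ID of the trunk port; service port bond0, with a specified VLAN.

Later, because of the problem above, I tested with bond0 plus a specified VLAN directly as the management port, to avoid interference from multiple IPs. The management port and the service port both communicate over the same VLAN.

In other words, the expectation is at least a single-NIC VPC network (with a specified VLAN), or a dual-NIC VPC network (management port without a VLAN, service port with a specified VLAN).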

listen_interface: bond0.3001
networks:
- bond0/br0/bcast0

What I want is the opposite of this: I would like to be able to specify a VLAN ID for bond0 in networks.

zhfish avatar Mar 23 '24 02:03 zhfish

Is it this mode? image

swordqiu avatar Mar 23 '24 21:03 swordqiu

Ideal setup (dual-NIC VPC network): image

Minimal setup (single-NIC VPC network): image

zhfish avatar Mar 24 '24 02:03 zhfish

First configuration:

You need to add an IP subnet on the platform that contains the eth0 IP; bond0 does not need an IP.

listen_interface: eth0
networks:
- bond0/br1/bcast1
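
A quick way to sanity-check the subnet requirement is to test whether the host address falls inside the range configured for the IP subnet. The sketch below uses Python's standard `ipaddress` module. 10.106.75.4 is the address from the log earlier in this thread; the start and end addresses are made-up placeholders, so substitute the values from your own platform.

```python
import ipaddress


def ip_in_range(ip: str, start: str, end: str) -> bool:
    """Return True if ip lies within the inclusive range [start, end]."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)


# Placeholder range (assumption): replace with the IP subnet defined on the platform.
print(ip_in_range("10.106.75.4", "10.106.75.2", "10.106.75.254"))
```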

Second configuration:

You need to add an IP subnet on the platform that contains the bond0 IP.

networks:
- bond0/br1/<ip_of_bond0>
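
Both configurations use the same three-part `networks` entry as it appears in this thread: interface, bridge, then either a wire name or an IP address. Purely as an illustration, the following Python sketch splits such an entry and tells the two forms apart. `NetworkEntry` and `parse_network_entry` are hypothetical names, and this is not how the host service itself parses its config.

```python
import ipaddress
from typing import NamedTuple, Optional


class NetworkEntry(NamedTuple):
    interface: str
    bridge: str
    wire: Optional[str]  # set when the third field is a wire name such as bcast1
    ip: Optional[str]    # set when the third field is an IP address


def parse_network_entry(entry: str) -> NetworkEntry:
    """Split an 'interface/bridge/wire-or-ip' string into its three fields."""
    parts = entry.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected interface/bridge/wire-or-ip, got {entry!r}")
    interface, bridge, third = parts
    try:
        ipaddress.ip_address(third)
    except ValueError:
        # Not an IP address, so treat it as a wire name.
        return NetworkEntry(interface, bridge, third, None)
    return NetworkEntry(interface, bridge, None, third)


print(parse_network_entry("bond0/br1/bcast1"))
print(parse_network_entry("bond0/br1/10.106.75.4"))
```

Note that neither form carries a VLAN ID, which is the gap raised earlier in the thread.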

swordqiu avatar Mar 26 '24 01:03 swordqiu

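With the first configuration, will VPC virtual machines go through bond0?

The second configuration doesn't look right. bond0 itself has no IP, and even if one were set it would be unusable, because a VLAN tag has to be specified; but bridging the VLAN sub-interface produces an error.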

zhfish avatar Mar 26 '24 02:03 zhfish