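After a KubeVirt live migration, VM traffic still egresses through the original node; if the original node is shut down, the VM loses internet access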

Open laohyx opened this issue 1 year ago • 2 comments

Bug Report

Steps to Reproduce the Problem

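Steps to reproduce: live-migrate a KubeVirt VM from node A to node B. The VM can still reach the internet, but a packet capture shows that its outbound traffic still leaves through node A.

If node A is then shut down, the VM can no longer reach the internet.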

Expected Behavior

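When capturing on node A, there should be no traffic for the VM (the commands used are shown below).

If node A is shut down, the VM should still be able to reach the internet.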

Actual Behavior

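# Node B: the VM runs here, but the capture only shows the following (return traffic is routed to another node, which is expected and works correctly)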
# tcpdump -i any icmp and host 1.1.1.1
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
13:52:50.600175 1fc2da3da329_h P   IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 41, length 64
13:52:50.600189 genev_sys_6081 Out IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 41, length 64
13:52:50.772548 genev_sys_6081 P   IP one.one.one.one > 10.18.1.41: ICMP echo reply, id 8, seq 41, length 64
13:52:50.772595 1fc2da3da329_h Out IP one.one.one.one > 10.18.1.41: ICMP echo reply, id 8, seq 41, length 64


# Node A: the traffic actually leaves from A (bond0 and the physical NIC both carry it)
# tcpdump -i any icmp and host 1.1.1.1
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
14:08:34.739870 genev_sys_6081 P   IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 984, length 64
14:08:34.739880 ovn0  In  IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 984, length 64
14:08:34.739890 bond0 Out IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 984, length 64
14:08:34.739891 ens3f1np1 Out IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 984, length 64
14:08:34.904748 ens3f1np1 In  IP one.one.one.one > 10.18.1.41: ICMP echo reply, id 8, seq 984, length 64
14:08:34.904748 bond0 In  IP one.one.one.one > 10.18.1.41: ICMP echo reply, id 8, seq 984, length 64
14:08:34.904760 ovn0  Out IP one.one.one.one > 10.18.1.41: ICMP echo reply, id 8, seq 984, length 64
14:08:34.904764 genev_sys_6081 Out IP one.one.one.one > 10.18.1.41: ICMP echo reply, id 8, seq 984, length 64
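
Comparing the two captures by hand gets tedious when the test is repeated after each migration. The following is a minimal sketch, not part of the original report, that parses `tcpdump -i any` lines in the LINUX_SLL2 format shown above and reports which interfaces saw the flow. The function names and the sample constants are hypothetical; the sample lines are copied from the captures above, and the set of physical interfaces (`bond0`, `ens3f1np1`) is taken from the node A output and would differ on other hosts.

```python
import re

# One tcpdump line in "-i any" (LINUX_SLL2) form, for example:
# "14:08:34.739890 bond0 Out IP 10.18.1.41 > one.one.one.one: ICMP echo request, ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<iface>\S+)\s+(?P<dir>In|Out|P|B|M)\s+IP\s+"
    r"(?P<src>\S+)\s+>\s+(?P<dst>[^:]+):"
)

# Sample lines copied from the captures in this issue (requests only).
NODE_B_CAPTURE = """
13:52:50.600175 1fc2da3da329_h P   IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 41, length 64
13:52:50.600189 genev_sys_6081 Out IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 41, length 64
"""

NODE_A_CAPTURE = """
14:08:34.739870 genev_sys_6081 P   IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 984, length 64
14:08:34.739880 ovn0  In  IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 984, length 64
14:08:34.739890 bond0 Out IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 984, length 64
14:08:34.739891 ens3f1np1 Out IP 10.18.1.41 > one.one.one.one: ICMP echo request, id 8, seq 984, length 64
"""


def interfaces_seen(capture):
    """Return (interface, direction) pairs in order of first appearance."""
    seen = []
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip tcpdump banner lines and blanks
        pair = (match.group("iface"), match.group("dir"))
        if pair not in seen:
            seen.append(pair)
    return seen


def leaves_via(capture, physical_ifaces):
    """True if any packet in the capture goes Out on one of the given interfaces."""
    return any(
        iface in physical_ifaces and direction == "Out"
        for iface, direction in interfaces_seen(capture)
    )
```

With the samples above, `leaves_via(NODE_A_CAPTURE, {"bond0", "ens3f1np1"})` is true while the same call on `NODE_B_CAPTURE` is false, which matches the observation that node B only forwards the packets into the Geneve tunnel.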

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.11", GitCommit:"3cd242c51317aed8858119529ccab22079f523b1", GitTreeState:"clean", BuildDate:"2023-11-15T17:00:54Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.11", GitCommit:"3cd242c51317aed8858119529ccab22079f523b1", GitTreeState:"clean", BuildDate:"2023-11-15T16:50:12Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}

  • kube-ovn version:
docker.io/kubeovn/kube-ovn:v1.12.0
  • operation-system/kernel version:
Ubuntu 22.04.2 LTS
5.15.0-71-generic
  

Subnet configuration (distributed gateway enabled)

- apiVersion: kubeovn.io/v1
  kind: Subnet
  metadata:
    creationTimestamp: "2023-11-21T08:45:17Z"
    finalizers:
    - kube-ovn-controller
    generation: 2
    name: ovn-default
    resourceVersion: "11844322"
    uid: c8cf11a2-ef7d-469f-9d07-b38a97e4485d
  spec:
    cidrBlock: 10.18.0.0/16
    default: true
    enableLb: true
    excludeIps:
    - 10.18.0.1
    gateway: 10.18.0.1
    gatewayNode: ""
    gatewayType: distributed
    natOutgoing: false
    private: false
    protocol: IPv4
    provider: ovn
    vpc: ovn-cluster

laohyx avatar Dec 06 '23 08:12 laohyx

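We ran into a similar problem earlier when testing the VLAN scenario. At the time we checked the FDB table of the business-network bridge and saw that the VM's MAC entry was still waiting to age out. You could look into the FDB-related state for the VPC scenario in the same way.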

bobz965 avatar Dec 07 '23 02:12 bobz965

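I found some clues about this issue on 1.12.4.

It appears to happen because the VM's port ends up in several port_groups while gatewayType: distributed is in use, so there are multiple egress paths and the correct one is not configured.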

# Egress policies for gatewayType: distributed
~# kubectl ko nbctl lr-policy-list ovn-cluster
Routing Policies
...
     29000              ip4.src == $ovn.default.caas.a151_ip4         reroute                100.96.0.4
     29000              ip4.src == $ovn.default.caas.a152_ip4         reroute                100.96.0.6
     29000              ip4.src == $ovn.default.caas.b151_ip4         reroute                100.96.0.5
     29000              ip4.src == $ovn.default.caas.b152_ip4         reroute                100.96.0.7
     29000              ip4.src == $ovn.default.caas.c151_ip4         reroute                100.96.0.2
     29000              ip4.src == $ovn.default.caas.c152_ip4         reroute                100.96.0.3
...
# The VM's port is in the egress port_groups of several nodes
~# kubectl ko nbctl list port_group | grep ce806dc0-831d-411c-b297-fa16cb634ad3 -B 1
name                : ovn.default.caas.a152
ports               : [0b104719-4215-439b-a54b-1ed613db5090, 15c7e289-3425-4aea-9db0-eeab1819ec72, 2209c813-0a83-4208-8435-1b16865b7d02, 23e60868-0dae-4171-87f8-2afec9756ad9, 2462cfac-07ba-4fca-abbf-c4d93be53cad, 2a5b4f4f-cb15-47ff-a377-9faec7d9dd74, 388cab48-5bfe-45e3-9bb6-d3bf24f70797, 43493003-f254-4dbe-92a9-396e220cb39d, 49b405ea-36df-43cb-9633-e45a72e2cbbd, 52d3209b-c0ad-47cb-9bd0-74659393d485, 5bd56bd1-99af-4c30-94a4-616de40d31d8, 5d5c4815-998d-44b3-8c28-15ade950c394, 5e167554-2dee-4fe0-95a6-66d41a83f942, 6216c4a6-d66a-4d47-8f57-6fbbd97ff2b1, 66412368-aa66-4bbf-ab85-c2ece0d0a545, 6bba85c7-a49a-429a-bc3b-8b759cf5762e, 6d927c17-7165-4dde-8f98-1724bc01ea08, 7de42287-3115-4804-bc87-f0d19ea7689e, 8aba1991-ba7d-46e5-864b-9dcedd48c010, a9a1d2d9-0f0c-4ba9-91e8-fdbfcebd3fce, b7ab2f78-1c07-450d-a2b4-2b86b481c417, bcf1339c-d460-4749-a683-1b79da260387, c04b7267-91e1-4a7e-8017-6ff82a4b94f6, ce806dc0-831d-411c-b297-fa16cb634ad3, d041e6d1-0306-4d5e-8460-5d21fb2f3e50, d1ec6db9-111b-4291-9a56-d8378d84d10d, db1fb46d-306e-4909-ab12-3fb9e2c6d3c2, dbba3b27-cb70-4c49-b9b3-f7d24d9343d9, e29614b6-27b2-4535-bb3a-e1989916b908, e777d5ee-313f-49b0-99d4-7e9a487e5783, e87ce162-cedd-4926-944f-e576c9b5a334, edcd6c28-ac77-439f-b5e9-e8d823083f21, f9e7055f-ae71-47ad-b1b5-ccb21f2b3cb6, fa4954dd-2830-489f-8e1a-4286d8428e14, fb4b9c51-e909-4986-915e-a149eb34e3e3, fef8ca6e-9730-4add-b9b0-72f48e651860]
--
name                : ovn.default.caas.a151
ports               : [0e7636a2-a484-462f-b5b6-1fd800134de7, 137980b6-ce32-4f85-888c-253d1d257b8d, 14abc3da-0a61-404a-922a-10a10ecfbdd4, 158fb475-a9c5-494b-8dc5-32c6ea062de3, 15939c6d-069c-402b-ab38-f4d1fa00af39, 18d5cf8a-0d1d-453f-b88d-97f553231d93, 30b43ccd-8772-4f4c-9f09-b9b455112921, 36478dad-78b3-4d73-9c34-5f1ad1fa6742, 3e802de3-676d-47a5-88f6-27641e5c3402, 4237dcc0-c954-4add-80f3-a7958f781a33, 450461f5-d685-435e-b31d-3886961f7a3c, 504649c5-053c-47f5-a8bb-3279dd165072, 5253e770-f7fe-4e6c-9c83-615f32e10e65, 529e4e31-073b-4db4-b630-13c1ea9b50bf, 5353b494-fe13-4f05-9bdd-0399b2773f66, 7e4cda53-aa2a-4fc9-9d6e-e97002cd44c3, 80562eb0-a056-4489-b5de-323f6cf34271, 837a371a-877e-45af-8dcc-4930ecf7af61, 84d602b5-857d-4f2d-952c-f4423ed10f58, 85e04e55-8499-4446-af01-9a3ad08d330b, 951731a6-5fe7-4234-ac6c-eb45c649f67a, aebe6e66-c816-4ee0-bd44-17681972fdc1, b6b0f40b-cae9-4d54-b022-25f96c823ba4, c0865495-2626-4736-972e-34b5867cb4ae, c31d463b-d570-428f-a5ea-77bd628a31a4, ce806dc0-831d-411c-b297-fa16cb634ad3, cf083824-490e-4d7c-988b-fef1bc9d6bf9, d094cacc-6112-4f23-87c1-19c6b9e749c9, e329ba70-c889-4a12-bbcb-9218cb48bb49, e71a57b7-b42f-4d18-b286-db92b641320e, edb6a2aa-4cfe-45c5-a58f-7c729585bf27]
--
name                : ovn.default.caas.c151
ports               : [0768edde-75bf-4d00-82ff-2b070a7553ae, 0986be7a-e5c4-4603-8d2c-0e39ebc2fcde, 1b5ed84b-c862-449f-9c36-2b6e1337099a, 1f8b1e5b-d41c-48ff-95c3-5a14bba96e86, 27a54ac1-83f8-4031-967e-0352a598c6be, 3cf9b0c5-6303-45a6-9bcf-56dbbda30040, 432f4f2a-71a4-48d0-bf05-05c10522a819, 53b124f2-825f-4e10-935a-da68a928945b, 6b5c906f-4f4b-4787-a276-e47848866ac8, 8da1a968-7811-4419-9227-7530e6ac88da, 91a2ee93-a3c3-4465-8299-c5cc8c46f262, 92f7d365-2bd4-4d03-a474-90cf54d7f537, 95710403-0d71-406f-8aa3-aaba23106caa, 9c14bacc-8b74-4157-b811-cf2ed61c6bba, a9e43829-615c-4b8b-a341-ca9d7334e987, abc22a05-7307-488d-aabf-95a395fdbec3, ac2d3835-8fc7-4c15-a078-ee1d28b9632c, b6ab6b4a-ff74-4ba2-8bda-43e9fdb935b4, c93e1822-faa9-40fc-8be6-944cd303eef0, c96b8491-bc5e-486c-bfeb-32a4b4c6bbb4, ce806dc0-831d-411c-b297-fa16cb634ad3, d941a172-bc8d-4afe-8090-102371b241c4, dcd80964-16ac-46dd-b0e4-4b344f49885a, e54e3b26-62ba-4f2b-808c-d9439d8a9cc2, e7118de0-ff32-46ca-9d5d-75884048dbe8, e8743659-35cc-4ba2-b0a2-92cb471b3cff, f0804fd2-4295-4fd9-a153-475af41049f8, f3a341a5-db87-4013-af3f-659ae1148612]


~# kubectl ko nbctl list logical_switch_port | grep qooeiag0g501 -C 10
tag_request         : []
type                : ""
up                  : true

_uuid               : ce806dc0-831d-411c-b297-fa16cb634ad3
addresses           : ["00:00:00:98:02:0A 10.18.0.85"]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : []
external_ids        : {ls=ovn-default, pod="test-ns/i-qooeiag0g501", vendor=kube-ovn}
ha_chassis_group    : []
mirror_rules        : []
name                : i-qooeiag0g501.test-ns
options             : {}
parent_name         : []
port_security       : []
tag                 : []
tag_request         : []
type                : ""
up                  : true

_uuid               : 4237dcc0-c954-4add-80f3-a7958f781a33
addresses           : ["00:00:00:EB:5B:17 10.18.43.108"]
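
The check above (grepping one port UUID across port groups) can be generalized to find every port that sits in more than one per-node port group. This is a rough sketch under a few assumptions: it reads the plain-text output of `kubectl ko nbctl list port_group` in the `name` / `ports` layout shown above, it treats the `ovn.default.` prefix seen in this output as the marker for per-node gateway port groups, and the function names and `SAMPLE` constant are hypothetical. The sample keeps only two UUIDs from the real lists for brevity.

```python
import re
from collections import defaultdict

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

# Abbreviated sample: port lists shortened to two UUIDs taken from the output above.
SAMPLE = """
name                : ovn.default.caas.a152
ports               : [ce806dc0-831d-411c-b297-fa16cb634ad3]

name                : ovn.default.caas.a151
ports               : [4237dcc0-c954-4add-80f3-a7958f781a33, ce806dc0-831d-411c-b297-fa16cb634ad3]
"""


def port_group_membership(nbctl_output):
    """Map each port UUID to the list of port groups that contain it."""
    membership = defaultdict(list)
    current_group = None
    for line in nbctl_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and "--" separators from grep
        key = key.strip()
        value = value.strip()
        if key == "name":
            current_group = value.strip('"')
        elif key == "ports" and current_group is not None:
            for uuid in UUID_RE.findall(value):
                membership[uuid].append(current_group)
    return dict(membership)


def multi_gateway_ports(membership, prefix="ovn.default."):
    """Return ports that belong to more than one port group with the given prefix.

    The prefix is an assumption based on the names in this issue.
    """
    result = {}
    for uuid, groups in membership.items():
        matching = [g for g in groups if g.startswith(prefix)]
        if len(matching) > 1:
            result[uuid] = matching
    return result
```

Any port returned by `multi_gateway_ports` would match more than one of the priority 29000 `reroute` policies listed earlier, since each policy matches on the `_ip4` address set of one port group. Which policy wins in that case is not something this sketch determines.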

a180285 avatar Jan 25 '24 15:01 a180285

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

github-actions[bot] avatar Mar 26 '24 00:03 github-actions[bot]