kube-ovn
release-1.12: Linux VM live migration loses ping for more than 10 seconds
Bug Report
Expected Behavior
In release-1.12, ping loss during VM live migration should last less than 3 seconds.
Actual Behavior
In release-1.12, ping is lost for more than 10 seconds during VM live migration.
Steps to Reproduce the Problem
Additional Info
We have fixed this; a PR will follow later.
- Kubernetes version:
  Output of kubectl version: (paste your output here)
- kube-ovn version:
  (paste your output here)
- Operating system/kernel version:
  Output of awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release:
  Output of uname -r:
  (paste your output here)
Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.
Reference: "Live migration: reducing downtime with OVN multi chassis bindings", https://www.openvswitch.org/support/ovscon2022/slides/Live-migration-with-OVN.pdf
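On the KubeVirt side, some annotations need to be exposed during live migration so that kube-ovn can set and clean up the migrator options on the LSP.
On the kube-ovn side:
- kube-ovn identifies the target VM of the migration and sets the migrator LSP options.
- After the migration completes, kube-ovn resets the migrator LSP options.
- Force-deleting the VM pod causes the VM to be rescheduled, and leftover migrator LSP options would then break its network connectivity, so the migrator LSP options must also be cleared when the VM pod is deleted.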
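The three kube-ovn steps above map onto the OVN multi-chassis port binding described in the referenced OVSCon 2022 talk. Below is a minimal Go sketch, assuming the migrator options are `requested-chassis` listing both chassis plus `activation-strategy=rarp` (the OVN multi-chassis options; the exact set kube-ovn writes is an assumption here), and that "reset" means clearing the options to `{}`, as the transcript below shows. The event type and chassis names are hypothetical placeholders, not kube-ovn identifiers.

```go
package main

import "fmt"

// MigrationEvent is a hypothetical enum of the events listed above;
// it is not a kube-ovn type.
type MigrationEvent int

const (
	TargetVMIdentified MigrationEvent = iota // target pod of a live migration detected
	MigrationFinished                        // live migration completed
	VMPodDeleted                             // VM pod deleted (e.g. force delete before rescheduling)
)

// migratorOptions returns the Logical_Switch_Port options a controller could
// apply for each event. srcChassis and dstChassis are OVN chassis names,
// assumed to be resolved by the caller.
func migratorOptions(ev MigrationEvent, srcChassis, dstChassis string) map[string]string {
	switch ev {
	case TargetVMIdentified:
		// Multi-chassis binding: the port is bound on both chassis during the
		// migration, and the destination copy is activated only after the VM
		// announces itself there with a RARP.
		return map[string]string{
			"requested-chassis":   srcChassis + "," + dstChassis,
			"activation-strategy": "rarp",
		}
	default:
		// Migration finished or pod deleted: reset to {} so a rescheduled VM
		// is not left pinned to stale chassis.
		return map[string]string{}
	}
}

func main() {
	for _, ev := range []MigrationEvent{TargetVMIdentified, MigrationFinished, VMPodDeleted} {
		fmt.Println(ev, migratorOptions(ev, "chassis-src", "chassis-dst"))
	}
}
```

Returning an empty map for both "finished" and "deleted" mirrors the issue text, where both paths end with the options reset to `{}`.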
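Test results: default VPC scenario
Create three pinger pods on different nodes, have all of them ping the migrating VM at a 0.1 s interval at the same time, live-migrate the VM three times, and check for packet loss.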
[root@euler-x86-70 pods]# kgp | grep ping
kube-system euler-x86-70-pinger 1/1 Running 0 5s 10.222.0.43 euler-x86-70 <none> <none>
kube-system euler-x86-71-pinger 1/1 Running 0 5s 10.222.0.44 euler-x86-70 <none> <none>
kube-system euler-x86-73-pinger 1/1 Running 0 5s 10.222.0.45 euler-x86-70 <none> <none>
[root@euler-x86-70 pods]# kgp | grep vm-m
zal virt-launcher-zal-vm-m-b6pf6 1/1 Running 0 138m 10.222.0.130 euler-x86-70 <none> 1/1
[root@euler-x86-70 ~]# kgp | grep vm-m
zal virt-launcher-zal-vm-m-b6pf6 0/1 Completed 0 147m 10.222.0.130 euler-x86-70 <none> 1/1
zal virt-launcher-zal-vm-m-rp67c 1/1 Running 0 2m15s 10.222.0.130 euler-x86-71 <none> 1/1
[root@euler-x86-70 ~]#
[root@euler-x86-70 ~]# k delete po -n zal virt-launcher-zal-vm-m-b6pf6
pod "virt-launcher-zal-vm-m-b6pf6" deleted # deleting the VM pod in Completed state also resets the LSP options to {}; this is normally done manually, so it has no impact
[root@euler-x86-70 ~]#
[root@euler-x86-70 ~]# kgp | grep vm-m
zal virt-launcher-zal-vm-m-rp67c 1/1 Running 0 2m35s 10.222.0.130 euler-x86-71 <none> 1/1
[root@euler-x86-70 ~]#
#### First test: packet loss lasted <= 0.5 s
3755 packets transmitted, 3750 packets received, 0% packet loss # 5 packets lost
round-trip min/avg/max/stddev = 0.094/0.322/2.159/0.171 ms
root@euler-x86-73-pinger:/kube-ovn#
^C--- 10.222.0.130 ping statistics ---
4710 packets transmitted, 4706 packets received, 0% packet loss # 4 packets lost
round-trip min/avg/max/stddev = 0.104/0.311/21.394/0.346 ms
root@euler-x86-70-pinger:/kube-ovn#
^C--- 10.222.0.130 ping statistics ---
4462 packets transmitted, 4458 packets received, 0% packet loss # 4 packets lost
round-trip min/avg/max/stddev = 0.117/0.347/21.338/0.361 ms
root@euler-x86-71-pinger:/kube-ovn#
#### Second test: packet loss lasted <= 0.8 s
^C--- 10.222.0.130 ping statistics ---
735 packets transmitted, 728 packets received, +22 duplicates, 0% packet loss # 7 packets lost
round-trip min/avg/max/stddev = 0.037/0.463/7.571/1.040 ms
root@euler-x86-73-pinger:/kube-ovn#
^C--- 10.222.0.130 ping statistics ---
699 packets transmitted, 691 packets received, +21 duplicates, 1% packet loss # 8 packets lost
round-trip min/avg/max/stddev = 0.038/0.497/7.467/1.059 ms
root@euler-x86-70-pinger:/kube-ovn#
^C--- 10.222.0.130 ping statistics ---
664 packets transmitted, 656 packets received, +21 duplicates, 1% packet loss # 8 packets lost
round-trip min/avg/max/stddev = 0.038/0.488/7.227/1.076 ms
root@euler-x86-71-pinger:/kube-ovn#
#### Third test: packet loss lasted <= 0.5 s
^C--- 10.222.0.130 ping statistics ---
1917 packets transmitted, 1912 packets received, 0% packet loss # 5 packets lost
round-trip min/avg/max/stddev = 0.105/0.350/2.434/0.180 ms
root@euler-x86-73-pinger:/kube-ovn#
^C--- 10.222.0.130 ping statistics ---
1960 packets transmitted, 1955 packets received, 0% packet loss # 5 packets lost
round-trip min/avg/max/stddev = 0.095/0.249/1.495/0.133 ms
root@euler-x86-70-pinger:/kube-ovn#
^C--- 10.222.0.130 ping statistics ---
1920 packets transmitted, 1915 packets received, 0% packet loss # 5 packets lost
round-trip min/avg/max/stddev = 0.066/0.287/2.390/0.201 ms
root@euler-x86-71-pinger:/kube-ovn#
#### Five consecutive migrations: average packet loss per migration lasted <= 0.5 s
^C--- 10.222.0.130 ping statistics ---
4917 packets transmitted, 4892 packets received, 0% packet loss # 25 packets lost
round-trip min/avg/max/stddev = 0.086/0.438/109.496/2.169 ms
root@euler-x86-73-pinger:/kube-ovn#
^C--- 10.222.0.130 ping statistics ---
4908 packets transmitted, 4883 packets received, 0% packet loss # 25 packets lost
round-trip min/avg/max/stddev = 0.085/0.450/109.385/2.166 ms
root@euler-x86-70-pinger:/kube-ovn#
^C--- 10.222.0.130 ping statistics ---
4899 packets transmitted, 4874 packets received, 0% packet loss # 25 packets lost
round-trip min/avg/max/stddev = 0.082/0.460/109.474/2.172 ms
root@euler-x86-71-pinger:/kube-ovn#
Test results: VLAN scenario
#### VM IP
[root@zal-vm-m ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:00:00:86:a7:52 brd ff:ff:ff:ff:ff:ff
inet 100.71.45.70/26 brd 100.71.45.127 scope global dynamic noprefixroute eth0
valid_lft 86313330sec preferred_lft 86313330sec
inet6 fe80::200:ff:fe86:a752/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@zal-vm-m ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
100.71.45.64 0.0.0.0 255.255.255.192 U 100 0 0 eth0
[root@zal-vm-m ~]# ip route add default via 100.71.45.126
[root@zal-vm-m ~]#
[root@zal-vm-m ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 100.71.45.126 0.0.0.0 UG 0 0 0 eth0
100.71.45.64 0.0.0.0 255.255.255.192 U 100 0 0 eth0
[root@zal-vm-m ~]#
[root@euler-x86-70 vlan]#
[root@euler-x86-70 vlan]# ping 100.71.45.70
PING 100.71.45.70 (100.71.45.70) 56(84) bytes of data.
64 bytes from 100.71.45.70: icmp_seq=1 ttl=63 time=0.863 ms
#### pinger
[root@euler-x86-70 vlan]# kgp| grep pinger
kube-system euler-x86-70-pinger 1/1 Running 0 25m 100.71.45.65 euler-x86-70 <none> <none>
kube-system euler-x86-71-pinger 1/1 Running 0 25m 100.71.45.66 euler-x86-70 <none> <none>
kube-system euler-x86-73-pinger 1/1 Running 0 25m 100.71.45.67 euler-x86-70 <none> <none>
#### First test: ping at 0.1 s interval, 0 packets lost
^C--- 100.71.45.71 ping statistics ---
2178 packets transmitted, 2178 packets received, +1 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.117/2.328/0.109 ms
root@euler-x86-70-pinger:/kube-ovn#
^C--- 100.71.45.71 ping statistics ---
2137 packets transmitted, 2137 packets received, +1 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.117/2.432/0.109 ms
root@euler-x86-71-pinger:/kube-ovn#
^C--- 100.71.45.71 ping statistics ---
1958 packets transmitted, 1958 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.113/2.082/0.108 ms
root@euler-x86-73-pinger:/kube-ovn#
#### Second test: ping at 0.1 s interval, 0 packets lost
^C--- 100.71.45.71 ping statistics ---
995 packets transmitted, 995 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.038/0.208/5.550/0.502 ms
root@euler-x86-70-pinger:/kube-ovn#
^C--- 100.71.45.71 ping statistics ---
1035 packets transmitted, 1035 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.038/0.222/5.539/0.533 ms
root@euler-x86-71-pinger:/kube-ovn#
^C--- 100.71.45.71 ping statistics ---
1093 packets transmitted, 1093 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.039/0.217/5.589/0.522 ms
root@euler-x86-73-pinger:/kube-ovn#
#### Third test: ping at 0.1 s interval, 0 packets lost
^C--- 100.71.45.71 ping statistics ---
1535 packets transmitted, 1535 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.268/85.838/2.559 ms
root@euler-x86-70-pinger:/kube-ovn#
^C--- 100.71.45.71 ping statistics ---
1527 packets transmitted, 1527 packets received, +1 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.298/80.868/2.585 ms
root@euler-x86-71-pinger:/kube-ovn#
^C--- 100.71.45.71 ping statistics ---
1535 packets transmitted, 1535 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.022/0.182/32.376/0.830 ms
root@euler-x86-73-pinger:/kube-ovn#
#### Five consecutive migrations on a fresh setup: ping at 0.1 s interval, 0 packets lost, average dup <= 0.5 ms
^C--- 100.71.45.71 ping statistics ---
4866 packets transmitted, 4866 packets received, +25 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.036/0.230/5.397/0.375 ms
root@euler-x86-70-pinger:/kube-ovn#
^C--- 100.71.45.71 ping statistics ---
4861 packets transmitted, 4861 packets received, +24 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.037/0.243/9.979/0.414 ms
root@euler-x86-71-pinger:/kube-ovn#
^C--- 100.71.45.71 ping statistics ---
4859 packets transmitted, 4859 packets received, +25 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.035/0.231/4.682/0.371 ms
root@euler-x86-73-pinger:/kube-ovn#
Packet loss is lower in the VLAN scenario.
PR 3767 added support for OVN LSP migration options, but it only works for pods that have both MigrationSourceAnnotation and MigrationTargetAnnotation set, which is KubeVirt's responsibility. However, I have not found MigrationSourceAnnotation being set anywhere in the latest KubeVirt source code (main branch at commit 055c6e0491fa93befa6372ca4d367916cabcb5af). How was the test above done?
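The condition described in this comment amounts to a simple gate on the pod's annotations. A Go sketch for illustration only: the annotation keys and values below are hypothetical placeholders, not the constants kube-ovn actually defines; the point is that a pod missing the source annotation never gets migrator options.

```go
package main

import "fmt"

// Hypothetical annotation keys; the real values of kube-ovn's
// MigrationSourceAnnotation/MigrationTargetAnnotation may differ.
const (
	migrationSourceKey = "example.io/migration-source"
	migrationTargetKey = "example.io/migration-target"
)

// migrationChassis reports whether migrator options would be applied under
// the rule described above: both annotations must be present and non-empty.
func migrationChassis(annotations map[string]string) (src, dst string, ok bool) {
	src = annotations[migrationSourceKey]
	dst = annotations[migrationTargetKey]
	return src, dst, src != "" && dst != ""
}

func main() {
	both := map[string]string{migrationSourceKey: "node-a", migrationTargetKey: "node-b"}
	targetOnly := map[string]string{migrationTargetKey: "node-b"}

	_, _, ok := migrationChassis(both)
	fmt.Println(ok)
	// Without the source annotation, no migrator options are set.
	_, _, ok = migrationChassis(targetOnly)
	fmt.Println(ok)
}
```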
@Longchuanzheng, could you please commit the code to the kubevirt repository under kube-ovn? Thanks!
OK, I will upload the functional code first, although some unit tests are not yet complete. I will finish the rest as soon as possible.
@bobz965, @anyfeel https://github.com/kubeovn/kubevirt-dpdk/pull/1