cloudpods
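[Help] All three master nodes of a cloudpods cluster hold the VIP at the same time
Version: 3.11.10
The cluster ran normally for about three weeks after deployment, then suddenly became unreachable through the VIP. Checking the three master nodes showed that the VIP was bound on all of them. The keepalived log on each master node looks like this: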
[2025-06-12 01:50:52] got router interface: bond0
[2025-06-12 01:50:52] interface bond0 OK
Thu Jun 12 01:50:52 2025: VRRP_Script(check_route) succeeded
[2025-06-12 01:50:52] curl -k -XGET https://172.16.15.116:6443/healthz --cert /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --key /var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --cacert /var/lib/rancher/k3s/server/tls/client-ca.crt ok
Thu Jun 12 01:50:52 2025: VRRP_Script(check_kube) succeeded
Thu Jun 12 01:50:52 2025: (VI_1) Entering BACKUP STATE
Ok, i'm just a backup, great.
[2025-06-12 01:50:55] curl -k -XGET https://172.16.15.116:6443/healthz --cert /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --key /var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --cacert /var/lib/rancher/k3s/server/tls/client-ca.crt ok
Thu Jun 12 01:50:55 2025: (VI_1) Receive advertisement timeout
Thu Jun 12 01:50:55 2025: (VI_1) Entering MASTER STATE
Thu Jun 12 01:50:55 2025: (VI_1) setting VIPs.
Thu Jun 12 01:50:55 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
Thu Jun 12 01:50:55 2025: (VI_1) Sending/queueing gratuitous ARPs on bond0 for 172.16.15.240
Thu Jun 12 01:50:55 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
Thu Jun 12 01:50:55 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
Thu Jun 12 01:50:55 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
Thu Jun 12 01:50:55 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
I'm the MASTER! Whup whup.
[2025-06-12 01:50:58] curl -k -XGET https://172.16.15.116:6443/healthz --cert /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --key /var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --cacert /var/lib/rancher/k3s/server/tls/client-ca.crt ok
Thu Jun 12 01:51:00 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
Thu Jun 12 01:51:00 2025: (VI_1) Sending/queueing gratuitous ARPs on bond0 for 172.16.15.240
Thu Jun 12 01:51:00 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
Thu Jun 12 01:51:00 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
Thu Jun 12 01:51:00 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
Thu Jun 12 01:51:00 2025: Sending gratuitous ARP on bond0 for 172.16.15.240
[2025-06-12 01:51:01] curl -k -XGET https://172.16.15.116:6443/healthz --cert /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --key /var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --cacert /var/lib/rancher/k3s/server/tls/client-ca.crt ok
[2025-06-12 01:51:04] curl -k -XGET https://172.16.15.116:6443/healthz --cert /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --key /var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --cacert /var/lib/rancher/k3s/server/tls/client-ca.crt ok
[2025-06-12 01:51:07] got router interface: bond0
[2025-06-12 01:51:07] interface bond0 OK
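The keepalived logs on all three control nodes show that each node considers itself the master. Deleting the keepalived pods so that they restart did not resolve the problem.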
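To compare the three nodes more easily, the state changes can be pulled out of each keepalived log. The following is a minimal sketch, assuming the log lines keep the format shown above. The function name `summarize` and the regular expressions are illustrative and not part of keepalived or cloudpods.

```python
import re

# Matches lines such as:
#   Thu Jun 12 01:50:55 2025: (VI_1) Entering MASTER STATE
STATE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}): "
    r"\((?P<instance>[^)]+)\) Entering (?P<state>MASTER|BACKUP|FAULT) STATE"
)
# Matches lines such as:
#   Thu Jun 12 01:50:55 2025: (VI_1) Receive advertisement timeout
TIMEOUT_RE = re.compile(r"\((?P<instance>[^)]+)\) Receive advertisement timeout")


def summarize(lines):
    """Return (transitions, timeouts) for one node's keepalived log.

    transitions is a list of (timestamp, instance, state) tuples in log order;
    timeouts counts the "Receive advertisement timeout" lines.
    """
    transitions = []
    timeouts = 0
    for line in lines:
        match = STATE_RE.match(line)
        if match:
            transitions.append((match["ts"], match["instance"], match["state"]))
        elif TIMEOUT_RE.search(line):
            timeouts += 1
    return transitions, timeouts


# Lines copied from the log in this issue.
SAMPLE = [
    "Thu Jun 12 01:50:52 2025: (VI_1) Entering BACKUP STATE",
    "Ok, i'm just a backup, great.",
    "Thu Jun 12 01:50:55 2025: (VI_1) Receive advertisement timeout",
    "Thu Jun 12 01:50:55 2025: (VI_1) Entering MASTER STATE",
]

if __name__ == "__main__":
    transitions, timeouts = summarize(SAMPLE)
    for ts, instance, state in transitions:
        print(ts, instance, state)
    print("advertisement timeouts:", timeouts)
```

A node that goes from BACKUP to MASTER right after an advertisement timeout did not accept any advertisement from a peer during that window. If all three nodes show that pattern, each is failing to process the others' advertisements.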
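@a1226207408 Does the underlying network allow the VRRP protocol that keepalived uses?
@zexi Running tcpdump on bond0 on each of the three master nodes does capture the VRRP packets, and the mariadb VIP in the same subnet is still working normally: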
root@u114:~# tcpdump -i bond0 -n vrrp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:01:22.650901 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:22.894362 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:23.071304 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:23.167377 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:23.651067 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:23.894437 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:24.071479 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:24.167596 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:24.651238 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:24.894497 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:25.071615 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:25.167692 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:25.651284 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:25.894666 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:26.071812 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:26.167802 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
^C
16 packets captured
16 packets received by filter
0 packets dropped by kernel
root@u115:~# tcpdump -i bond0 -n vrrp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:01:16.888645 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:17.065600 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:17.161665 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:17.645269 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:17.888762 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:18.065793 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:18.161849 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:18.645419 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:18.888915 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:19.065879 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:19.161970 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:19.645567 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
^C
12 packets captured
12 packets received by filter
0 packets dropped by kernel
root@u116:~# tcpdump -i bond0 -n vrrp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:01:09.645450 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:09.889191 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:10.066077 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:10.162009 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:10.645584 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:10.889310 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:11.066199 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:11.162125 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:11.645763 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:11.889431 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:12.066292 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
17:01:12.162238 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20
17:01:12.645938 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20
17:01:12.889574 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20
^C
14 packets captured
15 packets received by filter
0 packets dropped by kernel
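In VRRP only the current master of a virtual router sends advertisements, so several source addresses advertising the same vrid at the same time means several nodes are acting as master. A minimal sketch that groups the captured lines by vrid, assuming the `tcpdump -n vrrp` line format shown above (the function names are illustrative):

```python
import re
from collections import defaultdict

# Matches lines such as:
#   17:01:23.167377 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, ...
ADVERT_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: VRRPv\d, Advertisement, "
    r"vrid (?P<vrid>\d+), prio (?P<prio>\d+)"
)


def advertisers_by_vrid(lines):
    """Map each vrid to {source IP: last advertised priority}."""
    result = defaultdict(dict)
    for line in lines:
        match = ADVERT_RE.search(line)
        if match:
            result[int(match["vrid"])][match["src"]] = int(match["prio"])
    return {vrid: dict(sources) for vrid, sources in result.items()}


def contested_vrids(lines):
    """Return only the vrids that more than one source is advertising."""
    return {
        vrid: sources
        for vrid, sources in advertisers_by_vrid(lines).items()
        if len(sources) > 1
    }


# Lines copied from the capture on u114 in this issue.
SAMPLE = [
    "17:01:22.650901 IP 172.16.15.117 > 224.0.0.18: VRRPv2, Advertisement, vrid 62, prio 100, authtype simple, intvl 1s, length 20",
    "17:01:22.894362 IP 172.16.15.115 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20",
    "17:01:23.071304 IP 172.16.15.116 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 90, authtype simple, intvl 1s, length 20",
    "17:01:23.167377 IP 172.16.15.114 > 224.0.0.18: VRRPv2, Advertisement, vrid 80, prio 100, authtype simple, intvl 1s, length 20",
]

if __name__ == "__main__":
    for vrid, sources in contested_vrids(SAMPLE).items():
        print(f"vrid {vrid} has {len(sources)} concurrent advertisers: {sources}")
```

On the captures above this reports vrid 80 as advertised by 172.16.15.114, 172.16.15.115 and 172.16.15.116, while vrid 62 has a single advertiser. Because tcpdump on each node sees the other two nodes' advertisements, the packets reach bond0; whether keepalived accepts them is a separate question.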
@a1226207408 Could you also check whether multicast is allowed between the subnets?
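One way to check this from the host side is to join the VRRP multicast group with an ordinary raw socket and see which advertisements are actually delivered to a userspace process. tcpdump captures at the link layer, so a packet it shows may still be dropped later, for example by a host firewall rule. The sketch below is an assumption-laden illustration, not keepalived's own code: `parse_vrrp_v2` and `listen` are hypothetical helper names, `listen` needs root on Linux, and `local_ip` is assumed to be the node's own address on bond0.

```python
import socket
import struct
import time

VRRP_GROUP = "224.0.0.18"  # VRRP multicast address, as seen in the captures
VRRP_PROTO = 112           # IP protocol number assigned to VRRP


def parse_vrrp_v2(payload: bytes) -> dict:
    """Decode the 8-byte VRRPv2 header and the advertised IPv4 addresses."""
    if len(payload) < 8:
        raise ValueError("payload too short for a VRRP header")
    ver_type, vrid, prio, count, auth_type, intvl, _checksum = struct.unpack(
        "!BBBBBBH", payload[:8]
    )
    if len(payload) < 8 + 4 * count:
        raise ValueError("payload too short for the advertised addresses")
    vips = [socket.inet_ntoa(payload[8 + 4 * i:12 + 4 * i]) for i in range(count)]
    return {
        "version": ver_type >> 4,
        "type": ver_type & 0x0F,
        "vrid": vrid,
        "prio": prio,
        "auth_type": auth_type,
        "intvl": intvl,
        "vips": vips,
    }


def listen(local_ip: str, seconds: float = 5.0) -> dict:
    """Count advertisements per (source, vrid) delivered to this process.

    Requires root. local_ip is the address of the interface to join on.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, VRRP_PROTO)
    mreq = struct.pack("4s4s", socket.inet_aton(VRRP_GROUP), socket.inet_aton(local_ip))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(1.0)
    seen = {}
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            try:
                packet, (src, _port) = sock.recvfrom(65535)
            except socket.timeout:
                continue
            header_len = (packet[0] & 0x0F) * 4  # raw IPv4 sockets include the IP header
            info = parse_vrrp_v2(packet[header_len:])
            key = (src, info["vrid"])
            seen[key] = seen.get(key, 0) + 1
    finally:
        sock.close()
    return seen
```

Running `listen("172.16.15.114")` as root on u114, for example, and getting no entries for 172.16.15.115 or 172.16.15.116 would suggest the advertisements are dropped after bond0 but before reaching a socket. Seeing them there would point instead at keepalived rejecting them, for instance because of mismatched authentication or instance settings.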
If you do not provide feedback for more than 37 days, we will close the issue and you can either reopen it or submit a new issue.