
Update db-ha.md: notes on keepalived non-preemption mode

Open fangpsh opened this issue 1 year ago • 6 comments

  1. In production, DB high availability is generally configured in non-preemption mode, both so that a crashed machine can be inspected before it rejoins, and to avoid frequent failovers caused by network jitter and the like.

  2. On CentOS 7.9 with keepalived 1.3.5-19.el7, following the original doc's configuration with both nodes' state set to MASTER, a switch-back still occurs in actual testing. It is generally recommended to configure both nodes as BACKUP.

fangpsh avatar Jul 14 '23 03:07 fangpsh

A follow-up on point 2: both machines have the same priority, both states are MASTER, and nopreempt is set. In my experiment, when the first-started master dies, the backup is elected master, but once the original master starts again, the VIP switches back.

Backup node's log when the original master came back up:

Jul 14 12:19:13 sz-node-02 Keepalived_vrrp[4554]: VRRP_Instance(VI_1) Received advert with higher priority 100, ours 100
Jul 14 12:19:13 sz-node-02 Keepalived_vrrp[4554]: VRRP_Instance(VI_1) Entering BACKUP STATE
Jul 14 12:19:13 sz-node-02 Keepalived_vrrp[4554]: VRRP_Instance(VI_1) removing protocol VIPs.
Jul 14 12:19:48 sz-node-02 Keepalived_vrrp[4554]: VRRP_Instance(VI_1) Transition to MASTER STATE
Jul 14 12:19:49 sz-node-02 Keepalived_vrrp[4554]: VRRP_Instance(VI_1) Entering MASTER STATE
Jul 14 12:19:49 sz-node-02 Keepalived_vrrp[4554]: VRRP_Instance(VI_1) setting protocol VIPs.
Jul 14 12:19:49 sz-node-02 Keepalived_vrrp[4554]: Sending gratuitous ARP on em1 for 192.168.8.1

Strangely, though, if both master and backup are stopped, the backup is started first, and then the master, the VIP still switches back. Original master node's log:

Jul 14 11:24:21 sz-node-01 Keepalived_vrrp[7487]: VRRP_Instance(VI_1) Received advert with lower priority 100, ours 100, forcing new election
Jul 14 11:24:21 sz-node-01 Keepalived_vrrp[7487]: Sending gratuitous ARP on em1 for 192.168.8.1

It seems the first-started master always ends up as master; I'm not sure whether something else is at play.

Anyway, I'd recommend setting state to BACKUP on both nodes: it avoids frequent switching and suits the database scenario better.
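A minimal sketch of the recommended layout, reusing the interface, VIP, and router id seen elsewhere in this thread; the advert interval is illustrative. Note that keepalived only honors nopreempt when the instance's state is BACKUP:

```conf
# Both nodes use the same stanza; priorities may be equal or differ.
# nopreempt only takes effect when state is BACKUP.
vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface em1
    virtual_router_id 88
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.8.1
    }
}
```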

fangpsh avatar Jul 14 '23 03:07 fangpsh

@fangpsh Thanks, we'll verify this internally first.

zexi avatar Jul 17 '23 03:07 zexi

Today we hit a case where a master went abnormal but the VIP was not released, so the VIP became unreachable and never recovered, even after the abnormal master later recovered on its own (stress test, IOPS maxed out).

Looking at the configuration inside the container:

vrrp_instance VI_1 {
  interface br0

  virtual_router_id 88
  priority 100
  nopreempt

No state is specified either; I don't know whether that is a factor.

The container logs show all health checks passing. After killing one keepalived process, things recovered. Under the current configuration, automatic failover seems somewhat broken.

fangpsh avatar Jul 19 '23 06:07 fangpsh

Also, when the VIP goes down, reporting from the nodes goes down with it and their status turns abnormal; everything depends entirely on the VIP.

Would it be possible to add an haproxy layer to handle master switching, so that node communication and reporting go through a node-local haproxy and are not affected by the VIP or any single master?

Reference: https://www.kubesphere.io/zh/docs/v3.3/installing-on-linux/high-availability-configurations/internal-ha-configuration/
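A node-local haproxy along those lines might look like the sketch below; the port, server names, and backend addresses are illustrative placeholders, not the actual cloudpods layout:

```conf
# Hypothetical haproxy.cfg fragment deployed on every node: agents connect
# to 127.0.0.1:13306 and haproxy forwards to a live master, so reporting
# keeps working even while the VIP is stuck on a dead node.
listen db_masters
    bind 127.0.0.1:13306
    mode tcp
    option tcp-check
    balance roundrobin
    server node-01 192.168.8.11:3306 check inter 2000 rise 2 fall 3
    server node-02 192.168.8.12:3306 check inter 2000 rise 2 fall 3
```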

fangpsh avatar Jul 19 '23 06:07 fangpsh

> Follow-up on point 2: with equal priority, both nodes in state MASTER, and nopreempt set, the VIP still switches back to the original master once it restarts. (quoted from above)

@fangpsh This depends on the priority field in the keepalived configuration file. The master node's value is higher than the backup's, so after the master node recovers, the database VIP switches back to it.

hoganlxj avatar Jul 19 '23 06:07 hoganlxj

> Follow-up on point 2: with equal priority, both nodes in state MASTER, and nopreempt set, the VIP still switches back to the original master once it restarts. (quoted from above)

> @fangpsh This depends on the priority field in the keepalived configuration: the master's value is higher than the backup's, so the VIP switches back after the master recovers. (quoted from above)

In the configuration from the DB doc, the two nodes' priority values are identical. As the logs above show, it's 100 vs 100, yet one node reports "higher" and the other "lower", which is also odd:
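For context, VRRP (RFC 3768 / RFC 5798) breaks a priority tie by comparing the senders' primary IP addresses, which would explain why the two nodes disagree at 100 vs 100: each compares the peer's address against its own. A sketch of that election rule (the IPs in the usage comments are illustrative):

```python
import ipaddress

def should_yield(local_priority: int, local_ip: str,
                 advert_priority: int, advert_src_ip: str) -> bool:
    """VRRP election rule: a router yields mastership to an advertisement
    with strictly higher priority, or with equal priority from a sender
    whose primary IP address is numerically greater than its own."""
    if advert_priority != local_priority:
        return advert_priority > local_priority
    # Tie-break on the primary IP address.
    return ipaddress.ip_address(advert_src_ip) > ipaddress.ip_address(local_ip)

# With equal priorities, the node with the lower address yields,
# while the node with the higher address keeps the VIP.
```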

fangpsh avatar Jul 19 '23 10:07 fangpsh

I see the official docs have been updated: https://github.com/yunionio/cloudpods/issues/18484

Closing this PR for now.

fangpsh avatar May 11 '24 02:05 fangpsh