After a MySQL primary/replica switchover, the dble heartbeat status (RS_CODE) stays at time_out and does not recover automatically within a short time
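- dble version: dble-3.22.01
- MySQL version: 8.0.29
- Preconditions: Two hosts run two MySQL instances (one primary, one replica) and two dble services. Both dble services connect to the database through the MySQL high-availability VIP.
- Problem description: After a MySQL primary/replica switchover (the high-availability VIP drifts), the dble heartbeat RS_CODE stays at time_out for a long time (tens of minutes) before it recovers to ok on its own. The dble log shows that when the heartbeat check against the backend MySQL runs into a problem, dble does not retry; it sets the heartbeat status to timeout directly.
**Other observation: After the MySQL switchover, the dble service on the node that holds the MySQL high-availability VIP has a normal heartbeat status, while the dble on the other node changes to timeout.**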
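The retry behavior the report expected (probe again before declaring a timeout) could be sketched roughly as follows. This is a hypothetical illustration, not dble's actual heartbeat code: the class `HeartbeatRetrySketch`, the `Status` enum, and the `check` method are invented names, and the retry budget only loosely mirrors the `errorRetryCount="3"` setting in the config below. The report does not say whether dble applies that setting to timeouts.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "retry before marking timeout"; not taken from dble. */
final class HeartbeatRetrySketch {

    enum Status { OK, TIMEOUT }

    /**
     * Runs the probe once and, on failure, up to maxRetries more times.
     * Reports TIMEOUT only if every attempt fails.
     *
     * @param probe      returns true when the backend answered the heartbeat in time
     * @param maxRetries number of additional attempts after the first failure (assumed value)
     */
    static Status check(BooleanSupplier probe, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (probe.getAsBoolean()) {
                return Status.OK;
            }
        }
        return Status.TIMEOUT;
    }
}
```

With a budget of 3 retries, a probe that fails twice and then succeeds would be reported as OK, while a probe that never succeeds is reported as TIMEOUT after four attempts.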
dble.log
2022-06-15 12:53:06.687 INFO [29-frontWorker] (com.actiontech.dble.net.connection.AbstractConnection.closeImmediatelyInner(AbstractConnection.java:159)) - connection id close for reason [quit cmd] with connection FrontendConnection[id = 200 port = 9512 host = 127.0.0.1 local_port = 16131 isManager = true startupTime = 1655268786680 skipCheck = false isFlowControl = false onlyTcpConnect = false]
2022-06-15 12:53:10.898 WARN [TimerScheduler-0] (com.actiontech.dble.backend.heartbeat.MySQLHeartbeat.setTimeout(MySQLHeartbeat.java:245)) - heartbeat to [192.168.107.138:3307] setTimeout, previous status is 1
2022-06-15 12:53:10.983 INFO [DefaultTimer10 thread-1] (com.actiontech.dble.net.connection.BackendConnection.close(BackendConnection.java:125)) - connection id 27 mysqlId 6461 close for reason conn heart timeout
2022-06-15 12:53:10.983 INFO [DefaultTimer10 thread-1] (com.actiontech.dble.net.connection.BackendConnection.close(BackendConnection.java:125)) - connection id 27 mysqlId 6461 close for reason conn heart timeout
2022-06-15 12:53:10.983 INFO [DefaultTimer10 thread-1] (com.actiontech.dble.net.connection.AbstractConnection.closeImmediatelyInner(AbstractConnection.java:159)) - connection id close for reason [conn heart timeout] with connection BackendConnection[id = 27 host = 192.168.107.138 port = 3307 localPort = 13612 mysqlId = 6461 db config = dbInstance[name=instanceM1,disabled=false,maxCon=1000,minCon=20]
2022-06-15 12:53:10.983 INFO [DefaultTimer10 thread-1] (com.actiontech.dble.net.connection.BackendConnection.close(BackendConnection.java:125)) - connection id 30 mysqlId 6455 close for reason conn heart timeout
db.xml
<dble:db xmlns:dble="http://dble.cloud/" version="4.0">
<dbGroup name="dbGroup1" rwSplitMode="0" delayThreshold="-1">
<heartbeat errorRetryCount="3" timeout="3">select 1</heartbeat>
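<!-- Note: the user must have the XA_RECOVER_ADMIN privilege, otherwise startup fails with: execute 'XA RECOVER' in dbInstance error! This applies only to backend MySQL 8.0 and later. -->
<!-- Note: 192.168.107.138 is the MySQL high-availability VIP -->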
<dbInstance name="instanceM1" url="192.168.107.138:3307" user="topca5" password="xxxxx" maxCon="1000" minCon="20"
primary="true">
<property name="testOnCreate">false</property>
<property name="testOnBorrow">false</property>
<property name="testOnReturn">false</property>
<property name="testWhileIdle">true</property>
<property name="connectionTimeout">30000</property>
<property name="connectionHeartbeatTimeout">20</property>
<property name="timeBetweenEvictionRunsMillis">30000</property>
<property name="idleTimeout">600000</property>
<property name="heartbeatPeriodMillis">3000</property>
<property name="evictorShutdownTimeoutMillis">10000</property>
</dbInstance>
</dbGroup>
</dble:db>
Please refer to: https://opensource.actionsky.com/20201204-mysql/
Is there anything similar to keepalive built into dble? Changing this at the system level would have a fairly large impact.
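For context on the question above, keepalive can in general be enabled per connection in Java rather than system-wide. The sketch below uses only standard JDK calls (`Socket.setKeepAlive`, and the `jdk.net.ExtendedSocketOptions` constants available since Java 11 on platforms that support them). It is not dble code: the class name `KeepAliveSketch`, the method, and the parameter values are assumptions for illustration, and the thread does not say whether dble exposes such a setting.

```java
import java.io.IOException;
import java.net.Socket;
import jdk.net.ExtendedSocketOptions;

/** Hypothetical sketch of per-socket TCP keepalive; not taken from dble. */
final class KeepAliveSketch {

    /**
     * Creates an unconnected socket with SO_KEEPALIVE enabled.
     * The idle/interval/count tuning is applied only where the platform supports it;
     * otherwise the operating system defaults stay in effect.
     */
    static Socket newKeepAliveSocket(int idleSeconds, int intervalSeconds, int probeCount)
            throws IOException {
        Socket socket = new Socket();
        socket.setKeepAlive(true); // affects this socket only, not the whole system

        if (socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            // Idle time before the first probe, time between probes, probes before giving up.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
            socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probeCount);
        }
        return socket;
    }
}
```

A caller would then connect the returned socket as usual, for example `newKeepAliveSocket(30, 10, 3)` followed by `connect(...)`. The values 30, 10, and 3 are placeholders, not recommendations.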
We will fix this issue in the next version.
Fix: https://github.com/actiontech/dble/pull/3344#issue-1332935264