
[phxsql exception] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23413683, not match: data on one node becomes inconsistent

Open loveleon opened this issue 7 years ago • 0 comments

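Problem description: the phxsql cluster has nodes a, b, and c. Suppose b is the master: data written on b is synced to c, but not to a. The phxbinlogsvr.ERROR log shows a large number of entries like the following (at intervals of about 5 minutes):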

E0806 17:35:42.012189 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411723, not match
E0806 17:35:48.012066 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411729, not match
E0806 17:35:54.012143 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411735, not match
E0806 17:35:59.001401 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411740, not match
E0806 17:36:00.012512 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411741, not match
E0806 17:36:06.012331 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411747, not match
E0806 17:36:11.001451 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411752, not match
E0806 17:36:12.012786 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411766, not match
E0806 17:36:18.011224 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411772, not match
E0806 17:36:24.013833 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411778, not match
E0806 17:36:30.012259 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411784, not match
E0806 17:36:36.012701 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411790, not match
E0806 17:36:42.013140 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411797, not match
E0806 17:36:48.012652 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411822, not match
E0806 17:36:54.012717 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411828, not match
E0806 17:37:00.013926 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411840, not match
E0806 17:37:06.015473 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411846, not match
E0806 17:37:12.013866 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411852, not match
E0806 17:37:18.011512 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411858, not match
E0806 17:37:24.011888 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411864, not match
E0806 17:37:30.012038 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411870, not match
E0806 17:37:36.011889 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411876, not match
E0806 17:37:42.013588 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411887, not match
E0806 17:37:48.015415 11173 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23411893, not match
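When reading a long run of these entries, it can help to pull out the timestamps and GTID sequence numbers and check how far apart the entries are and whether the reported max gtid is advancing. The following is only a rough sketch: the regular expression is modelled on the entries pasted above, and the helper names `parse_not_match` and `summarize` are made up for illustration; they are not part of PhxSQL.

```python
import re

# Matches one entry of the shape seen in phxbinlogsvr.ERROR above, e.g.
# E0806 17:35:42.012189 11173 phx_glog.cpp:78] max gtid <uuid>:<seq>, not match
ENTRY = re.compile(
    r"E\d{4} (\d{2}:\d{2}:\d{2}\.\d{6}) \d+ phx_glog\.cpp:\d+\] "
    r"max gtid ([0-9a-f-]{36}):(\d+), not match"
)


def parse_not_match(text):
    """Return (seconds_since_midnight, uuid, seq) for each 'not match' entry."""
    entries = []
    for clock, uuid, seq in ENTRY.findall(text):
        hours, minutes, seconds = clock.split(":")
        second_of_day = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        entries.append((second_of_day, uuid, int(seq)))
    return entries


def summarize(entries):
    """Summarize spacing between entries and movement of the sequence number."""
    gaps = [later[0] - earlier[0] for earlier, later in zip(entries, entries[1:])]
    return {
        "count": len(entries),
        "max_gap_s": max(gaps) if gaps else 0.0,
        "seq_advance": entries[-1][2] - entries[0][2] if entries else 0,
    }
```

For example, `summarize(parse_not_match(open("phxbinlogsvr.ERROR").read()))` would report how many such entries the file holds, the longest gap between them in seconds, and how much the sequence number grew over the file.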

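As a result, node a still cannot sync, even after the processes on node a are restarted. Root-cause analysis: node a has lagged behind nodes b and c for too long, and the binlog files it needs have already been purged, so it cannot sync even after a restart. Is this the actual cause? In addition, a phxbinlogsvr.ERROR file from an **earlier time** contains the following entries:

E0806 17:41:09.218830 14898 phx_glog.cpp:78] ERR(0): PN8phxpaxos9SystemVSME::AddNodeIDList No need to add, i already have membership info.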

E0806 17:41:10.240186 14898 phx_glog.cpp:78] RealCheckGTID check gtid gtid 2e0904aa-5727-11e7-9080-005056865a78:23412091 not exist in file, data size 14562
E0806 17:41:10.275108 16200 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603777 expire time 1533548498
E0806 17:41:10.275140 16200 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.179, master ip 192.168.0.179 version 6603778 expire time 1533548486
E0806 17:41:10.275152 16200 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.180, master ip 192.168.0.179 version 6603778 expire time 1533548487
E0806 17:41:10.298365 16202 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603777 expire time 1533548498
E0806 17:41:10.298398 16202 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.180, master ip 192.168.0.179 version 6603778 expire time 1533548487
E0806 17:41:10.576943 16241 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603777 expire t
... ... ...
E0806 17:41:10.878366 16212 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603777 expire time 1533548498
E0806 17:41:10.889408 16212 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.179, master ip 192.168.0.179 version 6603778 expire time 1533548486
E0806 17:41:10.901698 16213 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603777 expire time 1533548498
E0806 17:41:10.901787 16213 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.180, master ip 192.168.0.179 version 6603778 expire time 1533548487
E0806 17:41:11.000386 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to local MySQL server through socket '/phxsql/data/percona.workspace/tmp/percona.sock' (2)
E0806 17:41:11.000459 16196 phx_glog.cpp:78] Process check master init fail -2600
E0806 17:41:11.000598 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to MySQL server on '192.168.0.178' (111)
E0806 17:41:11.000629 16196 phx_glog.cpp:78] CheckAdminUser new user root not exist in mysql, wait
E0806 17:41:11.000684 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to MySQL server on '192.168.0.178' (111)
E0806 17:41:11.000710 16196 phx_glog.cpp:78] CheckReplicaUser new user replica not exist in mysql, wait
E0806 17:41:11.190948 16245 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603777 expire time 1533548498
E0806 17:41:11.190984 16245 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.180, master ip 192.168.0.179 version 6603778 expire t
...
E0806 17:41:11.806953 16259 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.180, master ip 192.168.0.179 version 6603778 expire time 1533548487
E0806 17:41:12.000180 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to local MySQL server through socket '/phxsql/data/percona.workspace/tmp/percona.sock' (2)
E0806 17:41:12.000265 16196 phx_glog.cpp:78] Process check master init fail -2600
E0806 17:41:12.000357 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to MySQL server on '192.168.0.178' (111)
E0806 17:41:12.000399 16196 phx_glog.cpp:78] CheckAdminUser new user root not exist in mysql, wait
E0806 17:41:12.000452 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to MySQL server on '192.168.0.178' (111)
E0806 17:41:12.000560 16196 phx_glog.cpp:78] CheckReplicaUser new user replica not exist in mysql, wait
E0806 17:41:12.096520 16283 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603779 expire time 1533548492
E0806 17:41:12.096577 16283 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.179, master ip 192.168.0.179 version 6603779 expire time 1533548491
...
E0806 17:41:12.711338 16256 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.180, master ip 192.168.0.179 version 6603779 expire time 1533548496
E0806 17:41:13.000253 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to local MySQL server through socket '/phxsql/data/percona.workspace/tmp/percona.sock' (2)
E0806 17:41:13.000296 16196 phx_glog.cpp:78] Process check master init fail -2600
E0806 17:41:13.000401 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to MySQL server on '192.168.0.178' (111)
E0806 17:41:13.000502 16196 phx_glog.cpp:78] CheckAdminUser new user root not exist in mysql, wait
E0806 17:41:13.000628 16196 phx_glog.cpp:78] ERR: mysql_real_connect fail, Can't connect to MySQL server on '192.168.0.178' (111)
E0806 17:41:13.000685 16196 phx_glog.cpp:78] CheckReplicaUser new user replica not exist in mysql, wait
E0806 17:41:13.001790 16228 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603779 expire time 1533548492
... ...
E0806 17:41:13.552361 16272 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603779 expire time 1533548492
E0806 17:41:13.552440 16272 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.179, master ip 192.168.0.179 version 6603779 expire time 1533548491
E0806 17:41:13.556404 16191 phx_glog.cpp:78] ERR(0): PN8phxpaxos7LearnerE::OnSendNowInstanceID Lag msg, skip
E0806 17:41:13.556565 16191 phx_glog.cpp:78] ERR(0): PN8phxpaxos7LearnerE::OnSendNowInstanceID Lag msg, skip
E0806 17:41:13.604995 16273 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603779 expire time 1533548492
E0806 17:41:13.605039 16273 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.179, master ip 192.168.0.179 version 6603779 expire time 1533548491
... ...
E0806 17:41:19.938992 16277 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.179, master ip 192.168.0.179 version 6603780 expire time 1533548496
E0806 17:41:19.945438 16278 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603780 expire time 1533548504
E0806 17:41:19.945498 16278 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.180, master ip 192.168.0.179 version 6603780 expire time 1533548504
E0806 17:41:20.011579 16196 phx_glog.cpp:78] max gtid 2e0904aa-5727-11e7-9080-005056865a78:23412091, not match
E0806 17:41:20.089151 16351 phx_glog.cpp:78] DealWithQuery field count empty, return
E0806 17:41:20.089488 16351 phx_glog.cpp:78] DealWithQuery field count empty, return
E0806 17:41:20.202404 16192 phx_glog.cpp:78] STATUS(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 23937992 checkpoint instance id 24937993 now instanceid 24958116
E0806 17:41:20.209141 16351 phx_glog.cpp:78] DealWithQuery field count empty, return
E0806 17:41:20.240154 16223 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603780 expire time 1533548504
E0806 17:41:20.240188 16223 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.180, master ip 192.168.0.179 version 6603780 expire time 1533548504
E0806 17:41:20.246922 16224 phx_glog.cpp:78] GetGlobalMaster get data from ip 192.168.0.178, master ip 192.168.0.179 version 6603780 expire time 1533548504

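Putting together the "most recent" phxbinlogsvr.ERROR log and the previous day's phxbinlogsvr.ERROR log (speculative analysis): the problem is caused by "max gtid ... not match", which is why the replica no longer syncs automatically. The "not match" arises because the historical binlog files have been deleted, which in turn means this node had been offline for a long time for some unknown reason.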
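One way to test this guess is to check whether the transaction that node a needs next still exists on the master. In stock MySQL, the purged range is exposed as a GTID set through the `gtid_purged` system variable (`SELECT @@GLOBAL.gtid_purged`). Whether PhxSQL's phxbinlogsvr retention follows the same variable is not established in this issue, so the sketch below only shows the containment check itself. The function names and the purged set in the usage note are hypothetical.

```python
def parse_gtid_set(gtid_set):
    """Parse a MySQL-style GTID set such as 'uuid:1-5:7,uuid2:3'
    into {uuid: [(start, end), ...]}."""
    result = {}
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number like '7' is a single-transaction interval.
            ranges.append((int(start), int(end or start)))
    return result


def gtid_in_set(gtid, gtid_set):
    """Return True if a single GTID 'uuid:seq' falls inside gtid_set."""
    uuid, seq = gtid.rsplit(":", 1)
    seq = int(seq)
    ranges = parse_gtid_set(gtid_set).get(uuid.lower(), [])
    return any(low <= seq <= high for low, high in ranges)
```

Node a's last reported max gtid above is `2e0904aa-5727-11e7-9080-005056865a78:23412091`, so the next transaction it needs is `...:23412092`. If `gtid_in_set` returns True for that GTID against the master's purged set (for instance a hypothetical `"2e0904aa-5727-11e7-9080-005056865a78:1-23500000"`), the binlog can no longer supply it, which would be consistent with the guess above.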

loveleon, Aug 06 '18 10:08