Problem in loop replication detection
Case
A <--dtle --> B
- Prepare data on both sides
create schema if not exists a592;
create table a592.a (id int primary key auto_increment);
insert into a592.a values (1);
-
run the 2 jobs
-
exec on side A
set gtid_next = '11111111-1111-1111-1111-111111111111:1';
update a592.a set id = 111 where id = 1;
set gtid_next = 'automatic';
insert into a592.a values (1);
-
The TX is replicated to B (which is normal)
-
The TX is replicated to A again (which it should not), since the OSID is not A's uuid.
[ERROR] client.driver_mgr.dtle: error at exec: driver=dtle @module=dtle.applier.ApplyBinlogEvent err="Error 1062: Duplicate entry '111' for key 'PRIMARY'" gtid=b0f42144-06cd-11e9-93ca-02000aba3e28:74 job=aa2 timestamp=2021-02-25T17:56:21.693+0800
We should check whether a tx has been executed based on target gtid_executed instead of uuid.
- 已知事务的gtid有server_id:gno两部分.
- 当A端执行的TX, 其gtid.server_id不是A的server_id时, 循环识别机制将误判
问题重述:
- 一个TX gtid有server_uuid:gno两部分
- dtle在目标端mysql gtid_executed表上记录TX的server_uuid:gno
- TODO
方案1: gtid_executed记录TX uuid和src MySQL uuid(如若两者不一致).
- 对于源端uuid会发生变化(集群切换、SIP飘移)的情况无效
方案2A: 遇到有osid的TX,执行前通过select @@global.gtid_executed查询
- 性能开销?
- AliRDS支持?
- 已知DBLE不支持
方案2B: 使用MySQL session_track_gtids功能追踪dst mysql的gtid_executed
- 性能开销?
- AliRDS支持?
- 目测DBLE不支持
方案3: 拓展dtle源端和目标端的功能,使其支持单job双向复制。则可以通过 slave thread得知mysql的gtid_executed
目前倾向于方案3。需作较大调整:
- 整合job配置:所有配置写在src端,由src发给dst端
- 调整Extractor/Applier, 可在task中重复启动
- 对于双向复制,在Extractor/Applier间共享一些信息
TODO 考虑双向执行DDL的问题。
方案4: 提供选项, 执行set gtid_next, 使源端目标端gtid一致.
这需要REPLICATION_APPLIER权限(8.0) 或者 SUPER 权限 (5.7)
当gtid_next包含于gtid_executed中时, MySQL会忽略该TX的执行.
问题4.1: set gtid_next vs DDL (implicit commit)
mysql> set gtid_next = 'acd7d195-06cd-11e9-928f-02000aba3e00:1';
Query OK, 0 rows affected (0.01 sec)
mysql> create table a.a1 (id int primary key);
Query OK, 0 rows affected (0.07 sec)
mysql> set gtid_next = 'acd7d195-06cd-11e9-928f-02000aba3e00:2';
Query OK, 0 rows affected (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.01 sec)
mysql> create table a.a2 (id int primary key);
ERROR 1778 (HY000): Cannot execute statements with implicit commit inside a transaction when @@SESSION.GTID_NEXT == 'UUID:NUMBER'.
问题4.2: 多job复制不同表
tx
-
insert tb1
-
insert tb2
-
job1先复制tx.tb1
-
job2后复制tx.tb2
则job2复制tx时, 目标端MySQL认为tx已经执行过, 会跳过执行.