dtle icon indicating copy to clipboard operation
dtle copied to clipboard

Problem in loop replication detection

Open c0494133d4 opened this issue 4 years ago • 6 comments

Case

A <--dtle --> B

  1. Prepare data on both sides
create schema if not exists a592;
create table a592.a (id int primary key auto_increment);
insert into a592.a values (1);
  1. run the 2 jobs

  2. exec on side A

set gtid_next = '11111111-1111-1111-1111-111111111111:1';
update a592.a set id = 111 where id = 1;
set gtid_next = 'automatic';
insert into a592.a values (1);
  1. The TX is replicated to B (which is normal)

  2. The TX is replicated to A again (which it should not), since the OSID is not A's uuid.

   [ERROR] client.driver_mgr.dtle: error at exec: driver=dtle @module=dtle.applier.ApplyBinlogEvent err="Error 1062: Duplicate entry '111' for key 'PRIMARY'" gtid=b0f42144-06cd-11e9-93ca-02000aba3e28:74 job=aa2 timestamp=2021-02-25T17:56:21.693+0800

We should check whether a tx has been executed based on target gtid_executed instead of uuid.

c0494133d4 avatar Feb 25 '21 07:02 c0494133d4

  • 已知事务的gtid有server_id:gno两部分.
  • 当A端执行的TX, 其gtid.server_id不是A的server_id时, 循环识别机制将误判

ghost avatar Feb 25 '21 10:02 ghost

问题重述:

  • 一个TX gtid有server_uuid:gno两部分
  • dtle在目标端mysql gtid_executed表上记录TX的server_uuid:gno
  • TODO

ghost avatar Jun 28 '22 09:06 ghost

方案1: gtid_executed记录TX uuid和src MySQL uuid(如若两者不一致).

  • 对于源端uuid会发生变化(集群切换、SIP飘移)的情况无效

方案2A: 遇到有osid的TX,执行前通过select @@global.gtid_executed查询

  • 性能开销?
  • AliRDS支持?
  • 已知DBLE不支持

方案2B: 使用MySQL session_track_gtids功能追踪dst mysql的gtid_executed

  • 性能开销?
  • AliRDS支持?
  • 目测DBLE不支持

方案3: 拓展dtle源端和目标端的功能,使其支持单job双向复制。则可以通过 slave thread得知mysql的gtid_executed

ghost avatar Jun 28 '22 09:06 ghost

目前倾向于方案3。需作较大调整:

  1. 整合job配置:所有配置写在src端,由src发给dst端
  2. 调整Extractor/Applier, 可在task中重复启动
  3. 对于双向复制,在Extractor/Applier间共享一些信息

TODO 考虑双向执行DDL的问题。

ghost avatar Jul 01 '22 08:07 ghost

方案4: 提供选项, 执行set gtid_next, 使源端目标端gtid一致.

这需要REPLICATION_APPLIER权限(8.0) 或者 SUPER 权限 (5.7)

当gtid_next包含于gtid_executed中时, MySQL会忽略该TX的执行.

ghost avatar Sep 09 '22 05:09 ghost

问题4.1: set gtid_next vs DDL (implicit commit)

mysql> set gtid_next = 'acd7d195-06cd-11e9-928f-02000aba3e00:1';
Query OK, 0 rows affected (0.01 sec)

mysql> create table a.a1 (id int primary key);
Query OK, 0 rows affected (0.07 sec)

mysql> set gtid_next = 'acd7d195-06cd-11e9-928f-02000aba3e00:2';
Query OK, 0 rows affected (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.01 sec)

mysql> create table a.a2 (id int primary key);
ERROR 1778 (HY000): Cannot execute statements with implicit commit inside a transaction when @@SESSION.GTID_NEXT == 'UUID:NUMBER'.

问题4.2: 多job复制不同表

tx

  • insert tb1

  • insert tb2

  • job1先复制tx.tb1

  • job2后复制tx.tb2

则job2复制tx时, 目标端MySQL认为tx已经执行过, 会跳过执行.

ghost avatar Sep 13 '22 07:09 ghost