tikv
tikv copied to clipboard
Add update pk/uk flag for ChangeDataEvent
Feature Request
Is your feature request related to a problem? Please describe:
TiCDC pulls the change events from tikv, which can only reflect the change result of a txn in kv layer, but does not retain the execution order of the txn. In the current implementation, there may be correctness issues in updating uk/pk scenarios. For example:
-- Table Schema
CREATE TABLE test (
pk_id INT PRIMARY KEY,
uk_id INT UNIQUE NOT NULL
);
-- state 0
mysql> select * from test;
+-------+-------+
| pk_id | uk_id |
+-------+-------+
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
+-------+-------+
3 rows in set (0.00 sec)
-- SQL in the upstream:
begin;
update test set uk_id=3+1 where uk_id=3;
update test set uk_id=2+1 where uk_id=2;
update test set uk_id=1+1 where uk_id=1;
insert into test(`pk_id`, `uk_id`) values (1, 1);
commit;
-- state 1
mysql> select * from test;
+-------+-------+
| pk_id | uk_id |
+-------+-------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+-------+-------+
4 rows in set (0.00 sec)
Synchronizing the above changes out of order will inevitably cause uk conflict issues, and ticdc has no way of knowing the order of changes, so the only solution is
- split update to delete + insert events.
- Sort single transaction so that deletes are executed before inserts.
delete from test where uk_id in (1, ,2, 3);
insert into test values (1,1), (2,2), (3,3), (4,4);
ticdc's sorter module already implements the ability to rank delete before insert, but there is no identifier in the ChangeDataEvent to indicate whether uk/pk has changed.
Describe the feature you'd like:
Ref: https://github.com/pingcap/kvproto/blob/81788387d2d38f24da011809bcd23f87eb9271ff/proto/cdcpb.proto#L73-L97
enum OpType {
UNKNOWN = 0;
PUT = 1;
DELETE = 2;
UPDATEPK=3;
UPDATEPK=4;
}
Describe alternatives you've considered:
MEMORY puller mounter(decode by schema) ---> sink
| |
DISK ----> sorter --->
Although it is easy to determine whether uk/pk changes from the decoded event in mounter module, the sorting operation is for a single transaction, which can cause OOM problems in large transaction scenarios.