vitess
vitess copied to clipboard
Bug Report: Unique Constraints issues when consolidating shards with "hot" rows
Overview of the Issue
When using MoveTables
to consolidate shards, either by running SwitchTraffic
while moving from 2 to 4 shards (where it affects the _reverse
workflow), or during the normal operation of a MoveTables
command moving from a 2-shard to a 4-shard keyspace, it is possible for vreplication to break because of unique keys, making it near impossible to recover.
Scenario:
- One table, primary key + secondary unique key applied on the MySQL level
- One secondary lookup vindex table using
consistent_lookup_unique
- Topology
- 4-sharded keyspace (
-40
,40-80
,80-c0
,c0-
) as source - 2-sharded keyspace (
-80
,80-
) as target - Same vschema across both. We are simply consolidating shards
- 4-sharded keyspace (
- High volume write scenario on the source shards with the following pattern:
-
writer(-40)
-
insert into table1(uid) values (25);
--> inserts row with valuesid=1, uid=25
- ...
-
delete from table1 where id=1;
-
-
writer(40-80)
-
insert into table1(uid) values (25);
--> inserts row with valuesid=2, uid=25
- ...
-
delete from table1 where id=2;
-
-
writer(-40)
-
insert into table1(uid) values (25);
--> inserts row with valuesid=3, uid=25
- ...
-
delete from table1 where id=3;
-
-
writer(40-80)
-
insert into table1(uid) values (25);
--> inserts row with valuesid=4, uid=25
- ...
-
delete from table1 where id=4;
-
- etc.
-
Because the vstreams for -40
and 40-80
are executing independently from eachother, when consolidating these streams back into -80
, it is very easy for one of those vstreams to fall behind the other, and fail perpetually thereafter. The error we observe is a unique constraint error on the MySQL side.
A workaround for the above scenario could be to run the vstream sessions with SET unique_checks=0;
. This should be a safe operation in this case, because we still have our secondary lookup vindex table enforcing uniqueness across shards at the source side. I'm unclear if it would be considered safe in all unique
constraint scenarios.
Reproduction Steps
- Using the following
vschema
:
{
"sharded": true,
"vindexes": {
"hash": {
"type": "hash"
},
"uid_vdx": {
"type": "consistent_lookup_unique",
"params": {
"from": "uid",
"table": "ks.uid_vdx",
"to": "keyspace_id"
},
"owner": "table1"
},
}
"tables": {
"table1": {
"column_vindexes": [
{
"column": "id",
"name": "hash"
},
{
"column": "uid",
"name": "uid_vdx"
}
]
},
"uid_vdx": {
"columnVindexes": [
{
"column": "uid",
"name": "hash"
}
]
}
}
}
- Using the following schema:
CREATE TABLE table1 (
id int not null,
uid int not null,
primary key (id),
unique key uid(uid)
)
Assume a sequence table exists in an unsharded keyspace to help populate id.
Binary Version
2023-09-26.0a5a570 (v16.0.3)
Operating System and Environment details
PlanetScaleDB k8s environment
Log Fragments
The following vreplication streams exist for workflow ks2.Reshard_reverse:
id=23 on -80/gcp_uscentral1c_1-3344908643: Status: Running. VStream Lag: -1s. Tx time: Fri Oct 13 21:59:30 2023.
id=24 on -80/gcp_uscentral1c_1-3344908643: Status: Error: Duplicate entry '25' for key 'table1.uid' (errno 1062) (sqlstate 23000) during query: insert into table1(id,uid) values (3,25).
id=23 on 80-/gcp_uscentral1c_1-0264185158: Status: Running. VStream Lag: -1s. Tx time: Fri Oct 13 21:59:31 2023.
id=24 on 80-/gcp_uscentral1c_1-0264185158: Status: Running. VStream Lag: -1s. Tx time: Fri Oct 13 21:59:31 2023.