gh-ost
fix data-integrity problem when adding a unique key
Hi, this PR is for #485 #477 #167.

In gh-ost, insert statements are rewritten as `insert ignore into` (row copy) and `replace into` (binlog-apply goroutine). If the data in the original table, or data arriving through binlog apply, would cause conflicts on the newly added unique key (i.e. the data does not meet the conditions for adding a unique key), rows are silently lost in the current version. This PR detects such conflicts and exits, covering both conflicts in the original table data and conflicts in the data applied from the binlog. PTAL.
The error looks like the following and is very easy to reproduce:
- original conflict

  ```
  2021-01-08 16:58:15 ERROR Error 1062: Duplicate entry 'cenkore' for key 'name'
  ```

- apply conflict

  ```
  2021-01-08 16:59:37 ERROR Error 1062: Duplicate entry 'cenkore' for key 'name'; query=
      insert /* gh-ost `adetestdb`.`_testuk_gho` */ into
        `adetestdb`.`_testuk_gho`
          (`id`, `name`, `location`, `inserttime`)
        values
          (?, ?, ?, ?)
  ; args=[3 cenkore cn 2021-01-08 16:59:30]
  ```
thanks.
Adding a `UNIQUE KEY` while the table is being written to is, by definition, a race condition, unless you can guarantee the application never writes duplicates.
Can you please clarify what is the exact use case (possibly with a test) that this PR solves?
Thank you!
Thanks for your quick reply. @shlomi-noach
In this PR, if duplicates occur on the newly added unique key, the migration fails instead of ignoring them.
Background
gh-ost uses two threads to process the migrated table's data: one copies the rows of the original table using `insert ignore into`, and the other handles incremental data by parsing and applying the binlog using `replace into`. Because the two threads run concurrently, the same row may be processed twice; `insert ignore into` and `replace into` were chosen precisely so that these overlaps do not fail the migration. However, when the migration adds a unique key, genuine duplicates on the newly added key are also silently ignored or replaced, which causes data loss. In the extreme case of adding a new column and a unique key on that column in the same migration, only one record may remain after the migration completes. This can cause serious problems.
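A minimal sketch of the loss described above, using Python's `sqlite3` as a stand-in (SQLite's `INSERT OR IGNORE` and `REPLACE` behave analogously to MySQL's `INSERT IGNORE` and `REPLACE INTO` for this purpose; all table and column names are illustrative, not gh-ost's actual naming):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original table: no unique key on "name", so duplicates are legal.
cur.execute("CREATE TABLE uk (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO uk VALUES (?, ?)",
                [(1, "cenkore"), (2, "cenkore")])

# Ghost table: the migration adds UNIQUE(name).
cur.execute("CREATE TABLE uk_gho (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

# Row copy with INSERT OR IGNORE: the second duplicate silently vanishes.
cur.execute("INSERT OR IGNORE INTO uk_gho SELECT id, name FROM uk")
print(cur.execute("SELECT id FROM uk_gho").fetchall())  # -> [(1,)]

# Binlog apply with REPLACE: id=3 overwrites the survivor, again silently.
cur.execute("REPLACE INTO uk_gho VALUES (3, 'cenkore')")
print(cur.execute("SELECT id FROM uk_gho").fetchall())  # -> [(3,)]
```

Three rows held duplicate names; the ghost table ends with a single row and no error was ever raised.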
Proposal
Based on the above background, we can handle this scenario specially. For the row copy, copy only rows whose primary key value is not already in `_xx_gho`; the statement can be rewritten as `insert into _xx_gho select * from xx a left join _xx_gho b on a.pk = b.pk where b.pk is null`. For the binlog apply, `replace into` can be rewritten as `delete` + `insert`.
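A sketch of the proposed rewrites, again with `sqlite3` standing in for MySQL and hypothetical names; the point is that both paths now surface a duplicate on the new unique key as an error instead of swallowing it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE uk (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO uk VALUES (?, ?)",
                [(1, "cenkore"), (2, "cenkore")])
cur.execute("CREATE TABLE uk_gho (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

# Proposed row copy: a plain INSERT restricted to primary keys not yet in
# the ghost table. A duplicate on the new unique key now raises an error.
copy_error = None
try:
    cur.execute("""INSERT INTO uk_gho
                   SELECT a.id, a.name FROM uk a
                   LEFT JOIN uk_gho b ON a.id = b.id
                   WHERE b.id IS NULL""")
except sqlite3.IntegrityError as e:
    copy_error = e
print("row copy:", copy_error)  # conflict on the new unique key is surfaced

# Proposed binlog apply: DELETE by primary key, then a plain INSERT.
cur.execute("INSERT INTO uk_gho VALUES (1, 'cenkore')")  # row already copied
cur.execute("DELETE FROM uk_gho WHERE id = ?", (3,))     # no-op "dry run"
apply_error = None
try:
    cur.execute("INSERT INTO uk_gho VALUES (?, ?)", (3, "cenkore"))
except sqlite3.IntegrityError as e:
    apply_error = e
print("binlog apply:", apply_error)  # duplicate detected, not replaced
```

The `left join` filter still tolerates the benign overlap between the two threads (a primary key already copied is simply skipped), while real unique-key duplicates fail loudly.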
Rationale
For the row copy, only rows whose primary key value is absent from the ghost table are copied, so `ignore` is no longer needed: a collision on the primary key is skipped by the `left join` filter, while a conflict on the newly added unique key returns an error. For the incremental part: if the row has not yet been copied, the `delete` is a no-op, and the subsequent `insert` fails on any conflict with the newly added unique key, which `replace` cannot do. If the row has already been copied, the `delete` removes it by primary key and the `insert` runs again; likewise, a conflict on the newly added unique key fails.
Thank you!
Add 2 test cases
original table definition: create table uk(id int, name varchar(10),inser_time datetime, primary key(id))
alter: add unique key(name)
Test case 1:
prepare: insert into uk values(1,'cenkore',now())
execute: ./gh-ost --host=10.0.0.1 --database=adetestdb --table=uk --verbose --alter="add unique key(name)" --concurrent-rowcount --conf="/usr/local/gh-ost.cnf" --port=66666 --nice-ratio=0.1 --execute --initially-drop-ghost-table --initially-drop-old-table --postpone-cut-over-flag-file="/tmp/uk.postpone"
We can insert a duplicate row before cut-over (held back by the postpone flag file): insert into uk values(2,'cenkore',now())
result:

- current version: handled as success, BUT lost the row with id 1; the new table contains:

  ```
  +----+---------+---------------------+
  | id | name    | inser_time          |
  +----+---------+---------------------+
  |  2 | cenkore | 2021-01-21 11:06:45 |
  +----+---------+---------------------+
  ```

- this PR: handled as failure with error:

  ```
  2021-01-21 11:59:31 FATAL Error 1062: Duplicate entry 'cenkore' for key 'name'; query=
      insert /* gh-ost `adetestdb`.`_uk_gho` */ into
        `adetestdb`.`_uk_gho`
          (`id`, `name`, `inser_time`)
        values
          (?, ?, ?)
  ; args=[2 cenkore 2021-01-21 11:58:31]
  ```
Test case 2:
prepare: insert into uk values (1,'cenkore',now()),(2,'cenkore',now())
execute: ./gh-ost --host=10.0.0.1 --database=adetestdb --table=uk --verbose --alter="add unique key(name)" --concurrent-rowcount --conf="/usr/local/gh-ost.cnf" --port=66666 --nice-ratio=0.1 --execute --initially-drop-ghost-table --initially-drop-old-table
result:

- current version: handled as success, BUT lost the row with id 2; the new table contains:

  ```
  +----+---------+---------------------+
  | id | name    | inser_time          |
  +----+---------+---------------------+
  |  1 | cenkore | 2021-01-21 11:51:01 |
  +----+---------+---------------------+
  ```

- this PR: handled as failure with error:

  ```
  2021-01-21 11:56:38 FATAL Error 1062: Duplicate entry 'cenkore' for key 'name'
  ```
@cenkore thank you for the PR and for the explanation.
I've been thinking: possibly we don't need to change the `insert ignore`. We can keep that as it is, without the `left join`. It may be enough to replace the `replace into` with (`delete where pk=?` + `insert into`). I think the logic is still correct.
I'm not sure yet. Can you see a problem with that?
Oh, strike that. The trivial case when the table is pre-populated with conflicting values + zero traffic in the binary logs is enough to prove my thinking was wrong.
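That trivial case can be sketched in a few lines (again with `sqlite3` standing in for MySQL; names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE uk (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO uk VALUES (?, ?)",
                [(1, "cenkore"), (2, "cenkore")])  # pre-populated duplicates
cur.execute("CREATE TABLE uk_gho (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

# With zero binlog traffic, the delete+insert apply path never runs; the
# row copy alone decides the outcome, and INSERT OR IGNORE still swallows
# the pre-existing duplicate: id=2 is lost without any error.
cur.execute("INSERT OR IGNORE INTO uk_gho SELECT id, name FROM uk")
print(cur.execute("SELECT id FROM uk_gho").fetchall())  # -> [(1,)]
```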
@cenkore would you mind submitting an identical PR on https://github.com/openark/gh-ost/, where I can take a closer look please?
@shlomi-noach ok, I will submit later. Thank you.
@shlomi-noach I submitted it to openark/gh-ost, but I can't guarantee the accuracy of my operation, because merging between two forked sibling projects is difficult for me, and your project is ahead of github/gh-ost. :D
@cenkore thank you so much for the effort! I will double check.
When will this PR be merged?