gh-ost icon indicating copy to clipboard operation
gh-ost copied to clipboard

fix data-integrity problem when add unique key

Open cenkore opened this issue 4 years ago • 10 comments

Hi, Add a PR for #485 #477 #167 .

Insert syntax in gh-ost will be rewritten with insert ignore into (row-copy) and replace into(binlog apply goroutine) . if the data in the original table or data after apply would cause conflicts, means does not meet the conditions for adding a unique key, data will be lost in current version. This PR can detect data conflicts and exit, including conflicts in the original table data, and the data after binlog apply. PTAL.

The error looks like the following and is very easy to reproduce:

  • original conflict

2021-01-08 16:58:15 ERROR Error 1062: Duplicate entry 'cenkore' for key 'name'

  • apply confict

2021-01-08 16:59:37 ERROR Error 1062: Duplicate entry 'cenkore' for key 'name'; query= insert /* gh-ost adetestdb._testuk_gho */ into adetestdb._testuk_gho (id, name, location, inserttime) values (?, ?, ?, ?) ; args=[3 cenkore cn 2021-01-08 16:59:30]

thanks.

cenkore avatar Jan 08 '21 09:01 cenkore

Adding a UNIQUE KEY while table is being written to, is by definition a race condition, unless you can guarantee the application never writes duplicates.

Can you please clarify what is the exact use case (possibly with a test) that this PR solves?

Thank you!

shlomi-noach avatar Jan 08 '21 10:01 shlomi-noach

Thanks for your quick reply. @shlomi-noach

In this PR, if duplicates are on the newly added unique key , it will fail instead of ignore.

Background

Gh-ost contains two threads to process the data of the published table, one thread copies the data of the original table, using Insert ignore into, and the other thread is used to process incremental data, by parsing and applying the binlog, using replace into; Same data is likely to be processed twice because of two threads are processed concurrently. In order to avoid conflicts leading to failure, insert ignore into and replace into are designed. However, when adding a unique key will cause the actual duplicate data on the newly added unique key would be ignored, because duplicate data will be ignored or replaced. This will cause data loss. In extreme cases, if you add a new column and add a unique key to this column at the same time, then there will be only one record left after the table is published. It may cause serious problems.

Proposal

Based on the above background, we can do special processing for the scene. For data copy, only copy the data that primary key value is not in _xx_gho, which can be rewritten as insert into _xx_gho select * from xx a left join _xx_gho b on a.pk = b.pk where b.pk is null; for binlog apply, You can rewrite replace into as delete + insert.

Rationale

The data copy, only the primary key value is not in the gho table is copied, but we don’t need use ignore , so we only ignore the conflict of the primary key value, but the conflict on the newly added unique key would return error; and the incremental part, assuming that the data has not been copied, delete is a dry-run. If there is a conflict with the newly added unique key, it will fail, but replace cannot do this. If the data has been copied, delete the primary key value and execute insert again. Similarly, if the data conflicts with the newly added unique key, it will also fail.

Thank you!

cenkore avatar Jan 08 '21 15:01 cenkore

Add 2 test cases

original table definition: create table uk(id int, name varchar(10),inser_time datetime, primary key(id)) alter: add unique key(name)

Test case 1:

prepare: insert into uk values(1,'cenkore',now())

execute: ./gh-ost --host=10.0.0.1 --database=adetestdb --table=uk --verbose --alter="add unique key(name)" --concurrent-rowcount --conf="/usr/local/gh-ost.cnf" --port=66666 --nice-ratio=0.1 --execute --initially-drop-ghost-table --initially-drop-old-table --postpone-cut-over-flag-file="/tmp/uk.postpone"

We can insert a duplicate row before cutover table by popstpone flag, insert sql: insert into uk values(2,'cenkore',now())

result:

  • current version: handle success, BUT loss one row which id is 1, new table value as follows:

    +----+---------+---------------------+
    | id | name | inser_time |
    +----+---------+---------------------+
    | 2 | cenkore | 2021-01-21 11:06:45 |
    +----+---------+---------------------+

  • this pr : handle failure with error:

    2021-01-21 11:59:31 FATAL Error 1062: Duplicate entry 'cenkore' for key 'name'; query=
    insert /* gh-ost adetestdb._uk_gho */ into
    adetestdb._uk_gho
    (id, name, inser_time)
    values
    (?, ?, ?)
    ; args=[2 cenkore 2021-01-21 11:58:31]

Test case 2:

prepare: insert into uk values (1,'cenkore',now()),(2,'cenkore',now())

execute: ./gh-ost --host=10.0.0.1 --database=adetestdb --table=uk --verbose --alter="add unique key(name)" --concurrent-rowcount --conf="/usr/local/gh-ost.cnf" --port=66666 --nice-ratio=0.1 --execute --initially-drop-ghost-table --initially-drop-old-table

result:

  • current version: handle success, BUT loss one row which id is 2, new table value as follows:

    +----+---------+---------------------+
    | id | name | inser_time |
    +----+---------+---------------------+
    | 1 | cenkore | 2021-01-21 11:51:01 |
    +----+---------+---------------------+

  • this pr : handle failure with error:

    2021-01-21 11:56:38 FATAL Error 1062: Duplicate entry 'cenkore' for key 'name'

cenkore avatar Jan 21 '21 04:01 cenkore

@cenkore thank you for the PR and for the explanation.

I've been thinking, possibly we don't need to change the insert ignore. We can keep that as it is, without the left join. It may be enough to replace the replace into with (delete where pk=? + insert into). I think the logic is still correct. I'm not sure yet. Can you see a problem with that?

shlomi-noach avatar Feb 18 '21 07:02 shlomi-noach

Oh, strike that. The trivial case when the table is pre-populated with conflicting values + zero traffic in the binary logs is enough to prove my thinking was wrong.

shlomi-noach avatar Feb 18 '21 07:02 shlomi-noach

@cenkore would you mind submitting an identical PR on https://github.com/openark/gh-ost/, where I can take a closer look please?

shlomi-noach avatar Feb 21 '21 11:02 shlomi-noach

@shlomi-noach ok, I will submit later. Thank you.

cenkore avatar Feb 22 '21 11:02 cenkore

@shlomi-noach I submitted to openark/gh-ost, but I can’t guarantee the accuracy of my operation, because it's difficult for me which merge is between two forked sibling projects, and your project is ahead of github/gh-ost. :D

cenkore avatar Feb 23 '21 05:02 cenkore

@cenkore thank you so much for the effort! I will double check.

shlomi-noach avatar Feb 23 '21 06:02 shlomi-noach

When will this pr be merged

jiangnanora avatar Sep 16 '21 03:09 jiangnanora