gh-ost
gh-ost copied to clipboard
When the _xxxx_gho table is queried, data may be lost.
gh-ost Refer to this document for switching order :https://github.com/github/gh-ost/issues/82
Suppose now operating DDL on table a
, three temporary tables _a_del
, _a_gho
, and_a_ghc
will be generated.
rename operation is: rename a
to _a_del
, _a_gho
to a
,
lock table statement is: lock table a
write, _a_del
write
According to the order of acquiring the lock:
first acquire the
_a_del
table lock, then_a_gho
table lock , and finally thea
table lock.
Thank you!
There seems to be a corruption in the original comment. On my mobile GitHub app, I can actually see what the question is. Copy+pasting here (as is, including typos):
If there is a query transaction on the _a_gho table lock that causes the acquisition if the lock to be delayed, other threads may be before rename First obtain the lock of table a, write the data in table, and then perform table rename operation, at this time the data is lost
This situation does not happen. First, gh-ost locks the original table, then, it waits until the _gho
table has received all the writes from the original table up to the lock. From that point, there can be no mor ewrites on _gho
table. gh-ost
then proceeds to lock _gho
table.
How to repeat :
create table a (id int);
create table _a_del (id int);
create table _a_gho (id int);
session 1 | session 2 | session 3 | session 4 |
---|---|---|---|
lock table a write, _a_del write; | |||
insert into a values (1); | |||
begin;select * from _a_gho; | |||
rename table a to _a_del,_a_gho to a; | |||
drop table _a_del;unlock tables; | |||
commit; |
mysql> select * from a; Empty set (0.00 sec)
mysql> select * from _a_del; +------+ | id | +------+ | 1 | +------+ 1 row in set (0.00 sec)
https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html
DDL statements, LOCK TABLES, and other similar statements try to reduce the number of possible deadlocks between concurrent DDL statements by acquiring locks on explicitly named tables in name order.
Sorry for accidentally closed, reopen.
@wukongHH
session 3: rename table a to _a_del,_a_gho to a;
when does this statement run? In gh-ost
, this statement only runs after _gho
table is known to have received the insert
from a
. Did you mimic this behavior?
DDL statements, LOCK TABLES, and other similar statements try to reduce the number of possible deadlocks between concurrent DDL statements by acquiring locks on explicitly named tables in name order.
I'm not sure what this quote means to this issue's context. Can you please clarify?
@wukongHH
session 3: rename table a to _a_del,_a_gho to a;
when does this statement run? In
gh-ost
, this statement only runs after_gho
table is known to have received theinsert
froma
. Did you mimic this behavior?
Yes. Rename statement happened after all data sync from table a
.
Here the insert operation is from APP SERVERS, this operation comes after gh-ost have got the write lock, normally this insert should be blocked until rename finished.
In our case, there are other sessions acces to the _gho
table (got one transaction start before unlock tables
, and not finished when unlock), because rename table a to _a_del,_a_gho to a
acquiring locks on explicitly named tables in name order, when there got a trx on the _gho
table, rename
is waiting for the trx on _gho
table to commit before acquire table a
lock , this cause the insert
from APP SERVERS run before rename finished(which should blocked after rename ok), data lost happend here.
Session1 | session2 | Session3 | Session4 | |
---|---|---|---|---|
create table a (id int); create table _a_del(id int); create table _a_gho(id int); lock table a write,_a_del write; #OK |
||||
insert into a set id=1; #blocked | ||||
rename table a to _a_del, _a_gho to a; #blocked #acquiring locks order: _a_del, _a_gho, a #waiting lock of: _a_del |
||||
begin; select * from _a_gho; |
||||
drop table _a_del; | ||||
#get table lock: _a_del #waiting lock of: _a_gho ,cause session 4 trx not commited |
||||
unlock tables; | ||||
—OK | ||||
#now table a with no WRITE lock #session 2 insert may insert to table a |
||||
commit; | ||||
—OK | ||||
select * from a; Empty set (0.00 sec) |
||||
select * from _a_del; +------+ | id | +------+ | 1 | +------+ 1 row in set (0.01 sec) |
DDL statements, LOCK TABLES, and other similar statements try to reduce the number of possible deadlocks between concurrent DDL statements by acquiring locks on explicitly named tables in name order.
I'm not sure what this quote means to this issue's context. Can you please clarify?
This is the acquiring locks order of DDL statement from MySQL manual.
Thank you, I will look into it.
I'm meanwhile curious: who would be reading from _a_gho
? This table is unknown to anyone outside gh-ost
itself or the engineer that runs the migration.
I'm meanwhile curious: who would be reading from
_a_gho
? This table is unknown to anyone outsidegh-ost
itself or the engineer that runs the migration.
--
DB manage program such as table checksum/data replication program/migration table check may access all tables in MySQL.
Table _a_gho
is unkown to user, but known for DBAs ,we think this read comes from DBA's program:)
We met a case two days ago which cause a table lost one row after ALTER table through gh-ost.
This is the first time we meet this case, after check ,we think there maybe a risk of data loss if someone access the _gho table(get lock on _a_gho
table before unlock tables
command) when cut-over。
Thanks for you reply ,waiting for your review result.
Thank you, I will look into it.
I'm meanwhile curious: who would be reading from
_a_gho
? This table is unknown to anyone outsidegh-ost
itself or the engineer that runs the migration.
hi Shlomi-noach , is there anly update for the case please ^_^?
Sorry to bother, but we look forward fix this ASAP for there is a risk of lossing data, we run gh-ost doing DDL every day。
A fix for the case above we think is just change the _a_ghc
and _a_gho
to another name which name order is after table a
, so the rename
waiting to get the lock , there would no time "gap" for DML to get lock and run between unlock tables
and the rename
。
Thanks.
Hi,
we look forward fix this ASAP
I'm not sure I will be able to provide a fix in the next few days.
A fix for the case above we think is just change the _a_ghc and _a_gho to another name which name order is after table a, so the rename waiting to get the lock , there would no time "gap" for DML to get lock and run between unlock tables and the rename 。
I'm not sure I understand exactly what you mean. Can you please paste the code change? Better yet, if you think you have the correct solution, could you create a pull request?
I'm not the owner of https://github.com/github/gh-ost and will not be able to merge any changes here. But I may merge such changes in https://github.com/openark/gh-ost
I can confirm I get the same behavior as yours, but something's strange here:
At first I thought this might be related to the fact your table doesn;t have a PRIMARY KEY
; I added a PRIMARY KEY
and I'm still getting your results.
I then tries a different table structure, and I got different results, which are good. See below.
Can you please clarify what is the output of your:
select @@tx_isolation;
select @@sql_mode;
select @@version;
I'm using the following:
select @@sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+
select @@tx_isolation;
+-----------------+
| @@tx_isolation |
+-----------------+
| REPEATABLE-READ |
+-----------------+
select @@version;
+-----------+
| @@version |
+-----------+
| 5.7.30 |
+-----------+
Here's a different table definition which works correctly:
CREATE TABLE `t` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`n` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=latin1;
create table tdel like t;
create table tgho like t;
session1 | session2 | session3 | session4 |
---|---|---|---|
lock tables t write, tdel write; | |||
OK | |||
insert into t values(null,13);#blocked | |||
#blocked | rename table t to tdel, tgho to t; #blocked |
||
#blocked | #blocked | begin; select * from tgho; |
|
#blocked | #blocked | OK, transaction open | |
drop table tdel; | #blocked | #blocked | transaction open |
OK | #blocked | #blocked | transaction open |
unlock tables; | #blocked | #blocked | transaction open |
OK | #blocked | #blocked | transaction open |
#blocked | #blocked | commit; | |
OK | OK | OK |
select * from t;
+----+------+
| id | n |
+----+------+
| 1 | 13 |
+----+------+
elect * from tdel;
Empty set (0.01 sec)
Still looking into why it behaves different than yours, and correctly so.
it's not the AUTO_INCREMENT
. Still trying to understand what.
Sorry, I did not execute the rename
and the select * from tgho
in the same order as you did.
I can confirm I get the same behavior as yours, but something's strange here:
At first I thought this might be related to the fact your table doesn;t have a
PRIMARY KEY
; I added aPRIMARY KEY
and I'm still getting your results.I then tries a different table structure, and I got different results, which are good. See below.
Can you please clarify what is the output of your:
select @@tx_isolation; select @@sql_mode; select @@version;
I'm using the following:
select @@sql_mode; +-------------------------------------------------------------------------------------------------------------------------------------------+ | @@sql_mode | +-------------------------------------------------------------------------------------------------------------------------------------------+ | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | +-------------------------------------------------------------------------------------------------------------------------------------------+ select @@tx_isolation; +-----------------+ | @@tx_isolation | +-----------------+ | REPEATABLE-READ | +-----------------+ select @@version; +-----------+ | @@version | +-----------+ | 5.7.30 | +-----------+
Here's a different table definition which works correctly:
CREATE TABLE `t` ( `id` int(11) NOT NULL AUTO_INCREMENT, `n` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=latin1; create table tdel like t; create table tgho like t;
session1 session2 session3 session4 lock tables t write, tdel write; OK insert into t values(null,13);#blocked #blocked rename table t to tdel, tgho to t; #blocked #blocked #blocked begin; select * from tgho; #blocked #blocked OK, transaction open drop table tdel; #blocked #blocked transaction open OK #blocked #blocked transaction open unlock tables; #blocked #blocked transaction open OK #blocked #blocked transaction open #blocked #blocked commit; OK OK OK
select * from t; +----+------+ | id | n | +----+------+ | 1 | 13 | +----+------+ elect * from tdel; Empty set (0.01 sec)
Still looking into why it behaves different than yours, and correctly so.
The prefix of tables' name in your test is different with me ,that's the biggest difference.
You can change the tdel
to _t_del
,tgho
to _t_gho
and try again, you'll get a defferent result.
The reason why tables' name affect the result is metioned in the manual:
https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html
DDL statements, LOCK TABLES, and other similar statements try to reduce the number of possible deadlocks between concurrent DDL statements by acquiring locks on explicitly named tables in name order.
The DEL/GHO tables' name are in different order, as follow:
+---------------+
| Tables_in_sg2 |
+---------------+
| t |
| tdel |
| tgho |
+---------------+
3 rows in set (0.00 sec)
+---------------+
| Tables_in_sg3 |
+---------------+
| _t_del |
| _t_gho |
| t |
+---------------+
3 rows in set (0.00 sec)
In database sg2 ,the table t
comes first in table name order , and in database sg3, the table t
comes last.
The name order affect the acquiring locks order in rename
operation.
When you use t
,tdel
,tgho
, the rename
operation try acquire lock on table t
first, after session1 unlock tables, the rename session get the table lock first; but when you use t
,_t_del
,_t_gho
, when you lock table t
,_t_del
in session1, the rename
operation try acquire lock on _t_del
first(waiting for the lock of _t_del
table), and after session1 drop table _t_del
, the rename
try to acquire lock on _t_gho
table, and if there is some transaction on the _t_gho
table, the rename is waiting for _t_gho
lock, if now session1 unlock tables, the rename
still waiting for _g_gho
until the transaction on _t_gho
end, and that the problem happen: the insert
in session2 could get the lock on t
and could insert success.
Don't know if i made it clearï¼the table name order affect the test result.
---the variables:
> select @@tx_isolation;
+----------------+
| @@tx_isolation |
+----------------+
| READ-COMMITTED |
+----------------+
1 row in set (0.00 sec)
> select @@sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
> select @@version;
+------------------+
| @@version |
+------------------+
| 5.7.18|
+------------------+
1 row in set (0.00 sec)
The name order affect the acquiring locks order in rename operation.
WOW. OK, let me digest this and see how to proceed.
from https://dev.mysql.com/doc/refman/5.7/en/rename-table.html:
Renaming operations are performed left to right
Metadata locks on tables are acquired in name order, which in some cases can make a difference in operation outcome when multiple transactions execute concurrently. See Section 8.11.4, “Metadata Locking”.
I'll keep looking at the implications.
from https://dev.mysql.com/doc/refman/5.7/en/rename-table.html:
Renaming operations are performed left to right
Metadata locks on tables are acquired in name order, which in some cases can make a difference in operation outcome when multiple transactions execute concurrently. See Section 8.11.4, “Metadata Locking”.
I'll keep looking at the implications.
Hi,shlomi, is there any update here ? Look forward the fix for this case :)
Sorry for the delays, I hope to give this a shot by end of month
Again, sorry for the delay. I took some time to review this issue again and can confirm it is as you describe. I appreciate your findings and analysis. OK, so there's an immediate solution for you (sorry, I could have figured it out much sooner), and then we need to consider long term solutions.
Immediate term
use e.g.: gh-ost --force-table-name=zzz_8e57cbfa9eabd7d50d0e ...
, where:
-
zzz
is to ensure table name is greater than your real table name (assuming you don't have tables calledzzz*
-
8e57cbfa9eabd7d50d0e
is some random value generated by application, e.g. viaopenssl rand -hex 10
This will ensure gh-ost
creates the following tables:
-
zzz_8e57cbfa9eabd7d50d0e_gho
-
zzz_8e57cbfa9eabd7d50d0e_ghc
-
zzz_8e57cbfa9eabd7d50d0e_del
long term
Is a bit of a problem. _
is a common prefix for "temporary" or "internal" tables. And it's a valid ascii character which does not require escaping. In the ascii table, z
is only followed by {
, '|', '}' and '~'.. None of which are valid characters for a non-escaped table.
So I'm going to bang my head a bit around fixing the problem without modifying the prefix _
. This will not be easy and I'mn not sure if I can crack it. The trick for atomic table renames was complex enough without this added constraint.
Cross fingers.
Oh, another alternative is to use -cut-over=two-step
, which leaves a gap in time when the table does not exist... It's a compromise.
Sorry, my suggested short term solution above will not work as I hoped, because right now gh-ost
always prepends a _
to the table name. This would be an easy fix if needed.
Thanks for your solutions 👍
I had discussed the short term solution with @shatianjiang , maybe there is still exists risk of lost data when there got a table named 'zzzzzXXX'(with 'zzzz' prefix...), but it would work most of time:)
So look forward the long term fix for this issue. And we're keep finding a solution also, if there is any progress ,we'll update here.
Thanks again.
The change you'd need to make for short term solution is in:
https://github.com/github/gh-ost/blob/c681c54e8f33db697f188c42e1e72c88082b3536/go/base/context.go#L280-L286
As you can see gh-ost
always prepends _
to table names. I'll likely add a simple option (e.g. --force-table-name-prefix
or something) to control that.
@zhongliangkang Even if the _a_gho
table is not queried, I have also encountered a scenario where data is lost in the rename phase. Because in the rename phase, the order acquires table locks is: _a_del, _a_gho, a
. When gh-ost acquired _a_del
table lock and try to _a_gho
table lock, although the time is very short, it is also possible that the a
table has dml SQL to be allowed to execute.
I think the following methods may be solve it:
- after drop table
_a_del
then time.sleep(100ms), finally unlock tables - For the generation of the gh-ost table of gh-ost, use ascii
~
to replace_
, such as~a_gho
replace_a_gho
@cenkore For pull code #987 , I think it is very good, but I am not sure why code not be merged?
@shlomi-noach For "I'll likely add a simple option (e.g. --force-table-name-prefix or something) to control that" having any update?
Yes. Even if no query on the _gho table, data may lost, add a query make it easy to reproduce.
We fix it by add 'zz' prefix temporary..
#987 may need more deep thinking..