gravity icon indicating copy to clipboard operation
gravity copied to clipboard

MySQL--> TiDB add special clolumn to record when the data in TiDB write to disk?

Open vkingnew opened this issue 5 years ago • 5 comments

Because MySQL-->TiDB is asynchronous,the Gravity may down or the other reasons lead to lag,in some cases i want to know when the data write to TiDB,eg,query the incremental data. So I need to add a special column for every table to record the time in target database. at current version,if i restart gravity it can't synchrone the data. I advice to add a special column for every table when synchronous data,eg _loadtime(alter table t add column _intime datetime(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6) ). Then the table in target database will automatic add a column for example _loadtime to record the time when the data to target database ,and if restart Gravity it can replicate normal. The better way to deal with this question ,I think set a special configure options,when need you can turn it on,or you can turn it off,default set it OFF.

vkingnew avatar Mar 22 '19 09:03 vkingnew

what does it mean by "if i restart gravity it can't synchrone the data"? we save the binlog gtid in source db, and continues on restart, like the way mysql slave does.

Ryan-Git avatar Mar 23 '19 03:03 Ryan-Git

ok I got it. you start the task, add the time column in target db, and the task continues with new column having default value filled by mysql(gravity don't know target schema has changed). but when it restarts, it can't work cause it thinks the target db has different schema than source.

this can be solved in two ways.

  1. If the target table has more column than source, the generated sql should place a DEFAULT keyword.

  2. add a filter called add-column, giving column name, value and value position(in argument or statement) as parameter.

I think the first way is more general, while the second is also useful but in some other cases.

Ryan-Git avatar Mar 23 '19 03:03 Ryan-Git

what does it mean by "if i restart gravity it can't synchrone the data"? we save the binlog gtid in source db, and continues on restart, like the way mysql slave does.

Because restart gravity,it validate the source schema and target schema,if the columns is not equal then error,it stop replicate data from source database.

vkingnew avatar Mar 23 '19 11:03 vkingnew

ok I got it. you start the task, add the time column in target db, and the task continues with new column having default value filled by mysql(gravity don't know target schema has changed). but when it restarts, it can't work cause it thinks the target db has different schema than source.

this can be solved in two ways.

  1. If the target table has more column than source, the generated sql should place a DEFAULT keyword.
  2. add a filter called add-column, giving column name, value and value position(in argument or statement) as parameter.

I think the first way is more general, while the second is also useful but in some other cases.

Yes,the first way is more general.we implements it by this way.IN input part,we parse the DDL statement,if CREATE TABLE we automatic add column _LOADTIME DATETIME(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6). IF restart gravity we don't validate this column.I think this's not general for some cases,for example someone just want to keep source database and target database have equal columns,they don't care the time,so add a configure option to control whether record the time when the data write to disk.

vkingnew avatar Mar 23 '19 12:03 vkingnew

the recorded time seems only useful in make statistics of gravity itself. While it's only a few microseconds later than source, it's almost meaningless. What do you do with it?

Ryan-Git avatar Mar 24 '19 06:03 Ryan-Git