dm icon indicating copy to clipboard operation
dm copied to clipboard

Support converting unsupported collation to TiDB supported collation or skip the schema creation

Open namco1992 opened this issue 5 years ago • 13 comments

Feature Request

Is your feature request related to a problem? Please describe:

TiDB doesn't support _unicode_ci, however, this is a common collation that is widely used. Currently, DM can't handle the schemas with _unicode_ci as the schema creation will fail. The schema creation can't be bypassed, too.

Describe the feature you'd like:

  1. DM should convert other unsupported collations to the supported collations (_bin and _general_ci if new_collations_enabled_on_first_bootstrap is enabled);
  2. If the auto conversion approach is not ideal, then at least allow the user to create schemas manually and then config the DM to skip the schema restore.

Teachability, Documentation, Adoption, Migration Strategy:

The auto-conversion or skip schema creation should be configured in the task configuration.

P.S. I would like to contribute this feature if it's accepted, Thanks!

namco1992 avatar Dec 22 '20 11:12 namco1992

Sounds great. currently DM will stop at that error so user can use handle-error to replace that SQL, but an auto-conversion feature is definitely better.

Could you give an exmaple about how this configuration looks like in task configuration?

lance6716 avatar Dec 22 '20 11:12 lance6716

Sounds great. currently DM will stop at that error so user can use handle-error to replace that SQL, but an auto-conversion feature is definitely better.

Hello @lance6716, from what I saw in the code and what I observed from my tests, handle-error can't be used during the dump and load, which means the error can't be handled when task-mode: all: https://github.com/pingcap/dm/blob/1e9a0948a0a5bf8772b50cc58dd1ab8caebc90d8/dm/worker/subtask.go#L679-L683

Could you give an exmaple about how this configuration looks like in task configuration?

I think the configuration could be under Global setting section:

# ----------- Global setting -----------
## ********* Basic configuration *********
name: test                       # The name of the task. Should be globally unique.
task-mode: all                   # The task mode. Can be set to `full`/`incremental`/`all`.

# Other configurations here...

force-utf8-collation: general_ci # The collation to force convert to, only applicable to `utf8`/`utf8mb4` character sets. Available collations are `bin`/`general_ci`(only if `new_collations_enabled_on_first_bootstrap` is enabled)
skip-schema-restoration: true    # Weather to skip the schema restoration. It only takes effect in `all` task mode

I feel it's good to have the option to allow users to skip the schema restoration, which enables users to have full control of the schema creation if needed. WDYT?

namco1992 avatar Dec 23 '20 03:12 namco1992

If the auto conversion approach is not ideal, then at least allow the user to create schemas manually and then config the DM to skip the schema restore.

You can manually create schemas in TiDB now, DM will skip the CREATE TABLE statement if table already exist in full mode. Convert other unsupported collations also sounds great.

GMHDBJD avatar Dec 23 '20 03:12 GMHDBJD

Hello @GMHDBJD, from what I saw it doesn't actually skip it, here are some related logs:

[2020/12/22 03:16:01.021 +00:00] [ERROR] [baseconn.go:178] ["execute statement failed"] [task=dm-replica-task] [unit=load] [query="/*details omitted*/"] [argument="[]"] [error="Error 1050: Table 'xxx' already exists"]
[2020/12/22 03:16:01.021 +00:00] [ERROR] [db.go:176] ["execute statements failed after retry"] [task=dm-replica-task] [unit=load] [queries=""] [arguments="[]"] [error="[code=10006:class=database:scope=not-set:level=high] execute statement failed: Error 1050: Table 'xxx' already exists"]
[2020/12/22 03:16:01.021 +00:00] [INFO] [loader.go:986] ["table already exists, skip it"] [task=dm-replica-task] [unit=load] ["table schema file"=./dumped_data.dm-replica-task/xxx-schema.sql]

It still executes the statements even though the table exists. The error is skippable if the collation used in the table is supported. However, if the collation is not supported, the execution will fail with another error Error 1273: Unsupported collation when new collation is enabled: 'utf8mb4_unicode_ci', which is not skippable since it's a different error.

It seems the CREATE TABLE IF NOT EXISTS doesn't check the table existence first, therefore the collation unsupported error will be thrown instead of the table exists error.

If DM can check the table existence at first and skip the creation when it exists, then I believe the skip-schema-restoration is not needed.

namco1992 avatar Dec 23 '20 03:12 namco1992

Hello @GMHDBJD @lance6716, any other concern regarding this issue? Do we have a consensus here about what should be implemented?

namco1992 avatar Dec 28 '20 08:12 namco1992

It still executes the statements even though the table exists. The error is skippable if the collation used in the table is supported. However, if the collation is not supported, the execution will fail with another error Error 1273: Unsupported collation when new collation is enabled: 'utf8mb4_unicode_ci', which is not skippable since it's a different error.

sorry for the inconvenience. This is may because we didn't rewrite CREATE TABLE statement to CREATE TABLE IF NOT EXISTS (we use a filter to skip "table already exists" errors but didn't work well). Could you help check that such a rewriting could provide a workaround in dump & load phase? and welcome to help us implement it or let TiDB report "table already exists" not "Unsupported collation"

we're planning some character set related feature in this sprint, still not decide yet. We'll request your kindly help to implement this collation feature some days later 😄

lance6716 avatar Dec 28 '20 09:12 lance6716

@lance6716 Actually the CREATE TABLE IF NOT EXISTS won't solve the problem. It seems MySQL/TiDB will vet the schema statements first and stop when it finds out the collation is not supported, therefore Error 1273: Unsupported collation will be thrown even when using CREATE TABLE IF NOT EXISTS.

I've bypassed this issue with lots of hacks anyway 😂 . Lemme know when you decide how to tackle this and I'm ready to help 😃

namco1992 avatar Dec 28 '20 10:12 namco1992

Hi, @namco1992 . The following advice may help you.

  1. TiDB will support collation utf8mb4_unicode_ci and utf8_unicode_ci in 5.0-rc. #17596
  2. User can modify the file dbname.tbname-schema.sql in worker's dump dir, change the collation for create table statement and then resume the task.
  3. Dumpling support no-schema flag, which will dump table data except table schema. We can support it in loader too, then we only need to create table manually in TiDB.
  4. Handle Unsupported collation error specially. If table already exist, ignore the error.
  5. Auto convert the collation.

I prefer the 4th solution, because it is simple, effective and less configuration. What's your opinion? Any contribution will be appreciated.

GMHDBJD avatar Dec 30 '20 03:12 GMHDBJD

Hello @GMHDBJD, I think 4th solution is good for me, to summarize:

  1. Execute the CREATE TABLE statement;
  2. If error && error == table already exists then ignore the error;
  3. If error && error == unspported collation then check the table existence. If table exists, ignore the error;

If we all agree on this flow then I can start to work on it. Thanks!

namco1992 avatar Jan 04 '21 02:01 namco1992

@namco1992 LGTM, and for incremental phase, function handleSpecialDDLError could be used to detect unsupported collation error and rewrite SQL

lance6716 avatar Jan 04 '21 07:01 lance6716

Hi, do you need some help to implement this @namco1992

lance6716 avatar Apr 07 '21 05:04 lance6716

Hello @lance6716, sorry for dragging too long about this issue. I was busy with work stuff. However, I noticed that TiDB supported unicode_ci starts 4.0.11, so it seems not that urgent to have this feature anymore.

But I will still work on it as there are still many other collations that can cause this error. I will try to create a PR by this week if it's not too late, thanks!

namco1992 avatar Apr 07 '21 06:04 namco1992

Take your time, and if you need some help you could comment here or join sig-migrate in slack

lance6716 avatar Apr 07 '21 06:04 lance6716