Support converting unsupported collation to TiDB supported collation or skip the schema creation
Feature Request
Is your feature request related to a problem? Please describe:
TiDB doesn't support _unicode_ci, however, this is a common collation that is widely used. Currently, DM can't handle the schemas with _unicode_ci as the schema creation will fail. The schema creation can't be bypassed, too.
Describe the feature you'd like:
- DM should convert other unsupported collations to the supported collations (
_binand_general_ciif new_collations_enabled_on_first_bootstrap is enabled); - If the auto conversion approach is not ideal, then at least allow the user to create schemas manually and then config the DM to skip the schema restore.
Teachability, Documentation, Adoption, Migration Strategy:
The auto-conversion or skip schema creation should be configured in the task configuration.
P.S. I would like to contribute this feature if it's accepted, Thanks!
Sounds great. currently DM will stop at that error so user can use handle-error to replace that SQL, but an auto-conversion feature is definitely better.
Could you give an exmaple about how this configuration looks like in task configuration?
Sounds great. currently DM will stop at that error so user can use
handle-errorto replace that SQL, but an auto-conversion feature is definitely better.
Hello @lance6716, from what I saw in the code and what I observed from my tests, handle-error can't be used during the dump and load, which means the error can't be handled when task-mode: all:
https://github.com/pingcap/dm/blob/1e9a0948a0a5bf8772b50cc58dd1ab8caebc90d8/dm/worker/subtask.go#L679-L683
Could you give an exmaple about how this configuration looks like in task configuration?
I think the configuration could be under Global setting section:
# ----------- Global setting -----------
## ********* Basic configuration *********
name: test # The name of the task. Should be globally unique.
task-mode: all # The task mode. Can be set to `full`/`incremental`/`all`.
# Other configurations here...
force-utf8-collation: general_ci # The collation to force convert to, only applicable to `utf8`/`utf8mb4` character sets. Available collations are `bin`/`general_ci`(only if `new_collations_enabled_on_first_bootstrap` is enabled)
skip-schema-restoration: true # Weather to skip the schema restoration. It only takes effect in `all` task mode
I feel it's good to have the option to allow users to skip the schema restoration, which enables users to have full control of the schema creation if needed. WDYT?
If the auto conversion approach is not ideal, then at least allow the user to create schemas manually and then config the DM to skip the schema restore.
You can manually create schemas in TiDB now, DM will skip the CREATE TABLE statement if table already exist in full mode.
Convert other unsupported collations also sounds great.
Hello @GMHDBJD, from what I saw it doesn't actually skip it, here are some related logs:
[2020/12/22 03:16:01.021 +00:00] [ERROR] [baseconn.go:178] ["execute statement failed"] [task=dm-replica-task] [unit=load] [query="/*details omitted*/"] [argument="[]"] [error="Error 1050: Table 'xxx' already exists"]
[2020/12/22 03:16:01.021 +00:00] [ERROR] [db.go:176] ["execute statements failed after retry"] [task=dm-replica-task] [unit=load] [queries=""] [arguments="[]"] [error="[code=10006:class=database:scope=not-set:level=high] execute statement failed: Error 1050: Table 'xxx' already exists"]
[2020/12/22 03:16:01.021 +00:00] [INFO] [loader.go:986] ["table already exists, skip it"] [task=dm-replica-task] [unit=load] ["table schema file"=./dumped_data.dm-replica-task/xxx-schema.sql]
It still executes the statements even though the table exists. The error is skippable if the collation used in the table is supported. However, if the collation is not supported, the execution will fail with another error Error 1273: Unsupported collation when new collation is enabled: 'utf8mb4_unicode_ci', which is not skippable since it's a different error.
It seems the CREATE TABLE IF NOT EXISTS doesn't check the table existence first, therefore the collation unsupported error will be thrown instead of the table exists error.
If DM can check the table existence at first and skip the creation when it exists, then I believe the skip-schema-restoration is not needed.
Hello @GMHDBJD @lance6716, any other concern regarding this issue? Do we have a consensus here about what should be implemented?
It still executes the statements even though the table exists. The error is skippable if the collation used in the table is supported. However, if the collation is not supported, the execution will fail with another error Error 1273: Unsupported collation when new collation is enabled: 'utf8mb4_unicode_ci', which is not skippable since it's a different error.
sorry for the inconvenience. This is may because we didn't rewrite CREATE TABLE statement to CREATE TABLE IF NOT EXISTS (we use a filter to skip "table already exists" errors but didn't work well). Could you help check that such a rewriting could provide a workaround in dump & load phase? and welcome to help us implement it or let TiDB report "table already exists" not "Unsupported collation"
we're planning some character set related feature in this sprint, still not decide yet. We'll request your kindly help to implement this collation feature some days later 😄
@lance6716 Actually the CREATE TABLE IF NOT EXISTS won't solve the problem. It seems MySQL/TiDB will vet the schema statements first and stop when it finds out the collation is not supported, therefore Error 1273: Unsupported collation will be thrown even when using CREATE TABLE IF NOT EXISTS.
I've bypassed this issue with lots of hacks anyway 😂 . Lemme know when you decide how to tackle this and I'm ready to help 😃
Hi, @namco1992 . The following advice may help you.
- TiDB will support collation
utf8mb4_unicode_ciandutf8_unicode_ciin 5.0-rc. #17596 - User can modify the file
dbname.tbname-schema.sqlin worker's dump dir, change the collation for create table statement and then resume the task. - Dumpling support
no-schemaflag, which will dump table data except table schema. We can support it in loader too, then we only need to create table manually in TiDB. - Handle
Unsupported collationerror specially. If table already exist, ignore the error. - Auto convert the collation.
I prefer the 4th solution, because it is simple, effective and less configuration. What's your opinion? Any contribution will be appreciated.
Hello @GMHDBJD, I think 4th solution is good for me, to summarize:
- Execute the
CREATE TABLEstatement; - If error && error == table already exists then ignore the error;
- If error && error == unspported collation then check the table existence. If table exists, ignore the error;
If we all agree on this flow then I can start to work on it. Thanks!
@namco1992 LGTM, and for incremental phase, function handleSpecialDDLError could be used to detect unsupported collation error and rewrite SQL
Hi, do you need some help to implement this @namco1992
Hello @lance6716, sorry for dragging too long about this issue. I was busy with work stuff. However, I noticed that TiDB supported unicode_ci starts 4.0.11, so it seems not that urgent to have this feature anymore.
But I will still work on it as there are still many other collations that can cause this error. I will try to create a PR by this week if it's not too late, thanks!
Take your time, and if you need some help you could comment here or join sig-migrate in slack