target-column Assignment Fails/Reverses for extract-table/extract-schema in Route Rules
What did you do?
MySQL:
create table mig_table(
id int primary key,
col01 int
)
TiDB
create table mig_schema.mig_table(
id int primary key,
col01 int,
c_schema varchar(64),
c_table varchar(64),
unique key(id, c_schema, c_table)
);
DM task config
routes:
rr-4:
schema-pattern: "..."
table-pattern: "..."
target-schema: "..."
target-table: "..."
extract-schema:
schema-regexp: "(.*)"
target-column: "c_schema"
extract-table:
table-regexp: "(.*)"
target-column: "c_table"
Data replication from MySQL to TiDB
Data insert into MySQL
MySQL$ insert into mig_schema.mig_table values(1, 1);
Data replicated to TiDB
TiDB$ select * from mig_table;
id: 1
col01: 1,
c_schema: mig_table
c_table: mig_schema
What did you expect to see?
Data replicated to TiDB
TiDB$ select * from mig_table;
id: 1
col01: 1,
c_schema: mig_schema
c_table:mig_table
What did you see instead?
Data replicated to TiDB
TiDB$ select * from mig_table;
id: 1
col01: 1,
c_schema: mig_table
c_table: mig_schema
Versions of the cluster
DM version (run dmctl -V or dm-worker -V or dm-master -V):
$ tiup dmctl -V
Checking updates for component dmctl...
A new version of dmctl is available: v8.5.3 -> v8.5.4
To update this component: tiup update dmctl
To update all components: tiup update --all
Starting component dmctl: /home/azureuser/.tiup/components/dmctl/v8.5.3/dmctl/dmctl -V
Release Version: v8.5.4-aeon-dev
Git Commit Hash: bae2dc99691b5c20e69e2b02f9175f63a2b60202
Git Branch: heads/refs/tags/v8.5.4-aeon-dev
UTC Build Time: 2025-11-25 10:16:10
Go Version: go1.23.8
Failpoint Build: false
Upstream MySQL/MariaDB server version:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 2193
Server version: 8.0.42-azure Source distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
Downstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version() |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v8.1.2
Edition: Enterprise
Git Commit Hash: 80db67052c0b1d89ab5ea94d6efa576a2191432d
Git Branch: HEAD
UTC Build Time: 2024-12-24 06:38:13
GoVersion: go1.21.13
Race Enabled: false
Check Table Before Drop: false
Store: tikv
Enterprise Extension Commit Hash: cb58c31ad9364cc770b250a4702c8d7b75f98b1a |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.001 sec)
How did you deploy DM: tiup or manually?
TiUP
Other interesting information (system version, hardware config, etc):
Based on the current codebase, we have identified the sequence of operations and a bug in the data insertion logic:
The sequence in the schema extractor logic is defined as: TableExtractor, SchemaExtractor, and then SourceExtractor (see router.go line 290).
In the DM synchronization code (syncer.go line 2776), the generateExtendColumn function does not utilize the column mapping defined by target-column. Instead, it retrieves the extracted data based strictly on the fixed array sequence (Table, Schema, Source) and pushes these values into the row for insertion into TiDB.
Conclusion: Due to this reliance on the fixed sequence rather than the configuration mapping, the target-column setting has no effect on replicating the data to the target table, resulting in data reversal or misassignment.
https://github.com/pingcap/tidb/blob/v8.5.3/pkg/util/table-router/router.go#L290
https://github.com/pingcap/tiflow/blob/master/dm/syncer/syncer.go#L2776
current status of DM cluster (execute query-status <task-name> in dmctl)
(paste current status of DM cluster here)
PTAL @GMHDBJD @D3Hunter
@OliverS929 ptal
Based on the code in TiDB (https://github.com/pingcap/tidb/blob/master/pkg/util/regexpr-router/regexpr_router.go#L195-L209), the routing logic currently applies a fixed evaluation order — table → schema → source. It does not attempt to align or reorder fields based on the downstream schema. This appears to be the likely reason for the unexpected behavior. As a workaround, the downstream table should currently follow this fixed order.
@alastori This might be worth calling out explicitly in the DM documentation, since the implementation assumes this matching order, but it isn’t clear to users from the configuration alone.
Oh, the document needs to be improved to specify the order of 3 columns
https://docs.pingcap.com/zh/tidb/stable/dm-table-routing/#%E9%85%8D%E7%BD%AE%E8%A1%A8%E8%B7%AF%E7%94%B1
合表需要包含用于存放分表源数据信息的三个扩展列 target-column (表名列、库名列、数据源列),扩展列必须为表末尾列且必须为字符串类型
https://docs.pingcap.com/tidb/stable/dm-table-routing/#extract-table-schema-and-source-information-and-write-into-the-merged-table
The merged table must contain the three target-columns (c_table, c_schema, and c_source) used for specifying the source information. In addition, these columns must be the last columns and be string types.