tiflow icon indicating copy to clipboard operation
tiflow copied to clipboard

target-column Assignment Fails/Reverses for extract-table/extract-schema in Route Rules

Open luyomo opened this issue 4 weeks ago • 4 comments

What did you do?

MySQL:

create table mig_table(
  id int primary key,
  col01 int
)

TiDB

create table mig_schema.mig_table(
  id int primary key,
  col01 int,
  c_schema varchar(64),
  c_table varchar(64),
  unique key(id, c_schema, c_table)
);

DM task config

routes:
  rr-4:
    schema-pattern: "..."
    table-pattern: "..."
    target-schema: "..."
    target-table: "..."
    extract-schema:
      schema-regexp: "(.*)"
      target-column: "c_schema"
    extract-table:
      table-regexp: "(.*)"
      target-column: "c_table"

Data replication from MySQL to TiDB

Data insert into MySQL

MySQL$ insert into mig_schema.mig_table values(1, 1);

Data replicated to TiDB

TiDB$ select * from mig_table;
id: 1
col01: 1, 
c_schema: mig_table
c_table: mig_schema

What did you expect to see?

Data replicated to TiDB

TiDB$ select * from mig_table;
id: 1
col01: 1, 
c_schema: mig_schema
c_table:mig_table

What did you see instead?

Data replicated to TiDB

TiDB$ select * from mig_table;
id: 1
col01: 1, 
c_schema: mig_table
c_table: mig_schema

Versions of the cluster

DM version (run dmctl -V or dm-worker -V or dm-master -V):

$ tiup dmctl -V 
Checking updates for component dmctl... 
A new version of dmctl is available: v8.5.3 -> v8.5.4

    To update this component:   tiup update dmctl
    To update all components:   tiup update --all

Starting component dmctl: /home/azureuser/.tiup/components/dmctl/v8.5.3/dmctl/dmctl -V
Release Version: v8.5.4-aeon-dev
Git Commit Hash: bae2dc99691b5c20e69e2b02f9175f63a2b60202
Git Branch: heads/refs/tags/v8.5.4-aeon-dev
UTC Build Time: 2025-11-25 10:16:10
Go Version: go1.23.8
Failpoint Build: false

Upstream MySQL/MariaDB server version:

Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 2193
Server version: 8.0.42-azure Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

Downstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                                                                            |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v8.1.2
Edition: Enterprise
Git Commit Hash: 80db67052c0b1d89ab5ea94d6efa576a2191432d
Git Branch: HEAD
UTC Build Time: 2024-12-24 06:38:13
GoVersion: go1.21.13
Race Enabled: false
Check Table Before Drop: false
Store: tikv
Enterprise Extension Commit Hash: cb58c31ad9364cc770b250a4702c8d7b75f98b1a |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.001 sec)

How did you deploy DM: tiup or manually?

TiUP

Other interesting information (system version, hardware config, etc):

Based on the current codebase, we have identified the sequence of operations and a bug in the data insertion logic:

The sequence in the schema extractor logic is defined as: TableExtractor, SchemaExtractor, and then SourceExtractor (see router.go line 290).

In the DM synchronization code (syncer.go line 2776), the generateExtendColumn function does not utilize the column mapping defined by target-column. Instead, it retrieves the extracted data based strictly on the fixed array sequence (Table, Schema, Source) and pushes these values into the row for insertion into TiDB.

Conclusion: Due to this reliance on the fixed sequence rather than the configuration mapping, the target-column setting has no effect on replicating the data to the target table, resulting in data reversal or misassignment.

https://github.com/pingcap/tidb/blob/v8.5.3/pkg/util/table-router/router.go#L290
https://github.com/pingcap/tiflow/blob/master/dm/syncer/syncer.go#L2776

current status of DM cluster (execute query-status <task-name> in dmctl)

(paste current status of DM cluster here)

luyomo avatar Dec 02 '25 12:12 luyomo

PTAL @GMHDBJD @D3Hunter

lance6716 avatar Dec 03 '25 04:12 lance6716

@OliverS929 ptal

D3Hunter avatar Dec 03 '25 16:12 D3Hunter

Based on the code in TiDB (https://github.com/pingcap/tidb/blob/master/pkg/util/regexpr-router/regexpr_router.go#L195-L209), the routing logic currently applies a fixed evaluation order — table → schema → source. It does not attempt to align or reorder fields based on the downstream schema. This appears to be the likely reason for the unexpected behavior. As a workaround, the downstream table should currently follow this fixed order.

@alastori This might be worth calling out explicitly in the DM documentation, since the implementation assumes this matching order, but it isn’t clear to users from the configuration alone.

OliverS929 avatar Dec 05 '25 08:12 OliverS929

Oh, the document needs to be improved to specify the order of 3 columns

https://docs.pingcap.com/zh/tidb/stable/dm-table-routing/#%E9%85%8D%E7%BD%AE%E8%A1%A8%E8%B7%AF%E7%94%B1

合表需要包含用于存放分表源数据信息的三个扩展列 target-column (表名列、库名列、数据源列),扩展列必须为表末尾列且必须为字符串类型

https://docs.pingcap.com/tidb/stable/dm-table-routing/#extract-table-schema-and-source-information-and-write-into-the-merged-table

The merged table must contain the three target-columns (c_table, c_schema, and c_source) used for specifying the source information. In addition, these columns must be the last columns and be string types.

lance6716 avatar Dec 05 '25 08:12 lance6716