pipeline migration: why use sql update（only include changed column） in increment task

Open dongyun-coder opened this issue 7 months ago • 0 comments

Feature Request

In the incremental processing stage, the system extracts the latest changes from row-based binlog events, identifies the modified columns, and generates the corresponding UPDATE statement.

A more elegant approach would be to convert it into an INSERT ... ON DUPLICATE KEY UPDATE statement.（override all column）

or at lease update all columns。

Here is a problem?

Data Synchronization Issue with Timestamp Columns

The source table contains a column gmt_modified (datetime type) that automatically updates:

gmt_modified datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

Current Behavior: When the source table updates a row where the new gmt_modified value remains identical to the previous value (within second precision), the pipeline's sink operation:

Generates an UPDATE SQL statement that excludes the gmt_modified column

So gmt_modified will be updated by current time not from source table.

Result: The target table's gmt_modified value does not match the source table's value.

May 27 '25 04:05 dongyun-coder