
cdc sink reports "Write Conflict" error and causes checkpointTs to lag behind

maxshuang opened this issue 2 years ago

What did you do?

From the oncall: there have always been 3 CDC nodes, and none were added or removed. The changefeed never actually catches up because one of the CDC nodes is always stuck. Pausing/resuming the changefeed is what causes the stuck CDC node to change, producing this flapping behaviour.

BTW, we found an unexpected 'write conflict' error in one CDC node's log.

log:

[2022/09/23 07:28:36.822 +00:00] [WARN] [mysql.go:612] ["execute DMLs with error, retry later"] [error="
[CDC:ErrMySQLTxnError]MySQL txn error: Error 9007: Write conflict, txnStartTS=436186150766379010, conflictStartTS=436186150727057454, conflictCommitTS=436186150766379021, 
key={tableID=172, indexID=4, indexValues={1, 85709, 15, 13, }} primary={tableID=161, indexID=1, indexValues={346943, 1850395608943165440, 729430118, 1, 1598468504, }} [try again later]"] 
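
Not part of the original report, but useful when reading the timestamps above: a TiDB TSO packs a millisecond physical timestamp in its upper bits and an 18-bit logical counter in its lower bits, so the start/commit TSOs in the log can be converted to wall-clock time. A minimal Go sketch (the constant and helper names are illustrative, not TiCDC's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// A TiDB TSO is physicalMillis<<18 | logical; the lower 18 bits are a
// per-millisecond logical counter.
const logicalBits = 18

// parseTSO splits a TSO into its physical (wall-clock) and logical parts.
func parseTSO(ts uint64) (time.Time, uint64) {
	physicalMillis := int64(ts >> logicalBits)
	logical := ts & ((1 << logicalBits) - 1)
	return time.UnixMilli(physicalMillis), logical
}

func main() {
	for name, ts := range map[string]uint64{
		"txnStartTS":       436186150766379010,
		"conflictStartTS":  436186150727057454,
		"conflictCommitTS": 436186150766379021,
	} {
		t, logical := parseTSO(ts)
		fmt.Printf("%-17s physical=%s logical=%d\n", name, t.UTC().Format(time.RFC3339Nano), logical)
	}
}
```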

What did you expect to see?

No response

What did you see instead?

--

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

6.1.0

Upstream TiKV version (execute tikv-server --version):

6.1.0

TiCDC version (execute cdc version):

v6.1.0

maxshuang commented on Sep 26 '22

key={tableID=172, indexID=4, indexValues={1, 85709, 15, 13, }} primary={tableID=161, indexID=1, indexValues={346943, 1850395608943165440, 729430118, 1, 1598468504, }}

Since key.tableID is different from primary.tableID, this is a multi-table transaction, so some write operations by the user in the downstream are expected.

nongfushanquan commented on Sep 29 '22
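
As an aside, the numeric tableID values in the conflict key can be mapped back to schema and table names, since TiDB extends information_schema.tables with a TIDB_TABLE_ID column. A sketch in Go, assuming a placeholder DSN and using the IDs from the log above:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN; point it at the cluster whose table IDs you want to resolve.
	db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:4000)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// TiDB's information_schema.tables carries the internal table ID in TIDB_TABLE_ID.
	rows, err := db.Query(
		"SELECT tidb_table_id, table_schema, table_name FROM information_schema.tables WHERE tidb_table_id IN (?, ?)",
		161, 172, // primary.tableID and key.tableID from the conflict log
	)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int64
		var schema, name string
		if err := rows.Scan(&id, &schema, &name); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("tableID=%d => %s.%s\n", id, schema, name)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```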

key={tableID=172, indexID=4, indexValues={1, 85709, 15, 13, }} primary={tableID=161, indexID=1, indexValues={346943, 1850395608943165440, 729430118, 1, 1598468504, }}

Since key.tableID is different from primary.tableID, this is a multi-table transaction, so some write operations by the user in the downstream are expected.

Are you saying that there are write operations from applications running on the target cluster? Or are these multiple operations from TiCDC that are conflicting with each other? Would it be possible to print/find the connection IDs of the conflicting transactions?

dveeden commented on Sep 29 '22

Would it be possible to:

  • show table names (and index/object names?)
  • show the human-readable physical/logical parts of the timestamps?
  • have a more detailed message when key.tableID and primary.tableID are not the same? (see the sketch below)
  • have information about the involved threads?

dveeden commented on Sep 29 '22
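
Regarding the third bullet above, here is a hypothetical sketch of what a more descriptive message could look like once the conflict fields are decoded; writeConflict and its fields are illustrative names, not TiCDC's actual types:

```go
package main

import "fmt"

// writeConflict is a hypothetical, already-parsed view of the fields in the
// "Write conflict" error from the log.
type writeConflict struct {
	txnStartTS, conflictStartTS, conflictCommitTS uint64
	keyTableID, primaryTableID                    int64
}

// describe renders a human-oriented summary, calling out cross-table conflicts
// explicitly, since those usually point at a non-TiCDC writer in the downstream.
func describe(c writeConflict) string {
	msg := fmt.Sprintf(
		"write conflict: txnStartTS=%d conflicts with txn [start=%d, commit=%d]",
		c.txnStartTS, c.conflictStartTS, c.conflictCommitTS)
	if c.keyTableID != c.primaryTableID {
		msg += fmt.Sprintf(
			"; the conflicting transaction spans tables %d and %d, which suggests a writer other than TiCDC is touching the downstream",
			c.keyTableID, c.primaryTableID)
	}
	return msg
}

func main() {
	fmt.Println(describe(writeConflict{
		txnStartTS:       436186150766379010,
		conflictStartTS:  436186150727057454,
		conflictCommitTS: 436186150766379021,
		keyTableID:       172,
		primaryTableID:   161,
	}))
}
```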

Are you saying that there are write operations from applications running on the target cluster? Or are these multiple operations from TiCDC that are conflicting with each other? Would it be possible to print/find the connection IDs of the conflicting transactions?

Yes. A transaction created by TiCDC only changes rows in a single table, so the conflicting multi-table transaction must come from writes in the downstream.

nongfushanquan commented on Sep 30 '22
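
For context on the "retry later" behaviour in the log: error 9007 (write conflict) is transient, so a MySQL sink can treat it as retryable rather than failing the changefeed. A generic sketch, not TiCDC's actual implementation, using the go-sql-driver/mysql error type:

```go
package sink

import (
	"errors"
	"time"

	"github.com/go-sql-driver/mysql"
)

// errWriteConflict is TiDB's "Write conflict" error number, as seen in the log.
const errWriteConflict = 9007

// isRetryable reports whether a DML failure is worth retrying rather than
// surfacing as a changefeed error.
func isRetryable(err error) bool {
	var myErr *mysql.MySQLError
	if errors.As(err, &myErr) {
		return myErr.Number == errWriteConflict
	}
	return false
}

// execWithRetry retries a flush a few times with a growing backoff when the
// failure looks transient.
func execWithRetry(flush func() error) error {
	const maxRetries = 5
	var err error
	for i := 0; i < maxRetries; i++ {
		if err = flush(); err == nil || !isRetryable(err) {
			return err
		}
		time.Sleep(100 * time.Millisecond * time.Duration(i+1))
	}
	return err
}
```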

/close

nongfushanquan commented on Nov 03 '22

@nongfushanquan: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ti-chi-bot commented on Nov 03 '22