
cdc sink reports "Write Conflict" error and causes checkpointTs to lag behind

maxshuang opened this issue 2 years ago

What did you do?

From the oncall: there have always been 3 CDC nodes, and none were added or removed. The changefeed never actually catches up because one of the CDC nodes is always stuck. Pausing/resuming the changefeed is what causes the stuck CDC node to change, producing this flapping behaviour.

BTW, we found an unexpected 'write conflict' error in one CDC node's log.

log:

[2022/09/23 07:28:36.822 +00:00] [WARN] [mysql.go:612] ["execute DMLs with error, retry later"] [error="
[CDC:ErrMySQLTxnError]MySQL txn error: Error 9007: Write conflict, txnStartTS=436186150766379010, conflictStartTS=436186150727057454, conflictCommitTS=436186150766379021, 
key={tableID=172, indexID=4, indexValues={1, 85709, 15, 13, }} primary={tableID=161, indexID=1, indexValues={346943, 1850395608943165440, 729430118, 1, 1598468504, }} [try again later]"] 
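
Not part of the original report, but useful when reading the timestamps above: a TiDB TSO packs a millisecond physical timestamp in its upper bits and an 18-bit logical counter in its lower bits, so the start/commit TSOs in the log can be converted to wall-clock time. A minimal Go sketch (the constant and helper names are illustrative, not TiCDC's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// A TiDB TSO is physicalMillis<<18 | logical; the lower 18 bits are a
// per-millisecond logical counter.
const logicalBits = 18

// parseTSO splits a TSO into its physical (wall-clock) and logical parts.
func parseTSO(ts uint64) (time.Time, uint64) {
	physicalMillis := int64(ts >> logicalBits)
	logical := ts & ((1 << logicalBits) - 1)
	return time.UnixMilli(physicalMillis), logical
}

func main() {
	for name, ts := range map[string]uint64{
		"txnStartTS":       436186150766379010,
		"conflictStartTS":  436186150727057454,
		"conflictCommitTS": 436186150766379021,
	} {
		t, logical := parseTSO(ts)
		fmt.Printf("%-17s physical=%s logical=%d\n", name, t.UTC().Format(time.RFC3339Nano), logical)
	}
}
```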

What did you expect to see?

No response

What did you see instead?

--

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

6.1.0

Upstream TiKV version (execute tikv-server --version):

6.1.0

TiCDC version (execute cdc version):

v6.1.0

maxshuang commented on Sep 26 '22

key={tableID=172, indexID=4, indexValues={1, 85709, 15, 13, }} primary={tableID=161, indexID=1, indexValues={346943, 1850395608943165440, 729430118, 1, 1598468504, }}

Since key.tableID is different from primary.tableID, this is a multi-table transaction, so some write operations by the user in the downstream are expected.

nongfushanquan commented on Sep 29 '22
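
As an aside, the numeric tableID values in the conflict key can be mapped back to schema and table names, since TiDB extends information_schema.tables with a TIDB_TABLE_ID column. A sketch in Go, assuming a placeholder DSN and using the IDs from the log above:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN; point it at the cluster whose table IDs you want to resolve.
	db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:4000)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// TiDB's information_schema.tables carries the internal table ID in TIDB_TABLE_ID.
	rows, err := db.Query(
		"SELECT tidb_table_id, table_schema, table_name FROM information_schema.tables WHERE tidb_table_id IN (?, ?)",
		161, 172, // primary.tableID and key.tableID from the conflict log
	)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int64
		var schema, name string
		if err := rows.Scan(&id, &schema, &name); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("tableID=%d => %s.%s\n", id, schema, name)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```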

key={tableID=172, indexID=4, indexValues={1, 85709, 15, 13, }} primary={tableID=161, indexID=1, indexValues={346943, 1850395608943165440, 729430118, 1, 1598468504, }}

Since key.tableID is different from primary.tableID, this is a multi-table transaction, so some write operations by the user in the downstream are expected.

Are you saying that there are write operations from applications running on the target cluster? Or are these multiple operations from TiCDC that are conflicting with each other? Would it be possible to print/find the connection IDs of the conflicting transactions?

dveeden commented on Sep 29 '22

Would it be possible to:

  • show table names (and index/object names?)
  • show the human-readable physical/logical parts of the timestamps?
  • have a more detailed message when key.tableID and primary.tableID are not the same? (see the sketch below)
  • have information about the involved threads?

dveeden commented on Sep 29 '22
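
Regarding the third bullet above, here is a hypothetical sketch of what a more descriptive message could look like once the conflict fields are decoded; writeConflict and its fields are illustrative names, not TiCDC's actual types:

```go
package main

import "fmt"

// writeConflict is a hypothetical, already-parsed view of the fields in the
// "Write conflict" error from the log.
type writeConflict struct {
	txnStartTS, conflictStartTS, conflictCommitTS uint64
	keyTableID, primaryTableID                    int64
}

// describe renders a human-oriented summary, calling out cross-table conflicts
// explicitly, since those usually point at a non-TiCDC writer in the downstream.
func describe(c writeConflict) string {
	msg := fmt.Sprintf(
		"write conflict: txnStartTS=%d conflicts with txn [start=%d, commit=%d]",
		c.txnStartTS, c.conflictStartTS, c.conflictCommitTS)
	if c.keyTableID != c.primaryTableID {
		msg += fmt.Sprintf(
			"; the conflicting transaction spans tables %d and %d, which suggests a writer other than TiCDC is touching the downstream",
			c.keyTableID, c.primaryTableID)
	}
	return msg
}

func main() {
	fmt.Println(describe(writeConflict{
		txnStartTS:       436186150766379010,
		conflictStartTS:  436186150727057454,
		conflictCommitTS: 436186150766379021,
		keyTableID:       172,
		primaryTableID:   161,
	}))
}
```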

Are you saying that there are write operations from applications running on the target cluster? Or are these multiple operations from TiCDC that are conflicting with each other? Would it be possible to print/find the connection IDs of the conflicting transactions?

Yes. A transaction created by TiCDC only changes rows in a single table, so the conflicting multi-table transaction must come from writes in the downstream.

nongfushanquan commented on Sep 30 '22
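
For context on the "retry later" behaviour in the log: error 9007 (write conflict) is transient, so a MySQL sink can treat it as retryable rather than failing the changefeed. A generic sketch, not TiCDC's actual implementation, using the go-sql-driver/mysql error type:

```go
package sink

import (
	"errors"
	"time"

	"github.com/go-sql-driver/mysql"
)

// errWriteConflict is TiDB's "Write conflict" error number, as seen in the log.
const errWriteConflict = 9007

// isRetryable reports whether a DML failure is worth retrying rather than
// surfacing as a changefeed error.
func isRetryable(err error) bool {
	var myErr *mysql.MySQLError
	if errors.As(err, &myErr) {
		return myErr.Number == errWriteConflict
	}
	return false
}

// execWithRetry retries a flush a few times with a growing backoff when the
// failure looks transient.
func execWithRetry(flush func() error) error {
	const maxRetries = 5
	var err error
	for i := 0; i < maxRetries; i++ {
		if err = flush(); err == nil || !isRetryable(err) {
			return err
		}
		time.Sleep(100 * time.Millisecond * time.Duration(i+1))
	}
	return err
}
```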

/close

nongfushanquan commented on Nov 03 '22

@nongfushanquan: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ti-chi-bot commented on Nov 03 '22