
[Kafka Consumer Issue] Data inconsistency seen after CDC scale

Open · fubinzh opened this issue 1 year ago · 4 comments

What did you do?

  1. Deploy a TiDB cluster with 1 CDC node, create a Kafka changefeed, and run the kafka-consumer to consume the Kafka messages into MySQL.
cdc cli changefeed create "--server=127.0.0.1:8301" "--sink-uri=kafka://downstream-kafka.cdc-testbed-tps-2280524-1-333:9092/cdc-event-open-protocol-cdc-scale?max-message-bytes=1048576&protocol=open-protocol&replication-factor=3" "--changefeed-id=cdc-scale-open-protocol-changefeed"
  2. Run sysbench prepare
sysbench --db-driver=mysql --mysql-host=`nslookup upstream-tidb.cdc-testbed-tps-2280524-1-333 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g`  --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=32 --table-size=100000 --create_secondary=off --debug=true --threads=32 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_write_only prepare
  3. Run the sysbench workload and, at the same time, scale TiCDC out from 2 to 6 nodes.
sysbench --db-driver=mysql --mysql-host=`nslookup upstream-tidb.cdc-testbed-tps-2280524-1-333 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g`  --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=32 --table-size=100000 --create_secondary=off --time=1200 --debug=true --threads=32 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_write_only run
  4. Send finishmark and run a data consistency check once MySQL receives the finishmark (a sketch of this check is given below).
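
For context, the finishmark check in step 4 boils down to: write a marker upstream once the workload is done, wait until it shows up in the downstream MySQL (meaning the changefeed and the kafka-consumer have caught up), then run sync-diff. Below is a minimal sketch of that wait loop in Go, assuming a hypothetical workload.finishmark table and an illustrative downstream DSN; it is not the exact tooling used in this test.

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql" // MySQL driver for database/sql
)

// waitForFinishmark polls the downstream MySQL until the (hypothetical)
// workload.finishmark table exists, i.e. everything written before the
// marker has been replicated through Kafka and the consumer.
func waitForFinishmark(db *sql.DB, timeout time.Duration) bool {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        var name string
        err := db.QueryRow(
            "SELECT table_name FROM information_schema.tables" +
                " WHERE table_schema = 'workload' AND table_name = 'finishmark'",
        ).Scan(&name)
        if err == nil {
            return true // marker arrived downstream; safe to run sync-diff
        }
        if err != sql.ErrNoRows {
            log.Printf("query downstream: %v", err)
        }
        time.Sleep(5 * time.Second)
    }
    return false
}

func main() {
    // Illustrative DSN; the real test points at the MySQL that the
    // kafka-consumer writes to.
    db, err := sql.Open("mysql", "root:@tcp(downstream-mysql:3306)/information_schema")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if !waitForFinishmark(db, 30*time.Minute) {
        log.Fatal("finishmark not replicated within timeout")
    }
    log.Println("finishmark found; run sync-diff-inspector to compare upstream and downstream")
}

The point is simply that the comparison is only meaningful after the marker has been replicated, since the changefeed and consumer lag behind the upstream writes.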

What did you expect to see?

Data should be consistent.

What did you see instead?

Data inconsistency was seen.

(screenshot showing the inconsistency omitted)

Versions of the cluster

cdc version:

[root@upstream-ticdc-0 /]# /cdc version
Release Version: v7.4.0-alpha
Git Commit Hash: 254cc2b18dab97be67c01e67ea92af3defa36c40
Git Branch: heads/refs/tags/v7.4.0-alpha
UTC Build Time: 2023-09-06 11:36:11
Go Version: go version go1.21.0 linux/amd64
Failpoint Build: false

fubinzh · Sep 07 '23 05:09

/found automation

fubinzh · Sep 07 '23 05:09

/severity major

fubinzh · Sep 08 '23 00:09

Based on the sync-diff summary (image omitted), I found the messages below in the Kafka topic. They show that TiCDC already sent the update event to Kafka:

{
    "u": {
        "c": {
            "t": 254,
            "f": 0,
            "v": "55368250724-96461947335-24187764707-65260444679-46692396102-21811308953-36638923458-18656561470-57423451092-43285125722"
        },
        "id": {
            "t": 3,
            "h": true,
            "f": 11,
            "v": 67533
        },
        "k": {
            "t": 3,
            "f": 1,
            "v": 52118
        },
        "pad": {
            "t": 254,
            "f": 0,
            "v": "71861985700-07222871824-88378454986-92661605151-75207053250"
        }
    },
    "p": {
        "c": {
            "t": 254,
            "f": 0,
            "v": "53348281694-21978480135-81348173179-73925401350-41399101720-17376868646-87723030020-19163581079-21416984997-48227990150"
        },
        "id": {
            "t": 3,
            "h": true,
            "f": 11,
            "v": 67533
        },
        "k": {
            "t": 3,
            "f": 1,
            "v": 49961
        },
        "pad": {
            "t": 254,
            "f": 0,
            "v": "34753459357-89875488434-09948998701-49662349889-89845145068"
        }
    }
}

So it's not a TiCDC-side issue; we need to fix the kafka-consumer, which seems to have missed an update event.
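
For readers unfamiliar with the payload above: in the open protocol, an update event carries the new row image under "u" and the pre-update image under "p", and each column object holds the column type ("t"), flags ("f"), an optional handle-key marker ("h"), and the value ("v"). Below is a minimal sketch of decoding such a value in Go; the struct and field names are my own illustration, not the actual types used by the tiflow kafka-consumer.

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

// column mirrors one column object in the message above:
// "t" = type, "h" = handle key, "f" = flags, "v" = value.
type column struct {
    Type      int             `json:"t"`
    HandleKey bool            `json:"h"`
    Flag      uint64          `json:"f"`
    Value     json.RawMessage `json:"v"` // raw, since values may be numbers or strings
}

// rowChange mirrors the value of a row-changed event: "u" is the new row
// image, "p" the pre-update image (updates only), "d" the deleted row.
type rowChange struct {
    Update map[string]column `json:"u"`
    Pre    map[string]column `json:"p"`
    Delete map[string]column `json:"d"`
}

func main() {
    // payload is a trimmed-down version of the Kafka message value above.
    payload := []byte(`{
        "u": {"id": {"t": 3, "h": true, "f": 11, "v": 67533},
              "k":  {"t": 3, "f": 1, "v": 52118}},
        "p": {"id": {"t": 3, "h": true, "f": 11, "v": 67533},
              "k":  {"t": 3, "f": 1, "v": 49961}}}`)

    var rc rowChange
    if err := json.Unmarshal(payload, &rc); err != nil {
        log.Fatal(err)
    }
    switch {
    case rc.Update != nil && rc.Pre != nil:
        fmt.Printf("update: k %s -> %s\n", rc.Pre["k"].Value, rc.Update["k"].Value)
    case rc.Update != nil:
        fmt.Println("insert")
    case rc.Delete != nil:
        fmt.Println("delete")
    }
}

Since an update event for handle key id = 67533 (old k = 49961, new k = 52118) is present in the topic, the downstream mismatch has to come from how the consumer handled that event.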

sdojjy · Sep 10 '23 02:09

/remove-severity major
/severity moderate

fubinzh · Sep 10 '23 02:09