tiflow
[Kafka Consumer Issue] Data inconsistency seen after CDC scale
What did you do?
- Deploy a TiDB cluster with 1 CDC node, create a Kafka changefeed, and run the kafka-consumer to replicate the Kafka messages to MySQL.
cdc cli changefeed create "--server=127.0.0.1:8301" "--sink-uri=kafka://downstream-kafka.cdc-testbed-tps-2280524-1-333:9092/cdc-event-open-protocol-cdc-scale?max-message-bytes=1048576&protocol=open-protocol&replication-factor=3" "--changefeed-id=cdc-scale-open-protocol-changefeed"
- Run sysbench prepare
sysbench --db-driver=mysql --mysql-host=`nslookup upstream-tidb.cdc-testbed-tps-2280524-1-333 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g` --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=32 --table-size=100000 --create_secondary=off --debug=true --threads=32 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_write_only prepare
- Run the sysbench workload and, at the same time, scale CDC from 2 to 6 nodes.
sysbench --db-driver=mysql --mysql-host=`nslookup upstream-tidb.cdc-testbed-tps-2280524-1-333 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g` --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=32 --table-size=100000 --create_secondary=off --time=1200 --debug=true --threads=32 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_write_only run
- Send a finishmark, and run the data consistency check once MySQL receives the finishmark.
What did you expect to see?
Data should be consistent.
What did you see instead?
Data inconsistency was observed.
Versions of the cluster
cdc version:
[root@upstream-ticdc-0 /]# /cdc version
Release Version: v7.4.0-alpha
Git Commit Hash: 254cc2b18dab97be67c01e67ea92af3defa36c40
Git Branch: heads/refs/tags/v7.4.0-alpha
UTC Build Time: 2023-09-06 11:36:11
Go Version: go version go1.21.0 linux/amd64
Failpoint Build: false
/found automation
/severity major
Based on the sync-diff summary, I found the message below in the Kafka topic; it shows that TiCDC already sent the update event to Kafka:
{
  "u": {
    "c": {
      "t": 254,
      "f": 0,
      "v": "55368250724-96461947335-24187764707-65260444679-46692396102-21811308953-36638923458-18656561470-57423451092-43285125722"
    },
    "id": {
      "t": 3,
      "h": true,
      "f": 11,
      "v": 67533
    },
    "k": {
      "t": 3,
      "f": 1,
      "v": 52118
    },
    "pad": {
      "t": 254,
      "f": 0,
      "v": "71861985700-07222871824-88378454986-92661605151-75207053250"
    }
  },
  "p": {
    "c": {
      "t": 254,
      "f": 0,
      "v": "53348281694-21978480135-81348173179-73925401350-41399101720-17376868646-87723030020-19163581079-21416984997-48227990150"
    },
    "id": {
      "t": 3,
      "h": true,
      "f": 11,
      "v": 67533
    },
    "k": {
      "t": 3,
      "f": 1,
      "v": 49961
    },
    "pad": {
      "t": 254,
      "f": 0,
      "v": "34753459357-89875488434-09948998701-49662349889-89845145068"
    }
  }
}
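For context: in the TiCDC open protocol, an update event carries the new column values under "u" and the pre-update values under "p" ("t" is the column type, "h" marks the handle key, "f" the flag, "v" the value). A minimal sketch that diffs the two sides confirms this event changed k from 49961 to 52118 while id stayed the same; the helper names are hypothetical and the message is abridged from the one above (c and pad omitted for brevity).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// column mirrors one column entry in a TiCDC open-protocol row event:
// "t" is the column type, "h" marks the handle key, "f" the flag, "v" the value.
type column struct {
	T int         `json:"t"`
	H bool        `json:"h"`
	F int         `json:"f"`
	V interface{} `json:"v"`
}

// sampleMsg is the update event above, abridged to the id and k columns.
const sampleMsg = `{
  "u": {"id": {"t": 3, "h": true, "f": 11, "v": 67533}, "k": {"t": 3, "f": 1, "v": 52118}},
  "p": {"id": {"t": 3, "h": true, "f": 11, "v": 67533}, "k": {"t": 3, "f": 1, "v": 49961}}
}`

// changedColumns returns old -> new values for every column whose value
// differs between the pre-image ("p") and the new image ("u").
func changedColumns(msg string) (map[string][2]string, error) {
	var ev struct {
		U map[string]column `json:"u"`
		P map[string]column `json:"p"`
	}
	if err := json.Unmarshal([]byte(msg), &ev); err != nil {
		return nil, err
	}
	diff := map[string][2]string{}
	for name, newCol := range ev.U {
		if oldCol, ok := ev.P[name]; ok && fmt.Sprint(oldCol.V) != fmt.Sprint(newCol.V) {
			diff[name] = [2]string{fmt.Sprint(oldCol.V), fmt.Sprint(newCol.V)}
		}
	}
	return diff, nil
}

func main() {
	diff, err := changedColumns(sampleMsg)
	if err != nil {
		panic(err)
	}
	fmt.Println(diff) // k changed from 49961 to 52118; id is unchanged
}
```

Since the event is present in the topic with the expected old and new values, whatever dropped the update must sit on the consumer side.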
So this is not a TiCDC-side issue; we need to fix the kafka-consumer, which appears to have missed an update event.
/remove-severity major /severity moderate