tiflow
tiflow copied to clipboard
resolved ts stucks for 2 hours when upgrade cluster from v7.1.4 to master
What did you do?
- Upgrade a TiDB cluster from v7.1.4 to v8.0, cluster with titan on. cluster size: 84TB, 27 TiKV, 3 CDC node. 2 changefeed running replicating 4000 tables, workload is around 40 - 50MB/s.
What did you expect to see?
- CDC lag should be normal during rolling upgrade.
What did you see instead?
CDC resolved ts and checkpoint ts lag keeps increasing after cluster upgrade, and up to 2 hours. Resolved ts lag back to normal about 20m after cluster upgrade is finished.
Versions of the cluster
cdc after upgrade: elease Version: v8.0.1 Git Commit Hash: 5041a915b6b31012eea939147771b136bdcd9ce2 Git Branch: heads/refs/tags/v8.0.1 UTC Build Time: 2024-03-27 13:08:27 Go Version: go version go1.21.6 linux/amd64 Failpoint Build: false
cdc before upgrade: Release Version: v7.1.4 Git Commit Hash: c52abcca9c5405bf9b76f7fa01a755862a9932d4 Git Branch: heads/refs/tags/v7.1.4 UTC Build Time: 2024-03-28 11:28:58 Go Version: go version go1.20.12 linux/amd64 Failpoint Build: false
/severity major
I adjust the severity of this issue to moderate as it is not reproducible. The issue arises from the slow speed of the incremental scan in TiKV. We are continually working to improve the incremental scan speed.
Close since not reproducible