
Resolved ts stuck for 2 hours when upgrading cluster from v7.1.4 to master

Open fubinzh opened this issue 10 months ago • 2 comments

What did you do?

  1. Upgrade a TiDB cluster from v7.1.4 to v8.0 with Titan enabled. Cluster size: 84 TB, 27 TiKV nodes, 3 CDC nodes. 2 changefeeds are running, replicating 4000 tables; the workload is around 40-50 MB/s.

What did you expect to see?

  1. CDC lag should be normal during rolling upgrade.

What did you see instead?

CDC resolved ts lag and checkpoint ts lag keep increasing after the cluster upgrade, reaching up to 2 hours. Resolved ts lag returns to normal about 20 minutes after the cluster upgrade finishes.

Versions of the cluster

cdc after upgrade:
Release Version: v8.0.1
Git Commit Hash: 5041a915b6b31012eea939147771b136bdcd9ce2
Git Branch: heads/refs/tags/v8.0.1
UTC Build Time: 2024-03-27 13:08:27
Go Version: go version go1.21.6 linux/amd64
Failpoint Build: false

cdc before upgrade:
Release Version: v7.1.4
Git Commit Hash: c52abcca9c5405bf9b76f7fa01a755862a9932d4
Git Branch: heads/refs/tags/v7.1.4
UTC Build Time: 2024-03-28 11:28:58
Go Version: go version go1.20.12 linux/amd64
Failpoint Build: false

fubinzh, Apr 01 '24 11:04

/severity major

fubinzh, Apr 01 '24 11:04

I adjusted the severity of this issue to moderate because it is not reproducible. The issue arises from slow incremental scans in TiKV. We are continually working to improve incremental scan speed.

asddongmen, Apr 30 '24 03:04

Closing since it is not reproducible.

flowbehappy, May 07 '24 10:05