tiflow
The resolvedTs of a changefeed gets stuck during initialization if upstream sends more than 400k resolvedTs events per second.
What did you do?
- TiDB cluster with 8.1 TiKV and 7.5 CDC
- Create a Kafka changefeed using the simple protocol
- no workload running
What did you expect to see?
The CDC resolved ts should advance normally.
What did you see instead?
The CDC resolved ts gets stuck.
Versions of the cluster
TiKV version Release Version: 8.1.0 Git Commit Hash: 56613f7c3e28c02853cc51d15bc1b77f68b58be8
CDC version: Release Version: 7.5 Git Commit Hash: 29dae6c0caaad7fe3f8d74ded151d2627253af01
/assign @asddongmen
It shows that the frontier consumes a huge amount of CPU:
The following log can be found in the upstream TiKV:
> wc -l tikv-2024-05-08T15-09-29.950.log
6186489 tikv-2024-05-08T15-09-29.950.log
> grep "cdc send event failed, full" tikv-2024-05-08T15-09-29.950.log | wc -l
6186448
It seems that the issue arises because the number of tables and the volume of resolvedTs events exceed what the frontier can process.
cc @hicqu To fix this issue, https://github.com/pingcap/tiflow/pull/10506 needs to be cherry-picked to v7.5.2.
/severity major