The resolvedTs of a changefeed gets stuck during initialization if upstream sends more than 400k resolvedTs per second.

Open fubinzh opened this issue 9 months ago • 4 comments

What did you do?

  1. Set up a TiDB cluster with TiKV 8.1 and CDC 7.5
  2. Create a Kafka changefeed that uses the simple protocol
  3. Run no workload

What did you expect to see?

The CDC resolved ts should advance normally.

What did you see instead?

The CDC resolved ts gets stuck.

Versions of the cluster

TiKV version:
Release Version: 8.1.0
Git Commit Hash: 56613f7c3e28c02853cc51d15bc1b77f68b58be8

CDC version:
Release Version: 7.5
Git Commit Hash: 29dae6c0caaad7fe3f8d74ded151d2627253af01

fubinzh avatar May 09 '24 10:05 fubinzh

/assign @asddongmen

fubinzh avatar May 09 '24 10:05 fubinzh

The frontier consumes a huge amount of CPU (image: img_v3_02an_f2a63726-5d98-47f6-b129-6480c450248g).

The following log entries can be found in the upstream TiKV:

> wc -l tikv-2024-05-08T15-09-29.950.log
 6186489 tikv-2024-05-08T15-09-29.950.log
> grep "cdc send event failed, full" tikv-2024-05-08T15-09-29.950.log | wc -l
 6186448
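To put the two counts in proportion, here is a minimal Python sketch that does the same counting as the `wc`/`grep` commands and then computes the share of matching lines. The helper names `count_matches` and `matched_ratio` are made up for illustration; only the log message and the two totals come from the output above.

```python
from typing import Iterable, Tuple

# Log message searched for with grep above.
NEEDLE = "cdc send event failed, full"


def count_matches(lines: Iterable[str], needle: str = NEEDLE) -> Tuple[int, int]:
    """Return (total lines, lines containing needle), like wc -l and grep | wc -l."""
    total = matched = 0
    for line in lines:
        total += 1
        if needle in line:
            matched += 1
    return total, matched


def matched_ratio(total: int, matched: int) -> float:
    """Fraction of log lines that contain the message."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matched / total


# Totals reported in the shell output above.
TOTAL_LINES = 6186489
FAILED_LINES = 6186448

ratio = matched_ratio(TOTAL_LINES, FAILED_LINES)
other_lines = TOTAL_LINES - FAILED_LINES
print(f"{ratio:.4%} of lines match; {other_lines} lines are anything else")
```

For a real file, `count_matches(open(path))` would produce the two totals. With the reported numbers, almost the entire log file consists of this one message.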

It seems that the issue arises because the number of tables and the volume of resolvedTs events exceed what the frontier can process.
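As a rough illustration of why event volume matters here, the sketch below models a frontier as "the minimum resolved ts over all tracked regions". This is a toy model written for this thread, not tiflow's actual frontier; the class `ToyFrontier` and its methods are hypothetical. It assumes every resolvedTs event has to be applied to the structure before the minimum can advance.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


class ToyFrontier:
    """Toy model: tracks the minimum resolved ts across regions (illustrative only)."""

    def __init__(self, region_ids: Iterable[int], start_ts: int = 0) -> None:
        self._ts: Dict[int, int] = {rid: start_ts for rid in region_ids}
        if not self._ts:
            raise ValueError("at least one region is required")
        self._heap: List[Tuple[int, int]] = [(start_ts, rid) for rid in self._ts]
        heapq.heapify(self._heap)

    def forward(self, region_id: int, resolved_ts: int) -> None:
        """Apply one resolvedTs event; stale or regressing events are ignored."""
        if resolved_ts <= self._ts[region_id]:
            return
        self._ts[region_id] = resolved_ts
        # O(log n) per event: at hundreds of thousands of events per
        # second, this per-event cost is what adds up.
        heapq.heappush(self._heap, (resolved_ts, region_id))

    def frontier(self) -> int:
        """Return the minimum resolved ts over all regions."""
        # Lazily drop heap entries that a newer event has superseded.
        while self._heap[0][0] != self._ts[self._heap[0][1]]:
            heapq.heappop(self._heap)
        return self._heap[0][0]


f = ToyFrontier(range(3))
f.forward(0, 10)
f.forward(1, 10)
held_back = f.frontier()   # region 2 has not advanced yet
f.forward(2, 5)
advanced = f.frontier()    # now limited by region 2 at ts 5
```

In this model a single region that has not been processed yet holds back the whole frontier, so if events arrive faster than `forward` can apply them, the minimum stops advancing even though upstream keeps sending.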

asddongmen avatar May 10 '24 03:05 asddongmen

cc @hicqu To fix this issue, https://github.com/pingcap/tiflow/pull/10506 needs to be cherry-picked to v7.5.2.

asddongmen avatar May 13 '24 03:05 asddongmen

/severity major

fubinzh avatar May 13 '24 12:05 fubinzh