tiflow: The resolvedTs of a changefeed gets stuck during initialization if upstream sends more than 400k resolvedTs events per second.

Open fubinzh opened this issue 9 months ago • 4 comments

What did you do?

Deploy a TiDB cluster with TiKV 8.1 and CDC 7.5.
Create a Kafka changefeed that uses the simple protocol.
Run no workload.

What did you expect to see?

The CDC resolved ts should advance normally.

What did you see instead?

The CDC resolved ts gets stuck.

Versions of the cluster

TiKV version: Release Version: 8.1.0 Git Commit Hash: 56613f7c3e28c02853cc51d15bc1b77f68b58be8

CDC version: Release Version: 7.5 Git Commit Hash: 29dae6c0caaad7fe3f8d74ded151d2627253af01

May 09 '24 10:05 fubinzh

/assign @asddongmen

May 09 '24 10:05 fubinzh

The attached screenshot shows that the frontier consumes a huge amount of CPU.

The upstream TiKV log consists almost entirely of "cdc send event failed, full" messages:

> wc -l tikv-2024-05-08T15-09-29.950.log
 6186489 tikv-2024-05-08T15-09-29.950.log
> grep "cdc send event failed, full" tikv-2024-05-08T15-09-29.950.log | wc -l
 6186448
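
To put the two counts above in proportion, here is a minimal Python sketch of the arithmetic. The function name `summarize_log` is hypothetical and only illustrates the calculation; the two numbers are the ones reported by `wc -l` and `grep` above. All but 41 lines of the log file are the "cdc send event failed, full" message.

```python
def summarize_log(total_lines, matched_lines):
    """Return (unmatched line count, share of matched lines)."""
    unmatched = total_lines - matched_lines
    share = matched_lines / total_lines
    return unmatched, share


# Counts taken from the wc/grep output in this comment.
unmatched, share = summarize_log(6186489, 6186448)
print(f"lines without the message: {unmatched}")
print(f"share of 'cdc send event failed, full': {share:.4%}")
```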

It seems that the issue arises because the number of tables and the volume of resolvedTs events exceed what the frontier can process.
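
To illustrate why the event volume matters, here is a toy sketch of a frontier. It is not tiflow's actual code: the class `MinFrontier` and its methods are hypothetical, and it assumes the frontier's job is to track one resolved ts per region and report the minimum. Under that assumption, every incoming resolvedTs event costs one update, so several hundred thousand events per second can keep the component busy even when no data is changing.

```python
import heapq


class MinFrontier:
    """Toy frontier: one resolved ts per region, exposes the minimum."""

    def __init__(self, region_ids, start_ts=0):
        self._ts = {r: start_ts for r in region_ids}
        self._heap = [(start_ts, r) for r in region_ids]
        heapq.heapify(self._heap)

    def forward(self, region_id, resolved_ts):
        # Ignore events that do not advance the region.
        if resolved_ts <= self._ts[region_id]:
            return
        self._ts[region_id] = resolved_ts
        # O(log n) work per event; the old heap entry becomes stale.
        heapq.heappush(self._heap, (resolved_ts, region_id))

    def frontier(self):
        # Lazily discard stale entries until the top matches the live value.
        while self._heap[0][0] != self._ts[self._heap[0][1]]:
            heapq.heappop(self._heap)
        return self._heap[0][0]


f = MinFrontier([1, 2, 3])
f.forward(1, 10)
f.forward(2, 20)
print(f.frontier())  # region 3 has not advanced yet, so the minimum is still 0
f.forward(3, 5)
print(f.frontier())  # 5
```

In this sketch the overall resolved ts is held back by the slowest region, and the cost grows with the event rate rather than with the amount of changed data.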

May 10 '24 03:05 asddongmen

cc @hicqu To fix this issue, https://github.com/pingcap/tiflow/pull/10506 needs to be cherry-picked to v7.5.2.

May 13 '24 03:05 asddongmen

/severity major

May 13 '24 12:05 fubinzh
