tiflow

TiKV leader election instability causing TiCDC initialization failures

Open • verbotenj opened this issue 6 months ago • 1 comment

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduction steps (Required)

Set up a TiDB v8.1.1 cluster and integrate it with Kafka 3.7.1. Use DM (TiDB Data Migration) to migrate a database from MySQL to TiDB, and configure TiCDC to replicate data from TiDB to Kafka. During normal operation, monitor the TiKV and TiCDC logs, looking specifically for leader election and region management activity.

2. What did you expect to see? (Required)

Stable operation of the TiKV service with successful leader elections and consistent region management. TiCDC should initialize and operate without errors, ensuring uninterrupted data replication to Kafka.

3. What did you see instead? (Required)

  • Frequent TiKV errors related to leader election instability for a specific region.
    • The TiKV logs contain several errors related to CDC initialization failures, for example: "cdc initialize fail: Request error message: peer is not leader for this region, leader may None not_leader"
  • The CDC process encountered region errors and stopped observing certain regions due to failed leader elections. For example:
    • Log entry: [INFO] [delegate.rs:1034] ["cdc stop observing"] [failed=true] [region]
  • TiCDC reports an error with "code": "CDC:ErrProcessorUnknown".
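To make the log monitoring in the reproduction steps repeatable, the symptoms listed above can be counted mechanically. The following is a minimal Python sketch, not part of TiKV or TiCDC. It matches the two TiKV messages quoted in this report by substring. It also reads the error code from a changefeed status JSON string, assuming a top-level "error" object with a "code" field. The report itself only shows the "code" pair, so that structure is an assumption. All function names, labels, and the script name are hypothetical.

```python
import json
import sys
from collections import Counter

# Hypothetical labels for the two TiKV symptoms quoted in this report.
INIT_NOT_LEADER = "init_not_leader"
STOP_OBSERVING_FAILED = "stop_observing_failed"


def classify_tikv_line(line):
    """Label a TiKV log line if it matches a symptom from the report, else None.

    Matching is by substring only, so it does not depend on the exact
    TiKV log layout (region IDs and quoting may differ in real logs).
    """
    if "cdc initialize fail" in line and "not_leader" in line:
        return INIT_NOT_LEADER
    if "cdc stop observing" in line and "failed=true" in line:
        return STOP_OBSERVING_FAILED
    return None


def summarize(lines):
    """Count how many lines match each symptom label."""
    counts = Counter()
    for line in lines:
        label = classify_tikv_line(line)
        if label is not None:
            counts[label] += 1
    return counts


def changefeed_error_code(status_json):
    """Return error.code from a changefeed status JSON string, or None.

    Assumes a top-level "error" object with a "code" field (an assumption;
    the report only shows "code": "CDC:ErrProcessorUnknown" on its own).
    """
    doc = json.loads(status_json)
    error = doc.get("error") or {}
    return error.get("code")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python cdc_symptoms.py /path/to/tikv.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for label, count in summarize(f).most_common():
            print(f"{label}: {count}")
```

Substring matching was chosen over parsing the full TiKV log layout because the report quotes the messages only partially (for example, the region field appears only as "[region]").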

4. What is your TiDB version? (Required)

Release Version: v8.1.1
Edition: Community
Git Commit Hash: a7df4f9845d5d6a590c5d45dad0dcc9f21aa8765
Git Branch: HEAD
UTC Build Time: 2024-08-22 05:49:03
GoVersion: go1.21.13
Race Enabled: false
Check Table Before Drop: false
Store: tikv

verbotenj • Aug 28 '24 01:08