tiflow

etcd client outCh blocking too long when pause changefeed

Open CharlesCheung96 opened this issue 1 year ago • 1 comment

What did you do?

  1. Create a changefeed with a Pulsar sink
  2. Stop the Pulsar server
  3. Delete some data from the upstream
  4. Pause the changefeed

What did you expect to see?

No response

What did you see instead?

[2024/02/20 20:21:00.048 +08:00] [WARN] [client.go:272] ["etcd client outCh blocking too long, the etcdWorker may be stuck"] [duration=14m2.000284532s] [role=processor]
[2024/02/20 20:21:00.406 +08:00] [WARN] [client.go:272] ["etcd client outCh blocking too long, the etcdWorker may be stuck"] [duration=8m16.000233667s] [role=owner]

ticdc.log

goroutine.log

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

Upstream TiKV version (execute tikv-server --version):

(paste TiKV version here)

TiCDC version (execute cdc version):

2eadc08f4bd64d00250e9ce6f7c69eda5498464c

CharlesCheung96 Feb 20 '24 12:02

This happens because the close methods of the Pulsar client and the Pulsar producer block when the downstream Pulsar server is down. A possible fix is to spawn a goroutine that closes the Pulsar client and producer, so that the caller is not held up by the blocked close.

asddongmen Feb 21 '24 09:02