Observability (cdc) improve observability of changefeeds

Open zhangjinpeng87 opened this issue 1 year ago • 0 comments

This is a task tracking issue

How to define the observability of a changefeed

Recently a CDC user hit an issue where sink-to-mysql got stuck, and we could not locate, or even guess, the root cause from the existing metrics and Grafana dashboards. So I walked through the sink-to-mysql code comprehensively and found a potential stuck issue (https://github.com/pingcap/tiflow/issues/10334) and a workload skew issue (https://github.com/pingcap/tiflow/issues/10341). But we have no way to judge whether the user's issue is one of the issues we found. We need better end-to-end observability for different changefeeds. For example, we need to know how many events/messages are cached, buffered, or piled up in each stage/step, the longest wait duration in each stage/step, and so on.
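To make the two example signals concrete (events piled up per stage, and longest wait per stage), here is a minimal Go sketch. It is not taken from the tiflow code base: `stageStats`, `Enter`, `Leave`, `Pending`, `LongestWait`, and the `atMillis` demo helper are all hypothetical names, and it assumes each event carries a unique ID and that callers pass in timestamps explicitly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// stageStats tracks, for one pipeline stage, which events are currently
// buffered and the longest time any event has spent in the stage.
// Hypothetical type for illustration; not part of tiflow.
type stageStats struct {
	mu      sync.Mutex
	name    string
	entered map[uint64]time.Time // event ID -> time it entered the stage
	maxWait time.Duration        // longest wait among events that already left
}

func newStageStats(name string) *stageStats {
	return &stageStats{name: name, entered: make(map[uint64]time.Time)}
}

// Enter records that event id entered the stage at time now.
func (s *stageStats) Enter(id uint64, now time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entered[id] = now
}

// Leave records that event id left the stage at time now.
// Unknown IDs are ignored.
func (s *stageStats) Leave(id uint64, now time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	start, ok := s.entered[id]
	if !ok {
		return
	}
	delete(s.entered, id)
	if wait := now.Sub(start); wait > s.maxWait {
		s.maxWait = wait
	}
}

// Pending returns how many events are currently buffered in the stage.
func (s *stageStats) Pending() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.entered)
}

// LongestWait returns the longest wait seen so far, counting events that
// are still buffered as of now, so a stuck stage keeps growing this value.
func (s *stageStats) LongestWait(now time.Time) time.Duration {
	s.mu.Lock()
	defer s.mu.Unlock()
	longest := s.maxWait
	for _, start := range s.entered {
		if wait := now.Sub(start); wait > longest {
			longest = wait
		}
	}
	return longest
}

// atMillis is a demo helper that builds deterministic timestamps.
func atMillis(n int64) time.Time {
	return time.Unix(0, 0).Add(time.Duration(n) * time.Millisecond)
}

func main() {
	sink := newStageStats("sink-to-mysql")
	sink.Enter(1, atMillis(0))
	sink.Enter(2, atMillis(50))
	sink.Leave(1, atMillis(150)) // event 1 waited 150ms in the stage
	now := atMillis(400)         // event 2 is still buffered at this point
	fmt.Printf("%s: pending=%d longest_wait=%s\n",
		sink.name, sink.Pending(), sink.LongestWait(now))
}
```

One such tracker per stage (TiKV CDC, puller, sorter, sink) would show where events accumulate. In a real deployment these values would more likely be exported as per-changefeed gauges than printed, and scanning every buffered event on each read would probably be too expensive for large buffers.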

Task Lists (TBD)

TiKV CDC

  • [x] https://github.com/tikv/tikv/issues/16282
  • [ ] https://github.com/tikv/tikv/issues/16390
  • [x] https://github.com/pingcap/tiflow/issues/10354

KvClient and Puller

  • [x] https://github.com/pingcap/tiflow/issues/10388

Sorter

Sink

  • [ ] Sink to MySQL
    • [ ] https://github.com/pingcap/tiflow/issues/10344
  • [ ] Sink to Cloud Storage
  • [ ] Sink to MQ (Kafka, Pulsar)
    • [ ] Simple Protocol

Other

  • [x] https://github.com/pingcap/tiflow/issues/10449
  • [x] https://github.com/pingcap/tiflow/issues/10447
  • [x] https://github.com/pingcap/tiflow/issues/10438
  • [ ] #10905

zhangjinpeng87 • Dec 21 '23 18:12