tiflow Observability (cdc) improve observability of changefeeds

Open zhangjinpeng87 opened this issue 1 year ago • 0 comments

This is a task tracking issue

How to define the observability of a changefeed

Recently a CDC user hit a stuck sink-to-MySQL changefeed, and we could not locate, or even guess, the root cause from the existing metrics and Grafana dashboards. So I walked through the sink-to-mysql code thoroughly and found a potential stuck issue, https://github.com/pingcap/tiflow/issues/10334 , and a workload skew issue, https://github.com/pingcap/tiflow/issues/10341 . But we have no way to judge whether the user's problem is one of the issues we found. We need better end-to-end observability for changefeeds. For example, we need to know how many events/messages are cached, buffered, or piled up in each stage/step, and the longest wait duration in each stage/step.

Task Lists (TBD)

TiKV CDC

[x] https://github.com/tikv/tikv/issues/16282
[ ] https://github.com/tikv/tikv/issues/16390
[x] https://github.com/pingcap/tiflow/issues/10354

KvClient and Puller

[x] https://github.com/pingcap/tiflow/issues/10388

Sorter

Sink

[ ] Sink to MySQL
- [ ] https://github.com/pingcap/tiflow/issues/10344
[ ] Sink to Cloud Storage
[ ] Sink to MQ (Kafka, Pulsar)
- [ ] Simple Protocol

Other

[x] https://github.com/pingcap/tiflow/issues/10449
[x] https://github.com/pingcap/tiflow/issues/10447
[x] https://github.com/pingcap/tiflow/issues/10438
[ ] #10905

Dec 21 '23 18:12 zhangjinpeng87
