tiflow
Observability (cdc): improve observability of changefeeds
This is a task tracking issue
How to define the observability of a changefeed
Recently a CDC user encountered a stuck sink-to-MySQL changefeed, and we could not locate, or even guess at, the root cause from the existing metrics and Grafana dashboards. So I walked through the sink-to-MySQL code thoroughly and found a potential stuck issue https://github.com/pingcap/tiflow/issues/10334 and a workload skew issue https://github.com/pingcap/tiflow/issues/10341 . However, we have no way to tell whether the user's problem is one of the issues we found. We need better end-to-end observability for different changefeeds. For example, we need to know how many events/messages are cached, buffered, or piled up at each stage/step, and we need to know the longest wait duration at each stage/step.
Task Lists (TBD)
TiKV CDC
- [x] https://github.com/tikv/tikv/issues/16282
- [ ] https://github.com/tikv/tikv/issues/16390
- [x] https://github.com/pingcap/tiflow/issues/10354
KvClient and Puller
- [x] https://github.com/pingcap/tiflow/issues/10388
Sorter
Sink
- [ ] Sink to MySQL
  - [ ] https://github.com/pingcap/tiflow/issues/10344
- [ ] Sink to Cloud Storage
- [ ] Sink to MQ (Kafka, Pulsar)
- [ ] Simple Protocol
Other
- [x] https://github.com/pingcap/tiflow/issues/10449
- [x] https://github.com/pingcap/tiflow/issues/10447
- [x] https://github.com/pingcap/tiflow/issues/10438
- [ ] #10905