Numaflow Debuggability
Summary
Currently, the Numaflow UI offers real-time status updates, with context, at the vertex, pod, and container levels. To enhance debuggability, we should extend this capability to include more comprehensive and detailed metrics with historical data. By analyzing metrics over time, users can detect trends, identify anomalies, and determine the root cause of issues.
Tasks
- [ ] unify metrics to avoid duplicate metrics
- [x] test querying our existing metrics with a metrics server supporting PromQL
- [ ] #2058
- [x] finalise a charting library to render graphs
- [ ] #2104
- [ ] #2107
- [ ] #2105
- [ ] #2106
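As a rough illustration of the "test querying our existing metrics with a metrics server supporting PromQL" task, the sketch below builds a range query against the Prometheus HTTP API endpoint `/api/v1/query_range`, which returns a metric's values over a time window (the historical data mentioned in the summary). The server address, the helper name `build_range_query_url`, and the metric and label names are assumptions for illustration only, not actual Numaflow metrics.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, start, end, step_seconds):
    """Build a URL for the Prometheus HTTP API range-query endpoint.

    start and end are Unix timestamps; step_seconds is the resolution
    of the returned series, in seconds.
    """
    params = urlencode({
        "query": promql,
        "start": start,
        "end": end,
        "step": step_seconds,
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Hypothetical metric and label names, used only for illustration.
query = 'sum(rate(example_vertex_messages_total{pipeline="my-pipeline"}[5m]))'

# Assumed local server address; one hour of data at 60-second resolution.
url = build_range_query_url("http://localhost:9090", query, 1700000000, 1700003600, 60)
print(url)
```

Any metrics provider that exposes the same PromQL HTTP API could be substituted for the assumed local server by changing `base_url`.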
Why do we need a Prometheus server in Numaflow? Can't we assume/mandate that a metrics provider that supports PromQL is provided?
We do not need this. A supported metrics provider will be mandatory. This issue is only about testing PromQL queries by running a Prometheus server. Maybe we can rename it?
Closing as all sub-issues are completed.
Thank you, @veds-g, for driving it.