Numaflow Debuggability

Open veds-g opened this issue 1 year ago • 2 comments

Summary

Currently, the Numaflow UI offers real-time status updates at the vertex, pod, and container levels, with enough context at each level. To enhance debuggability, we should extend this capability to include more comprehensive and detailed metrics with historical data. By analyzing the metrics over time, users can detect trends, identify anomalies, and determine the root cause of issues.

Tasks

  • [ ] unify metrics to avoid duplicates
  • [x] test querying our existing metrics with a metrics server supporting PromQL
  • [ ] #2058
  • [x] finalise a charting library to render graphs
  • [ ] #2104
  • [ ] #2107
  • [ ] #2105
  • [ ] #2106
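
As a rough illustration of the "querying our existing metrics with a metrics server supporting PromQL" task, here is a minimal sketch that builds a range query URL for the standard Prometheus HTTP API endpoint `/api/v1/query_range`. The function name `build_range_query_url`, the host `prometheus.example`, and the metric `example_metric_total` are placeholders invented for this example, not names taken from Numaflow.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, start, end, step_seconds):
    """Build a URL for a PromQL range query.

    base_url      -- address of a PromQL-compatible server (assumed)
    promql        -- the PromQL expression to evaluate
    start, end    -- Unix timestamps bounding the query window
    step_seconds  -- resolution of the returned series, in seconds
    """
    params = {
        "query": promql,
        "start": start,
        "end": end,
        "step": step_seconds,
    }
    # urlencode percent-escapes the PromQL expression (brackets, parens, etc.).
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Hypothetical usage: one hour of a 5-minute rate at 60-second resolution.
url = build_range_query_url(
    "http://prometheus.example:9090",
    "rate(example_metric_total[5m])",
    1726113600,
    1726117200,
    60,
)
```

A range query like this is what gives the historical view described in the summary: the server returns one sample per step across the window, which a charting library can then plot.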

veds-g avatar Sep 12 '24 04:09 veds-g

Why do we need a Prometheus server in Numaflow? Can't we assume/mandate that a metrics provider that supports PromQL is provided?

vigith avatar Sep 12 '24 13:09 vigith

We do not need this. A supported metrics provider will be mandatory. This issue is only about testing PromQL queries by running a Prometheus server. Maybe we can rename it?

veds-g avatar Sep 12 '24 14:09 veds-g

Closing as all sub-issues are completed.

veds-g avatar Apr 12 '25 13:04 veds-g

Thank you, @veds-g, for driving it.

vigith avatar Apr 12 '25 16:04 vigith