
[BUG] Flyte user dashboard metric name mismatches

Open davidmirror-ops opened this issue 6 months ago • 4 comments

Describe the bug

The Flyte User dashboard published on Grafana Marketplace has the following issues as reported by the community:

  • flyte:propeller:all:workflow:accepted - no data
  • flyte:propeller:all:workflow:success_duration_ms_count - needed to be flyte:propeller:all:workflow:event_recording:success_duration_ms_count
  • flyte:propeller:all:workflow:failure_duration_ms_count - no data. I'm able to visualise failed tasks instead of workflows using flyte:propeller:all:task:event_recording:failure_duration_ms_count
  • flyte:propeller:all:workflow:workflow_aborted - no data
  • success/failure/queueing time by quantile, and User VS System errors - no data unless I use the 'unlabeled_ms' version of the metrics, which doesn't allow us to filter by project/domain/workflow
  • CPU/Memory limits VS quota - no 'kube_resourcequota' metric found in our Prometheus setup, but maybe this is unique to our setup. I was able to more-or-less recreate these visualisations using our own cluster Prometheus metrics
  • Pending tasks - not clear if this works, only one data point visualised (but we've been testing across multiple workflows)
  • CPU/Memory Usage Percentage - infinite loading
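
As a possible stopgap until the published dashboard is fixed, the one rename confirmed in the list above could be applied to an exported copy of the dashboard JSON. The following is a minimal sketch, not the project's fix: it assumes panel queries are stored under an `expr` key, as Grafana Prometheus targets usually are, and the names `RENAMES`, `rewrite_expr`, and `rewrite_dashboard` are hypothetical helpers introduced for illustration. Only the `success_duration_ms_count` mapping comes from the report; the other items above are "no data" cases with no confirmed replacement, so they are left untouched.

```python
import re

# The only rename confirmed in this issue. Any further entries would be
# assumptions that need checking against the metrics your propeller exposes.
RENAMES = {
    "flyte:propeller:all:workflow:success_duration_ms_count":
        "flyte:propeller:all:workflow:event_recording:success_duration_ms_count",
}

# Characters that may appear in a Prometheus metric name.
NAME_CHARS = "A-Za-z0-9_:"


def rewrite_expr(expr, renames=RENAMES):
    """Replace whole metric names inside a PromQL expression string."""
    for old, new in renames.items():
        # Guard both sides so a longer name that merely contains `old`
        # is not rewritten.
        pattern = (
            "(?<![" + NAME_CHARS + "])" + re.escape(old) + "(?![" + NAME_CHARS + "])"
        )
        expr = re.sub(pattern, lambda _match, new=new: new, expr)
    return expr


def rewrite_dashboard(node, renames=RENAMES):
    """Return a copy of a dashboard structure with every `expr` string rewritten."""
    if isinstance(node, dict):
        return {
            key: rewrite_expr(value, renames)
            if key == "expr" and isinstance(value, str)
            else rewrite_dashboard(value, renames)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_dashboard(item, renames) for item in node]
    return node
```

To use it, load the exported dashboard with `json.load`, pass the result through `rewrite_dashboard`, and re-import the output into Grafana. The function returns a new structure rather than mutating its input, so the original export stays available for comparison.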

Expected behavior

The dashboards should not only work out of the box but should also be better documented, with an explanation of each metric and its expected behavior. The published dashboards should reflect those updates.

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

davidmirror-ops, Aug 19 '24 12:08