[BUG] Flyte user dashboard metric name mismatches
Describe the bug
The Flyte User dashboard published on Grafana Marketplace has the following issues as reported by the community:
- flyte:propeller:all:workflow:accepted - no data
- flyte:propeller:all:workflow:success_duration_ms_count - needs to be changed to flyte:propeller:all:workflow:event_recording:success_duration_ms_count
- flyte:propeller:all:workflow:failure_duration_ms_count - no data. I'm able to visualise failed tasks instead of workflows using flyte:propeller:all:task:event_recording:failure_duration_ms_count
- flyte:propeller:all:workflow:workflow_aborted - no data
- success/failure/queueing time by quantile, and User VS System errors - no data unless I use the 'unlabeled_ms' version of the metrics, which doesn't allow us to filter by project/domain/workflow
- CPU/Memory limits VS quota - no 'kube_resourcequota' metric found in our Prometheus setup, but maybe this is unique to our setup. I was able to more or less recreate these visualisations using our own cluster Prometheus metrics
- Pending tasks - not clear if this works, only one data point visualised (but we've been testing across multiple workflows)
- CPU/Memory Usage Percentage - infinite loading
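
To help triage the "no data" panels above, here is a minimal sketch that compares the metric names the dashboard queries against what a Prometheus server actually exposes. It relies on the standard Prometheus HTTP API endpoint `/api/v1/label/__name__/values`. The function names, the `DASHBOARD_METRICS` list (copied from the items above), and the suffix-matching heuristic are assumptions made for illustration; they are not part of Flyte or of the published dashboard.

```python
import json
import urllib.request

# Metric names the dashboard queries, copied from the list in this report.
DASHBOARD_METRICS = [
    "flyte:propeller:all:workflow:accepted",
    "flyte:propeller:all:workflow:success_duration_ms_count",
    "flyte:propeller:all:workflow:failure_duration_ms_count",
    "flyte:propeller:all:workflow:workflow_aborted",
    "kube_resourcequota",
]


def fetch_metric_names(base_url):
    """Return the set of metric names known to the Prometheus at base_url."""
    url = base_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return set(payload["data"])


def missing_metrics(expected, available):
    """Return the expected metric names that are absent, preserving order."""
    return [name for name in expected if name not in available]


def suggest_candidates(name, available):
    """Suggest available metrics ending in the same last ':' segment.

    Heuristic only (an assumption of this sketch): it catches renames that
    insert a segment, such as 'event_recording', before the final one.
    """
    suffix = name.rsplit(":", 1)[-1]
    return sorted(m for m in available if m != name and m.endswith(suffix))
```

Assuming Prometheus is reachable at a hypothetical `http://localhost:9090`, calling `missing_metrics(DASHBOARD_METRICS, fetch_metric_names("http://localhost:9090"))` lists the names the dashboard expects but the server does not have, and `suggest_candidates` can then be run on each missing name to look for likely replacements.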
Expected behavior
The dashboards should work out of the box and should be better documented, with an explanation of each metric and its expected behavior. The published dashboards should reflect those updates.
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes