bundle-kubeflow
Charms to be integrated with grafana
This is a tracker issue to document progress on CKF charms that can be integrated with Grafana (that is, charms that provide a functional and useful Grafana dashboard).
- [ ] admission-webhook
- [x] argo-controller There is a grafana dashboard that looks to be working well.
- [ ] dex
- [ ] envoy Some metrics in the dashboard are working. A handful of metrics are configured there, and I'm not sure all of them are needed in our envoy case. Issue: https://github.com/canonical/envoy-operator/issues/73
- [ ] istio-gateway
- [ ] istio-pilot
- [ ] jupyter-controller Its dashboard doesn't seem to work. It exposes two metrics, both showing "No data".
- [ ] jupyter-ui
- [x] katib-controller There is a dashboard that shows current experiments and trials. However, it could be improved depending on what decision we will take about grafana dashboards and given that katib-controller provides adequate metrics.
- [ ] katib-db-manager
- [ ] katib-ui
- [ ] kfp-api
- [ ] kfp-metadata-writer
- [ ] kfp-persistence
- [ ] kfp-profile-controller
- [ ] kfp-schedwf
- [ ] kfp-ui
- [ ] kfp-viewer
- [ ] kfp-viz
- [ ] knative-eventing
- [ ] knative-operator
- [ ] knative-serving
- [ ] kserve-controller
- [ ] kubeflow-dashboard
- [ ] kubeflow-profiles (kfam)
- [ ] kubeflow-roles
- [ ] kubeflow-volumes
- [ ] metacontroller
- [ ] minio There's a Grafana dashboard that only shows N/A under all metrics.
- [ ] mlmd
- [ ] oidc-gatekeeper
- [ ] pvcviewer-operator
- [ ] seldon-controller-manager The Grafana dashboard doesn't seem to work. I applied two SeldonDeployments and the `Models` metric showed "No Data".
- [ ] tensorboard-controller
- [ ] tensorboard-web-app
- [ ] training-operator
Update: There is also MLflow, whose dashboard shows `No data`.
Thank you for reporting us your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5364.
This message was autogenerated
I can confirm that jupyter notebooks, minio and seldon controller are missing data, while katib-controller and argo-controller work on the current deployment. Could we also add MLflow to that list? The MLflow metrics dashboard is missing data as well.
MinIO
The issue there is that we have a label `job=scrabe_jobs`, which prevents the panel from rendering anything. Also, the metric `minio_cluster_capacity` is not provided by MinIO (anymore, I guess).
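
To find similar cases in other charms, a small script can scan a dashboard's JSON for hard-coded `job` matchers. The following is only a sketch: it assumes the label appears as a literal matcher inside the panels' PromQL expressions (the issue above does not confirm where the label is set), and the sample dashboard, its panel titles and the `http_requests_total` metric are made up for illustration.

```python
import re

# Matches a hard-coded job matcher such as job="foo" or job=~"foo.*"
# inside a PromQL expression. The \b keeps it from matching e.g. scrape_job=.
JOB_MATCHER = re.compile(r'\bjob\s*(=~|!~|!=|=)\s*"([^"]*)"')


def iter_panels(panels):
    """Yield every panel, descending into row panels that nest their own panels."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def find_job_matchers(dashboard):
    """Return (panel title, expression, job value) for each hard-coded job matcher."""
    hits = []
    for panel in iter_panels(dashboard.get("panels", [])):
        for target in panel.get("targets", []):
            expr = target.get("expr", "")
            for _operator, value in JOB_MATCHER.findall(expr):
                hits.append((panel.get("title", ""), expr, value))
    return hits


# Hypothetical dashboard fragment, not copied from the MinIO charm.
sample_dashboard = {
    "panels": [
        {
            "title": "Capacity",
            "targets": [{"expr": 'sum(minio_cluster_capacity{job="scrabe_jobs"})'}],
        },
        {
            "title": "Row",
            "panels": [
                {
                    "title": "Requests",
                    "targets": [{"expr": "sum(rate(http_requests_total[5m]))"}],
                }
            ],
        },
    ]
}

for title, expr, job in find_job_matchers(sample_dashboard):
    print(f"{title}: job={job!r} in {expr}")
```

For a real dashboard, the dictionary would come from `json.load` on the dashboard file shipped with the charm. Any reported `job` value that does not match what the deployed Prometheus actually assigns is a likely cause of an empty panel.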