Kebechet overview dashboard and web page metrics
Is your feature request related to a problem? Please describe.
As a user of Kebechet, I would like to have an overview of how Kebechet is used.
As a maintainer of Kebechet, I would like to have a dashboard with all metrics related to Kebechet.
High-level Goals
- Have a dashboard with all Kebechet metrics (usage and operational + SLI/SLO).
- Fill the landing page with all required data.
Describe the solution you'd like
Collect the following metrics:
Usage
- [x] Number of repos maintained (https://github.com/thoth-station/metrics-exporter/blob/2d8dbaf7a241271758f5a65294f689f423614aff/thoth/metrics_exporter/jobs/kebechet.py#L40)
- [x] Total number of stacks maintained (users*runtime_environments)
- [x] Number of users per manager (https://github.com/thoth-station/metrics-exporter/blob/2d8dbaf7a241271758f5a65294f689f423614aff/thoth/metrics_exporter/jobs/kebechet.py#L48)
- [x] Number of new users per week (considering slug for now) (https://github.com/thoth-station/slo-reporter/blob/0479750edfbff7a8913ce667da14e1ea62015119/thoth/slo_reporter/sli_thoth_services/sli_kebechet.py#L133)
- [ ] Number of issues opened/closed/merged/rejected (version manager)
- [ ] Average time to merge PRs (version manager)
- [ ] Number of PRs opened (thoth-advise manager)
- [ ] Number of PRs approved/merged/closed (thoth-advise manager)
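As an illustration of the still-open "Average time to merge PRs" item, here is a minimal sketch in Python. The record layout (`created_at`, `merged_at`) and the sample values are assumptions made for this example; they are not the schema used by mi or metrics-exporter.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional


def average_time_to_merge(prs: List[Dict]) -> Optional[timedelta]:
    """Mean time between creation and merge over merged PRs.

    Returns None when no PR in the input has been merged.
    The keys "created_at" and "merged_at" are hypothetical field names.
    """
    durations = [
        (pr["merged_at"] - pr["created_at"]).total_seconds()
        for pr in prs
        if pr.get("merged_at") is not None  # skip open or unmerged PRs
    ]
    if not durations:
        return None
    return timedelta(seconds=mean(durations))


# Made-up sample data, for illustration only.
sample_prs = [
    {"created_at": datetime(2021, 6, 1, 12, 0), "merged_at": datetime(2021, 6, 1, 14, 0)},
    {"created_at": datetime(2021, 6, 2, 9, 0), "merged_at": datetime(2021, 6, 2, 13, 0)},
    {"created_at": datetime(2021, 6, 3, 9, 0), "merged_at": None},
]

print(average_time_to_merge(sample_prs))
```

PRs that were closed without merging are excluded here; whether they should count toward a separate "time to close" metric is a design decision left open in this issue.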
Operational
- [ ] Latency from opening the issue to merging/closing the issue (version manager)
- [ ] thoth-station/kebechet#993
- [x] Workflows run (success/failed/errors per manager)
- [x] update/thoth-advise: number of successes/failures (number of successful adviser runs per source_type=KEBECHET)
- [x] Kebechet-run results step (success/failed/errors per manager)
- [x] Kebechet step (success/failed/errors per manager)
- [x] Number of managers
- [x] Number of messages sent by internal trigger per type of trigger (https://github.com/thoth-station/workflow-helpers/blob/2bdc95e71bbd96aa73fb4f997f75b000ac37fb81/kebechet_administrator.py#L60)
- [x] Rate limit (https://github.com/thoth-station/metrics-exporter/blob/2d8dbaf7a241271758f5a65294f689f423614aff/thoth/metrics_exporter/jobs/kebechet.py#L61)
- [x] Number of purged runtime environments, number of repos with that runtime environment, number of issues opened for that runtime environment.
The above metrics should be combined to allow the managers to set the following SLOs:
- [ ] Manager x is used above a certain %
- [ ] Manager x is providing results in s
- [ ] Manager x is succeeding a certain % of the time
- [ ] A certain % of PRs opened by the thoth-advise manager are merged
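To make the "Manager x is succeeding a certain %" SLO concrete, the per-manager success/failed/error counts listed under Operational could be combined roughly as follows. This is only a sketch: the manager names, counts, and targets are invented for illustration, and the dictionary layout is an assumption rather than what metrics-exporter or slo-reporter actually emit.

```python
from typing import Dict, Tuple


def success_rate(counts: Dict[str, int]) -> float:
    """Fraction of runs that succeeded; 0.0 when there were no runs."""
    success = counts.get("success", 0)
    total = success + counts.get("failed", 0) + counts.get("error", 0)
    return success / total if total else 0.0


def evaluate_slos(
    workflow_counts: Dict[str, Dict[str, int]],
    targets: Dict[str, float],
) -> Dict[str, Tuple[float, bool]]:
    """Map each manager to (observed success rate, whether the target is met)."""
    report = {}
    for manager, counts in workflow_counts.items():
        rate = success_rate(counts)
        report[manager] = (rate, rate >= targets[manager])
    return report


# Hypothetical counts and targets, not real Kebechet data.
workflow_counts = {
    "update": {"success": 90, "failed": 8, "error": 2},
    "version": {"success": 45, "failed": 5, "error": 0},
}
targets = {"update": 0.95, "version": 0.85}

report = evaluate_slos(workflow_counts, targets)
for manager, (rate, met) in report.items():
    print(f"{manager}: {rate:.0%} success, SLO met: {met}")
```

The same pattern would apply to the merged-PR percentage for the thoth-advise manager, with merged and opened PR counts in place of the workflow outcomes.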
Additional context
- metrics-exporter https://github.com/thoth-station/metrics-exporter
- slo-reporter https://github.com/thoth-station/slo-reporter
- mi https://github.com/thoth-station/mi
Acceptance Criteria
- [x] Dashboard with Kebechet metrics is available in Operate First.
- [ ] Kebechet landing page contains all data (https://goern.github.io/kebechet-universe/)
cc @KPostOffice @xtuchyna
Will this provide all the data required to fill in the blanks at https://goern.github.io/kebechet-universe/ ?
Yes, it is part of the acceptance criteria!
Related-To: https://github.com/thoth-station/slo-reporter/issues/210
@KPostOffice @xtuchyna any update on this?!
Sorry for the delay, I am preparing a test for daily data and metrics aggregation: https://github.com/thoth-station/thoth-application/pull/1954
Related-To: https://github.com/thoth-station/kebechet/issues/679 https://github.com/thoth-station/kebechet/issues/546
ping, any decision on this?
We are working on it with @hemajv :) We are also waiting for Grafana to be back in the clusters, cc @harshad16
/lifecycle active
We are working on it
/triage accepted
The initial dashboard is available at: https://grafana.operate-first.cloud/d/bBFI9MJnk/kebechet-monitoring?orgId=1 cc @hemajv
We need to extend the dashboard with Kebechet GitHub metrics, cc @xtuchyna
Waiting for https://github.com/thoth-station/thoth-application/issues/2333 to be resolved
Hey @pacospace, I added a comment to an issue here: https://github.com/thoth-station/kebechet/issues/825. I feel like it doesn't fit here as it is more of an operational metric, but I figured I'd link it in case it is something we might want to include.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten