Kebechet overview dashboard and web page metrics

Open pacospace opened this issue 3 years ago • 16 comments

Is your feature request related to a problem? Please describe.

As a user of Kebechet,

I would like to have an overview of how Kebechet is used.

As a maintainer of Kebechet,

I would like to have a look at a dashboard with all metrics related to Kebechet

High-level Goals

  • Have a dashboard with all Kebechet metrics (usage and operational + SLI/SLO).
  • Fill landing page with all required data

Describe the solution you'd like

Collect the following metrics:

Usage

  • [x] Number of repos maintained (https://github.com/thoth-station/metrics-exporter/blob/2d8dbaf7a241271758f5a65294f689f423614aff/thoth/metrics_exporter/jobs/kebechet.py#L40)
  • [x] Total number of stacks maintained (users*runtime_environments)
  • [x] Number of users per manager (https://github.com/thoth-station/metrics-exporter/blob/2d8dbaf7a241271758f5a65294f689f423614aff/thoth/metrics_exporter/jobs/kebechet.py#L48)
  • [x] Number of new users per week (considering slug for now) (https://github.com/thoth-station/slo-reporter/blob/0479750edfbff7a8913ce667da14e1ea62015119/thoth/slo_reporter/sli_thoth_services/sli_kebechet.py#L133)
  • [ ] Number of issues opened/closed/merged/rejected (version manager)
  • [ ] Average time to merge PRs (version manager)
  • [ ] Number of PRs opened (thoth-advise manager)
  • [ ] Number of PRs approved/merged/closed (thoth-advise manager)
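
For the "average time to merge PRs" item above, a minimal sketch of the calculation, assuming the PR data has already been collected as pairs of opened/merged timestamps. The function name and input shape are hypothetical and not taken from metrics-exporter or mi:

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical input shape: (opened_at, merged_at), where merged_at is None
# for PRs that were closed without merging or are still open.
PullRequestTimes = Tuple[datetime, Optional[datetime]]


def average_time_to_merge(prs: Iterable[PullRequestTimes]) -> Optional[timedelta]:
    """Return the mean open-to-merge duration, ignoring unmerged PRs."""
    durations = [merged - opened for opened, merged in prs if merged is not None]
    if not durations:
        # No merged PRs in the window: there is no meaningful average.
        return None
    return sum(durations, timedelta()) / len(durations)
```

Returning `None` rather than zero when nothing was merged keeps "no data" distinguishable from "merged instantly" on a dashboard.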

Operational

  • [ ] Latency from opening the issue to merging/closing the issue (version manager)
  • [ ] thoth-station/kebechet#993
  • [x] workflows run (success/failed/errors per manager)
    • [x] update/thoth-advise number of success/failures (number of successful adviser per source_type=KEBECHET)
    • [x] Kebechet-run results step (success/failed/errors per manager)
    • [x] Kebechet step (success/failed/errors per manager)
  • [x] number of managers
  • [x] Number of messages sent by internal trigger per type of trigger (https://github.com/thoth-station/workflow-helpers/blob/2bdc95e71bbd96aa73fb4f997f75b000ac37fb81/kebechet_administrator.py#L60)
  • [x] Rate limit (https://github.com/thoth-station/metrics-exporter/blob/2d8dbaf7a241271758f5a65294f689f423614aff/thoth/metrics_exporter/jobs/kebechet.py#L61)
  • [x] Number of purged runtime environments, number of repos with the runtime environment, number of issues opened for that runtime environment.

The above metrics should be combined to allow the managers to set the following SLO:

  • [ ] Manager x is used above certain %
  • [ ] Manager x is providing results in s
  • [ ] Manager x is succeeding certain %
  • [ ] % of PRs opened by thoth advise manager are merged
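
As an illustration of how the success/failed/error counts per manager could feed the "Manager x is succeeding certain %" SLO, here is a minimal sketch. The `ManagerCounts` class, the function names, and the target value used in any call are assumptions for illustration only; the real thresholds are not defined in this issue:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagerCounts:
    """Hypothetical per-manager workflow outcome counts for one reporting window."""

    succeeded: int
    failed: int
    errored: int


def success_ratio(counts: ManagerCounts) -> Optional[float]:
    """Return the fraction of successful runs, or None if the manager did not run."""
    total = counts.succeeded + counts.failed + counts.errored
    if total == 0:
        return None
    return counts.succeeded / total


def meets_slo(counts: ManagerCounts, target: float) -> bool:
    """Check the success ratio against a target such as 0.95 (an assumed value)."""
    ratio = success_ratio(counts)
    # A manager with no runs is treated as not meeting the SLO in this sketch.
    return ratio is not None and ratio >= target
```

The same ratio-against-target shape would apply to "% of PRs opened by thoth advise manager are merged", with merged and opened PR counts in place of workflow outcomes.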

Additional context

  • metrics-exporter https://github.com/thoth-station/metrics-exporter
  • slo-reporter https://github.com/thoth-station/slo-reporter
  • mi https://github.com/thoth-station/mi

Acceptance Criteria

  • [x] Dashboard with Kebechet metrics is available in Operate First.
  • [ ] Kebechet landing page contains all data (https://goern.github.io/kebechet-universe/)

cc @KPostOffice @xtuchyna

pacospace • Aug 20 '21 10:08

Will this provide all the data required to fill in the blanks at https://goern.github.io/kebechet-universe/ ?

goern • Aug 20 '21 10:08

> Will this provide all the data required to fill in the blanks at https://goern.github.io/kebechet-universe/ ?

Yes, it is part of the acceptance criteria!

pacospace • Aug 20 '21 10:08

Related-To: https://github.com/thoth-station/slo-reporter/issues/210

pacospace • Aug 24 '21 12:08

@KPostOffice @xtuchyna any update on this?!

goern • Sep 13 '21 06:09

Sorry for the delay. I am preparing a test for daily data and metrics aggregation: https://github.com/thoth-station/thoth-application/pull/1954

xtuchyna • Sep 14 '21 14:09

Related-To: https://github.com/thoth-station/kebechet/issues/679 https://github.com/thoth-station/kebechet/issues/546

pacospace • Sep 15 '21 13:09

ping, any decision on this?

goern • Oct 18 '21 15:10

> ping, any decision on this?

We are working on it with @hemajv :) We are also waiting for Grafana to be back in the clusters, cc @harshad16

pacospace • Oct 18 '21 15:10

/lifecycle active

codificat • Dec 01 '21 15:12

We are working on it

/triage accepted

codificat • Dec 21 '21 09:12

The initial dashboard is available at: https://grafana.operate-first.cloud/d/bBFI9MJnk/kebechet-monitoring?orgId=1 cc @hemajv

pacospace • Jan 31 '22 12:01

We need to extend the dashboard with Kebechet GitHub metrics, cc @xtuchyna

pacospace • Jan 31 '22 13:01

Waiting for https://github.com/thoth-station/thoth-application/issues/2333 to be resolved

xtuchyna • Feb 08 '22 15:02

Hey @pacospace, I added a comment to an issue here: https://github.com/thoth-station/kebechet/issues/825. I feel like it doesn't fit here as it is more of an operational metric, but I figured I'd link it in case it is something we might want to include.

KPostOffice • Mar 04 '22 21:03

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

sesheta • Jul 07 '22 10:07

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

sesheta • Aug 06 '22 12:08