
Pyrra `ListObjectives` route returns 500 if SLO is created with invalid metrics

Open ArthurSens opened this issue 2 years ago • 1 comment

We tried creating an SLO with the following spec:

spec:
  target: "99.9"
  description: "Success ratio of workspace backups"
  window: 4w
  indicator:
    ratio:
      errors:
        metric: gitpod_ws_manager_workspace_backups_failure_total
      total:
        metric: (gitpod_ws_manager_workspace_backups_failure_total + gitpod_ws_manager_workspace_backups_success_total)

Our mistake was assuming that a query would work in the `metric` field in place of a single metric.

The problem is that the admission controller accepted the SLO, and after that all our other SLOs stopped showing up in the ListObjectives route. We were confused at first, but after some time we noticed the 500s showing up in the logs. Once we deleted the problematic SLO, the 500s disappeared.


Accepting queries instead of a specific metric might be reasonable in some use cases, but that is not the point of this issue 😅. I believe it would be a better experience if the admission controller rejected the SLO at creation time, or if the Pyrra UI could handle invalid SLOs without returning 500s.
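To make the two options concrete, here is a rough sketch in Python. It is not Pyrra's actual code: the function names and the regular expression are hypothetical, and the regex is a deliberately conservative simplification of PromQL selector syntax (for example, it rejects label values that contain braces). It shows a check that could reject a query in the `metric` field, and a listing function that reports invalid SLOs separately instead of failing the whole request.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, conservative pattern: a metric name followed by an optional
# {label matchers} block. This is a simplification, not real PromQL parsing.
_SELECTOR = re.compile(r'^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^{}]*\})?$')


def is_single_metric(expr: str) -> bool:
    """Return True if expr looks like one metric selector rather than a query."""
    return bool(_SELECTOR.match(expr.strip()))


def validate_ratio(errors_metric: str, total_metric: str) -> List[str]:
    """Collect problems an admission check could report instead of accepting the SLO."""
    problems = []
    for field, value in (("errors", errors_metric), ("total", total_metric)):
        if not is_single_metric(value):
            problems.append(
                f"indicator.ratio.{field}.metric is not a single metric selector: {value!r}"
            )
    return problems


def list_objectives(
    slos: Dict[str, Tuple[str, str]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return valid SLO names plus a map of invalid SLO names to their problems.

    One invalid SLO no longer prevents the valid ones from being listed.
    """
    valid: List[str] = []
    invalid: Dict[str, List[str]] = {}
    for name, (errors_metric, total_metric) in slos.items():
        problems = validate_ratio(errors_metric, total_metric)
        if problems:
            invalid[name] = problems
        else:
            valid.append(name)
    return valid, invalid
```

With the spec above, `validate_ratio` would flag only the `total` field, since `errors` is a plain metric name. The SLO names passed to `list_objectives` are whatever identifiers the caller uses; none of them come from Pyrra itself.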

ArthurSens, Jul 05 '22 22:07

Thanks for reporting!

Pyrra crashing is far from ideal. Thankfully, once the recording rules are loaded, alerting should continue even if Pyrra crashes, so it's not as bad as it might sound at first.

Pyrra should handle both scenarios gracefully.

metalmatze, Jul 06 '22 21:07