Pyrra's UI isn't showing any data despite metrics being available
What's not working:
I've deployed Pyrra on top of our observability stack, and from the operator's perspective it seems to be working as expected: PrometheusRules are generated for my ServiceLevelObjectives, and the resulting metrics are available when I query them in Grafana. Despite that, the Pyrra UI shows no data on the SLO-specific detail pages.
What I've found so far
I've created an SLO using the example from the Pyrra repo:
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: pyrra-connect-errors
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: '99'
  window: 2w
  description: Pyrra serves API requests with connect-go either via gRPC or HTTP.
  indicator:
    ratio:
      errors:
        metric: connect_server_requests_total{job="pyrra",code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss"}
      total:
        metric: connect_server_requests_total{job="pyrra"}
      grouping:
        - service
        - method
It generates the following PrometheusRule:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    prometheus-operator-validated: "true"
  creationTimestamp: "2025-01-21T21:07:34Z"
  generation: 1
  labels:
    prometheus: k8s
    role: alert-rules
  name: pyrra-connect-errors
  namespace: monitoring
  ownerReferences:
  - apiVersion: pyrra.dev/v1alpha1
    controller: true
    kind: ServiceLevelObjective
    name: pyrra-connect-errors
    uid: 5b16534a-04c3-43f8-a24d-97038c9d2474
  resourceVersion: "252446336"
  uid: 3131bfb1-6a6e-45d4-8104-b79f2901faff
spec:
  groups:
  - interval: 1m30s
    name: pyrra-connect-errors-increase
    rules:
    - expr: sum by (code, method, service) (increase(connect_server_requests_total{job="pyrra"}[2w]))
      labels:
        job: pyrra
        slo: pyrra-connect-errors
      record: connect_server_requests:increase2w
    - alert: SLOMetricAbsent
      expr: absent(connect_server_requests_total{job="pyrra"}) == 1
      for: 5m
      labels:
        job: pyrra
        severity: critical
        slo: pyrra-connect-errors
  - interval: 30s
    name: pyrra-connect-errors
    rules:
    - expr: sum by (method, service) (rate(connect_server_requests_total{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra"}[3m]))
        / sum by (method, service) (rate(connect_server_requests_total{job="pyrra"}[3m]))
      labels:
        job: pyrra
        slo: pyrra-connect-errors
      record: connect_server_requests:burnrate3m
    - expr: sum by (method, service) (rate(connect_server_requests_total{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra"}[15m]))
        / sum by (method, service) (rate(connect_server_requests_total{job="pyrra"}[15m]))
      labels:
        job: pyrra
        slo: pyrra-connect-errors
      record: connect_server_requests:burnrate15m
    - expr: sum by (method, service) (rate(connect_server_requests_total{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra"}[30m]))
        / sum by (method, service) (rate(connect_server_requests_total{job="pyrra"}[30m]))
      labels:
        job: pyrra
        slo: pyrra-connect-errors
      record: connect_server_requests:burnrate30m
    - expr: sum by (method, service) (rate(connect_server_requests_total{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra"}[1h]))
        / sum by (method, service) (rate(connect_server_requests_total{job="pyrra"}[1h]))
      labels:
        job: pyrra
        slo: pyrra-connect-errors
      record: connect_server_requests:burnrate1h
    - expr: sum by (method, service) (rate(connect_server_requests_total{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra"}[3h]))
        / sum by (method, service) (rate(connect_server_requests_total{job="pyrra"}[3h]))
      labels:
        job: pyrra
        slo: pyrra-connect-errors
      record: connect_server_requests:burnrate3h
    - expr: sum by (method, service) (rate(connect_server_requests_total{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra"}[12h]))
        / sum by (method, service) (rate(connect_server_requests_total{job="pyrra"}[12h]))
      labels:
        job: pyrra
        slo: pyrra-connect-errors
      record: connect_server_requests:burnrate12h
    - expr: sum by (method, service) (rate(connect_server_requests_total{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra"}[2d]))
        / sum by (method, service) (rate(connect_server_requests_total{job="pyrra"}[2d]))
      labels:
        job: pyrra
        slo: pyrra-connect-errors
      record: connect_server_requests:burnrate2d
    - alert: ErrorBudgetBurn
      expr: connect_server_requests:burnrate3m{job="pyrra",slo="pyrra-connect-errors"}
        > (14 * (1-0.99)) and connect_server_requests:burnrate30m{job="pyrra",slo="pyrra-connect-errors"}
        > (14 * (1-0.99))
      for: 1m0s
      labels:
        exhaustion: 1d
        job: pyrra
        long: 30m
        severity: critical
        short: 3m
        slo: pyrra-connect-errors
    - alert: ErrorBudgetBurn
      expr: connect_server_requests:burnrate15m{job="pyrra",slo="pyrra-connect-errors"}
        > (7 * (1-0.99)) and connect_server_requests:burnrate3h{job="pyrra",slo="pyrra-connect-errors"}
        > (7 * (1-0.99))
      for: 8m0s
      labels:
        exhaustion: 2d
        job: pyrra
        long: 3h
        severity: critical
        short: 15m
        slo: pyrra-connect-errors
    - alert: ErrorBudgetBurn
      expr: connect_server_requests:burnrate1h{job="pyrra",slo="pyrra-connect-errors"}
        > (2 * (1-0.99)) and connect_server_requests:burnrate12h{job="pyrra",slo="pyrra-connect-errors"}
        > (2 * (1-0.99))
      for: 30m0s
      labels:
        exhaustion: 1w
        job: pyrra
        long: 12h
        severity: warning
        short: 1h
        slo: pyrra-connect-errors
    - alert: ErrorBudgetBurn
      expr: connect_server_requests:burnrate3h{job="pyrra",slo="pyrra-connect-errors"}
        > (1 * (1-0.99)) and connect_server_requests:burnrate2d{job="pyrra",slo="pyrra-connect-errors"}
        > (1 * (1-0.99))
      for: 1h30m0s
      labels:
        exhaustion: 2w
        job: pyrra
        long: 2d
        severity: warning
        short: 3h
        slo: pyrra-connect-errors
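(As a sanity check on the thresholds above: these follow the standard multiwindow, multi-burn-rate pattern. With a 99% target the error budget is 1 - 0.99 = 0.01, so the first ErrorBudgetBurn alert fires while both the 3m and 30m burn rates exceed 14 * 0.01 = 0.14; a sustained burn of 14x would use up the budget of the 2w (14d) window in 14d / 14 = 1d, which matches its exhaustion: 1d label.)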
I can verify that the recording rules are created and contain data by querying them in Grafana.
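For example, queries along these lines (recording rule and label names taken verbatim from the PrometheusRule above) return series with data:
connect_server_requests:increase2w{job="pyrra",slo="pyrra-connect-errors"}
connect_server_requests:burnrate3m{job="pyrra",slo="pyrra-connect-errors"}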
When I access the Pyrra UI main page that lists the SLOs, the example SLO is the only entry in the list, but it says there is no data.
If I click into the objective, I see incorrect or missing data.
If I look at the logs for the pyrra-api pod, I can see it making the following queries, first for the main page:
ALERTS{slo=~".+"}
sum by (service, method) (connect_server_requests:increase2w{job="pyrra",slo="pyrra-connect-errors"})
sum by (service, method) (connect_server_requests:increase2w{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra",slo="pyrra-connect-errors"})
I don't see any data for the ALERTS query, but the other two return data just fine if I query them myself through Grafana.
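For context on the ALERTS query: ALERTS is the synthetic series Prometheus exposes only for alerts that are currently pending or firing, so it only returns data while one of the alerts above is active. A firing instance would look roughly like this (alertname and alertstate are added by Prometheus, the remaining labels come from the rule):
ALERTS{alertname="ErrorBudgetBurn", alertstate="firing", exhaustion="1d", job="pyrra", long="30m", severity="critical", short="3m", slo="pyrra-connect-errors"}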
For the objective-specific page, here are the queries logged:
((1 - 0.99) - (sum(connect_server_requests:increase2w{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra",slo="pyrra-connect-errors"} or vector(0)) / sum(connect_server_requests:increase2w{job="pyrra",slo="pyrra-connect-errors"}))) / (1 - 0.99)
sum by (code) (rate(connect_server_requests_total{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra"}[5m])) / scalar(sum(rate(connect_server_requests_total{job="pyrra"}[5m]))) > 0
sum by (code) (rate(connect_server_requests_total{job="pyrra"}[5m])) > 0
ALERTS{slo="pyrra-connect-errors"}
ALERTS{slo=~".+"}
sum by (service, method) (connect_server_requests:increase2w{job="pyrra",slo="pyrra-connect-errors"})
sum by (service, method) (connect_server_requests:increase2w{code=~"aborted|unavailable|internal|unknown|unimplemented|dataloss",job="pyrra",slo="pyrra-connect-errors"})
connect_server_requests:burnrate30m{job="pyrra",slo="pyrra-connect-errors"}
connect_server_requests:burnrate1h{job="pyrra",slo="pyrra-connect-errors"}
connect_server_requests:burnrate15m{job="pyrra",slo="pyrra-connect-errors"}
connect_server_requests:burnrate12h{job="pyrra",slo="pyrra-connect-errors"}
connect_server_requests:burnrate3h{job="pyrra",slo="pyrra-connect-errors"}
connect_server_requests:burnrate2d{job="pyrra",slo="pyrra-connect-errors"}
connect_server_requests:burnrate3m{job="pyrra",slo="pyrra-connect-errors"}
If I make these queries in Grafana, I see data for most or all of them. I'm just not seeing any data in the graphs in the Pyrra UI for the objective. Other things that seem strange in the UI:
- the "Availability: Errors 0, Total 1" (I'd expect that the total would be related to the query
connect_server_requests_total{job="pyrra"}defined in the SLO, which returns 166 requests across different services/methods when checked in Grafana). - the multirate burndown list showing "NaN" makes me think it's not getting the data I think it is from the above queries
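To make the first point concrete, the denominator of the availability query logged above is effectively the aggregate below (just a sub-expression of the logged query, nothing added), and that is where I'd expect the "Total" number to come from:
sum(connect_server_requests:increase2w{job="pyrra",slo="pyrra-connect-errors"})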
For context, our setup:
- We have a central Mimir deployment that replaces Prometheus
- We have OpenTelemetry Collectors watching for Prometheus CRDs such as PodMonitors, ServiceMonitors, and PrometheusRules. These collectors are responsible for configuring and collecting everything; the common interface for services that expose metrics is expected to be those Prometheus CRDs (no service or collector talks to Mimir natively).
- We query and visualize our metrics in a Grafana instance backed by Mimir.
graph TD;
otel-collector --> Mimir;
PodMonitor --> otel-collector;
ServiceMonitor --> otel-collector;
PrometheusRule --> otel-collector;
Pyrra --> ServiceMonitor;
Pyrra --> ServiceLevelObjective;
ServiceLevelObjective --> PrometheusRule;
Mimir --> Grafana;
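For reference, in a setup like this pyrra-api has to be pointed at Mimir's Prometheus-compatible endpoint rather than at a Prometheus server. A minimal sketch of the relevant container arguments (the --prometheus-url flag is Pyrra's; the URL below is a placeholder, not our actual gateway address):
args:
- api
- --prometheus-url=http://<mimir-gateway>.<namespace>.svc/prometheus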
I can report the same behaviour. Using main as the image tag, I deployed the version that can take a grafana-external-url. prometheusUrl: "http://mimir-distributed-gateway.monitoring-metrics.svc/prometheus" is set to Mimir's Prometheus endpoint.
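In values form, the relevant bits look roughly like this (the main tag and prometheusUrl are quoted from above; the rest of the key layout depends on the chart and is assumed):
image:
  tag: main
prometheusUrl: "http://mimir-distributed-gateway.monitoring-metrics.svc/prometheus"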
Opening the links from the dashboard (which send me to the Explore page with the corresponding query) results in a valid graph, but those graphs are not displayed on the Pyrra dashboard.
If I take a look at the mimir-distributed-gateway, I see the incoming requests from Pyrra:
pyrra-pod-ip - [07/Aug/2025:14:30:51 +0000] 200 "POST /prometheus/api/v1/query HTTP/1.1" 63 "-" "Go-http-client/1.1" "-"
pyrra-pod-ip - [07/Aug/2025:14:30:51 +0000] 200 "POST /prometheus/api/v1/query_range HTTP/1.1" 63 "-" "Go-http-client/1.1" "-"
pyrra-pod-ip - [07/Aug/2025:14:30:51 +0000] 200 "POST /prometheus/api/v1/query HTTP/1.1" 63 "-" "Go-http-client/1.1" "-"