Changing burnrate to error_rate in recording rule names
Usually we think of burn rate as sli_error_rate / (1 - sli_slo), but the Pyrra recording rule is really just the SLI error rate.
For example:
record: prometheus_http_requests:burnrate1m
expr: sum(rate(prometheus_http_requests_total{code=~"5..",handler="/api/v1/query"}[1m])) / sum(rate(prometheus_http_requests_total{handler="/api/v1/query"}[1m]))
labels:
  handler: /api/v1/query
  slo: prometheus-api-query
How about renaming prometheus_http_requests:burnrate1m to prometheus_http_requests:error_rate1m, and likewise for the other SLIs?
Not sure I understand the question correctly? Is this about changing the name of the recording rules? I don't think anything is wrong with the naming here.
Yes, because by definition in Google's SRE Workbook, burn rate is how fast, relative to the SLO, the service consumes the error budget.
The recording rules Pyrra currently uses are not relative to the SLO; they record the absolute error rate.
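To make the difference concrete, here is a rough sketch of what the two rules could look like side by side. This is not what Pyrra generates today: the group name, the error_rate1m rule name, and the 99% objective (hence the 0.01 error budget) are all assumptions made up for illustration.

```yaml
groups:
  - name: prometheus-api-query-sketch   # hypothetical group name
    rules:
      # Absolute error rate: the same expression as the current burnrate1m rule.
      - record: prometheus_http_requests:error_rate1m
        expr: sum(rate(prometheus_http_requests_total{code=~"5..",handler="/api/v1/query"}[1m])) / sum(rate(prometheus_http_requests_total{handler="/api/v1/query"}[1m]))
        labels:
          handler: /api/v1/query
          slo: prometheus-api-query
      # Burn rate relative to the SLO: error rate divided by the error budget.
      # 0.99 is an assumed objective, not a value taken from this issue.
      - record: prometheus_http_requests:burnrate1m
        expr: prometheus_http_requests:error_rate1m{slo="prometheus-api-query"} / (1 - 0.99)
```

With these assumed numbers, an error rate of 2% would give a burn rate of 0.02 / 0.01 = 2, meaning the error budget is being consumed twice as fast as the objective allows.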
Interesting, you might be on to something. I'll read up on the topic once more and see how we can improve things for Pyrra. For now, I don't think we should rename all the recording rules, even if the names are not exactly right, so that we don't break existing users until we have a clear path forward.
Coming up with recording and alerting rules relative to the SLO would be great!
Yup, backward compatibility of the metrics is a very important issue, especially for dashboard templates that are already in use.
I think this could be reviewed before releasing v1.0, which I expect would set things in stone.