
SLO Preview API

Open alexandreLamarre opened this issue 1 year ago • 1 comments

SLO preview API

alexandreLamarre avatar Aug 17 '22 14:08 alexandreLamarre

In a Kubernetes environment there seems to be a bug with timestamppb.Timestamp queries in Cortex. The timestamp

54596-08-16T00:00:00Z

means the query targets the year 54596 instead of the current year.

A local Cortex works just fine.
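
The cause is not identified in this thread. One common way to end up with a five-digit year is a unit mismatch, where an epoch value in milliseconds is read as seconds. The sketch below only illustrates that hypothesis. The helper names are made up, and it uses plain arithmetic because Python's datetime cannot represent years past 9999.

```python
# Hypothetical helpers; not taken from opni or Cortex.
SECONDS_PER_YEAR = 365.2425 * 86400  # mean Gregorian year in seconds


def approx_year(epoch_seconds: float) -> int:
    """Approximate calendar year for a Unix timestamp given in seconds."""
    return 1970 + int(epoch_seconds // SECONDS_PER_YEAR)


def looks_like_milliseconds(epoch_value: int) -> bool:
    """Heuristic (an assumption): second-resolution timestamps for any
    plausible date stay below 1e11, which is roughly the year 5138."""
    return epoch_value >= 10**11


aug_16_2022_s = 1_660_608_000           # 2022-08-16T00:00:00Z in seconds
aug_16_2022_ms = aug_16_2022_s * 1000   # the same instant in milliseconds

print(approx_year(aug_16_2022_s))       # interpreted correctly
print(approx_year(aug_16_2022_ms))      # milliseconds misread as seconds
print(looks_like_milliseconds(aug_16_2022_ms))
```

If the second value lands in the same range as the year reported above, comparing the units used on the local and Kubernetes code paths would be a reasonable first check.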

alexandreLamarre avatar Aug 17 '22 14:08 alexandreLamarre

Alert queries may be silently hitting a Cortex limit (fetched blocks, series involved, or query length), so I'm working on decomposing these raw queries into smaller sub-queries and evaluating the logical or / and components within the API call.
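
A minimal sketch of that decomposition, assuming each window's sub-query has already been evaluated on its own and reduced to a mapping from timestamp to 0 or 1. The type alias and function names are hypothetical and not taken from the opni code base.

```python
from typing import Dict

# timestamp (seconds) -> 1 if the window's threshold was exceeded, else 0
Series = Dict[int, int]


def logical_and(a: Series, b: Series) -> Series:
    """1 only where both sides are 1; a missing timestamp counts as 0."""
    return {
        ts: int(a.get(ts, 0) == 1 and b.get(ts, 0) == 1)
        for ts in sorted(set(a) | set(b))
    }


def logical_or(a: Series, b: Series) -> Series:
    """1 where either side is 1; a missing timestamp counts as 0."""
    return {
        ts: int(a.get(ts, 0) == 1 or b.get(ts, 0) == 1)
        for ts in sorted(set(a) | set(b))
    }


def alert_firing(w5m: Series, w30m: Series, w2h: Series, w6h: Series) -> Series:
    """Mirror the shape of the alert query: (5m and 30m) or (2h and 6h)."""
    return logical_or(logical_and(w5m, w30m), logical_and(w2h, w6h))


w5m = {0: 1, 60: 1, 120: 0}
w30m = {0: 1, 60: 0, 120: 0}
w2h = {0: 0, 60: 1, 120: 0}
w6h = {0: 0, 60: 1, 120: 1}
print(alert_firing(w5m, w30m, w2h, w6h))
```

Each of the four inputs comes from one short query instead of one large expression, which keeps every individual request further away from per-query limits.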

Alert query:

(
    max(
        1 - 
        (
            (
                sum(
                    rate(prometheus_http_requests_total{job="prometheus",code=~"200"}[5m])
                    )
            )
            /(
                sum(
                    rate(prometheus_http_requests_total{job="prometheus",code=~"200|500|503"}[5m])
                    )
            )
        ) > 
        (0.001667 * 0.010000)
    )  and 
    max(
        1 - 
        (
            (
                sum(
                    rate(prometheus_http_requests_total{job="prometheus",code=~"200"}[30m])
                    )
            )
            /(
                sum(
                    rate(prometheus_http_requests_total{job="prometheus",code=~"200|500|503"}[30m])
                    )
            )
        ) > 
        (0.000694 * 0.010000)
    ) 
) or 
(
    max(
        1 - 
        (
            (
                sum(
                    rate(prometheus_http_requests_total{job="prometheus",code=~"200"}[2h])
                    )
            )
            /(
                sum(
                    rate(prometheus_http_requests_total{job="prometheus",code=~"200|500|503"}[2h])
                    )
            )
        ) 
        > 
        (0.001667 * 0.010000)
    )  and 
    max(
        1 - 
        (
            (
                sum(
                    rate(prometheus_http_requests_total{job="prometheus",code=~"200"}[6h])
                    )
            )
            /(
                sum(
                    rate(prometheus_http_requests_total{job="prometheus",code=~"200|500|503"}[6h])
                    )
            )
        ) > 
        (0.000694 * 0.010000)
    ) 
)

Edit: Fixed this by adding the bool modifier to the > comparisons. Without it, a comparison that is not true returns an empty result, which is not ideal when evaluating a range of windows.
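
To illustrate the difference, here is a small simulation of a PromQL comparison against a scalar, with and without bool. It models only the documented semantics on a list of (timestamp, value) samples. It is not how Cortex evaluates queries, and the function names and sample values are invented for the example.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def greater_than_filter(samples: List[Sample], threshold: float) -> List[Sample]:
    """Plain `>`: acts as a filter, so samples failing the comparison are dropped."""
    return [(ts, v) for ts, v in samples if v > threshold]


def greater_than_bool(samples: List[Sample], threshold: float) -> List[Sample]:
    """`> bool`: every sample is kept and its value becomes 1 or 0."""
    return [(ts, 1.0 if v > threshold else 0.0) for ts, v in samples]


threshold = 0.001667 * 0.010000                       # same product as in the query
error_ratio = [(0, 0.0), (60, 0.00005), (120, 0.0)]   # made-up error ratios

print(greater_than_filter(error_ratio, threshold))    # gaps where the condition is false
print(greater_than_bool(error_ratio, threshold))      # one value per step
```

With bool, every step in the range produces a value, so a preview can show a continuous 0/1 series. Keep in mind that PromQL's and / or operators match on labels rather than on sample values, so 0/1 results need to be combined numerically or by the caller.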

alexandreLamarre avatar Aug 18 '22 15:08 alexandreLamarre

Ideally, I'd like to have a time series that already holds a month's worth of data to create SLOs against when I demo/test this, but Cortex/Prometheus does not really support backfilling data easily. It is really only intended to ingest data that is up to an hour old.

  • https://github.com/cortexproject/cortex/issues/2366-
  • https://grafana.com/blog/2020/09/02/how-were-improving-backfill-methods-to-get-older-data-into-prometheus/

I haven't had any success with any of the methods outlined above.
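
For anyone retrying this, the sketch below only generates synthetic input data: an OpenMetrics-style text file with a month of samples for the counter used in the alert query. The function name, the request and error volumes, and the label values are all assumptions, and the exact format should be checked against the promtool documentation before relying on it.

```python
def synthetic_openmetrics(start_s: int, days: int, step_s: int = 60,
                          ok_per_step: int = 99, err_per_step: int = 1) -> str:
    """Build OpenMetrics-style text for a synthetic request counter.

    Samples are grouped per series and ordered by timestamp (in seconds).
    """
    name = "prometheus_http_requests_total"
    steps = days * 86400 // step_s
    lines = [
        f"# HELP {name} Synthetic request counter for SLO demos.",
        f"# TYPE {name} counter",
    ]
    for code, per_step in (("200", ok_per_step), ("500", err_per_step)):
        for i in range(steps):
            ts = start_s + i * step_s
            total = per_step * (i + 1)  # monotonically increasing counter value
            lines.append(f'{name}{{job="prometheus",code="{code}"}} {total} {ts}')
    lines.append("# EOF")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 30 days of one-minute samples starting at 2022-07-17T00:00:00Z (arbitrary choice)
    with open("slo_demo.om", "w") as f:
        f.write(synthetic_openmetrics(1_658_016_000, days=30))
```

A file like this is the kind of input that `promtool tsdb create-blocks-from openmetrics` accepts. Whether the resulting blocks can then be loaded into Cortex is a separate question that this sketch does not answer.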

alexandreLamarre avatar Aug 18 '22 16:08 alexandreLamarre