promql-engine icon indicating copy to clipboard operation
promql-engine copied to clipboard

Fix correctness issue in predict_linear with step invariant

Open harry671003 opened this issue 9 months ago • 5 comments

Issue

Our continuous correctness tests found an issue with predict_linear with step invariant matrix selector.

Eg: predict_linear({__name__="http_requests_total", pod!~"nginx-1"}[5m] @ start(), -0.37690610678629094)

This PR addresses the problem by allowing the matrixScanner to act in an invariant way similar to Prometheus engine. See: https://github.com/prometheus/prometheus/blob/2a5ed8b8a55fecaa79236ef4adb9f0b82b34587c/promql/engine.go#L1788

harry671003 avatar Mar 17 '25 16:03 harry671003

Can we just wrap the matrix function into step invariant operator? https://github.com/thanos-io/promql-engine/blob/main/execution/step_invariant/step_invariant.go#L47 ?

MichaHoffmann avatar Mar 19 '25 17:03 MichaHoffmann

We could just push step inviarnace up in a preprocessor like here: https://github.com/thanos-io/promql-engine/blob/main/logicalplan/plan.go#L335

MichaHoffmann avatar Mar 19 '25 17:03 MichaHoffmann

We could just push step inviarnace up in a preprocessor like here

Can we just wrap the matrix function into step invariant operator?

The PromQL parser parses the query into: Screenshot 2025-03-19 at 10 51 22 AM

After calling promql.PreprocessExpr() in plan.go this becomes: Screenshot 2025-03-19 at 10 52 11 AM

In Prometheus, predict_linear is marked as at modifier unsafe. So it cannot be wrapped with StepInvariance: https://github.com/prometheus/prometheus/blob/308c8c48c15c74a929c430447df3d3c3a3d4001f/promql/functions.go#L1914

harry671003 avatar Mar 19 '25 17:03 harry671003

I see, I wonder if we maybe should jsut fall back for now for correctness sake - this seems to be fairly niche usecase that we must weigh against added complexity

MichaHoffmann avatar Mar 19 '25 20:03 MichaHoffmann

I see, I wonder if we maybe should jsut fall back for now for correctness sake - this seems to be fairly niche usecase that we must weigh against added complexity

I'm okay with doing the fallback. There is one concern, we'll have to exclude the functions.test file from acceptance tests. If that is okay, I can create a PR.

harry671003 avatar Mar 21 '25 17:03 harry671003