promql-engine Investigate random fuzzing failures

Run make fuzz locally and observe failures such as:

--- FAIL: FuzzEnginePromQLSmithInstantQuery (27.14s)
    --- FAIL: FuzzEnginePromQLSmithInstantQuery (0.46s)
        enginefuzz_test.go:450: load 30s
                                http_requests_total{pod="nginx-1", route="/"} 120.00+11.00x40
                                http_requests_total{pod="nginx-2", route="/"} -35+2.00x40
        enginefuzz_test.go:452: sum by (pod, route) (
              count(
                sinh(
                  asin(
                    (
                        avg({__name__="http_requests_total"} @ start())
                      and
                        absent(sort({__name__="http_requests_total",route="/"} offset 2m31s))
                    )
                  )
                )
              )
            )
        enginefuzz_test.go:454: case 21 error mismatch.
            new result: 
            old result: {} => 1 @[7000]
        enginefuzz_test.go:459: failed 1 test cases

Or

--- FAIL: FuzzEnginePromQLSmithInstantQuery (0.43s)
    --- FAIL: FuzzEnginePromQLSmithInstantQuery (0.21s)
        enginefuzz_test.go:450: load 30s
                                http_requests_total{pod="nginx-1", route="/"} -46.00+1.00x40
                                http_requests_total{pod="nginx-2", route="/"}  9+2.00x40
        enginefuzz_test.go:452: count_values without (__name__) ("value", sgn(-ln(idelta({__name__="http_requests_total"}[3m]))))
        enginefuzz_test.go:454: case 53 error mismatch.
            new result: {pod="nginx-1", route="/", value="0"} => 1 @[55000]
            {pod="nginx-2", route="/", value="-1"} => 1 @[55000]
            old result: {pod="nginx-1", route="/", value="-0"} => 1 @[55000]
            {pod="nginx-2", route="/", value="-1"} => 1 @[55000]
        enginefuzz_test.go:459: failed 1 test cases

Sep 19 '24 10:09 GiedriusS

@GiedriusS (regarding FuzzNodesMarshalJSON test fail..)

The problem might be with promqlsmith about how its generating queries based on seed values. Sometimes, the test passes but in certain cases like below:

For instance, on seed value -18 it doesnt generate the offset labels even when promqlsmith.WithEnableOffset(true)

a seed value of -38 gives an invalid syntax

codec_test.go:94: Seed: -38
        codec_test.go:95: Query: (count_values without () ("value", -exp(avg_over_time({name="http_requests_total"}[1m]))) ^ (0.962864270760631
4 <= --{name="http_requests_total"}))

Feb 05 '25 05:02 Saumya40-codes

@GiedriusS (regarding FuzzNodesMarshalJSON test fail..)

The problem might be with promqlsmith about how its generating queries based on seed values. Sometimes, the test passes but in certain cases like below:

For instance, on seed value -18 it doesnt generate the offset labels even when promqlsmith.WithEnableOffset(true)

a seed value of -38 gives an invalid syntax
codec_test.go:94: Seed: -38
        codec_test.go:95: Query: (count_values without () ("value", -exp(avg_over_time({name="http_requests_total"}[1m]))) ^ (0.962864270760631
4 <= --{name="http_requests_total"}))

Sorry I don't see it - isn't that valid Syntax?

Feb 05 '25 13:02 MichaHoffmann

Sorry I don't see it - isn't that valid Syntax?

Of what I searched upon, shouldn't there be any arguments with count_values? Also -- within that expression..

(I'm sorry for that claim if its invalid (as initially query was bit complex for me) and I searched upon the syntax for this conclusion)

Feb 05 '25 14:02 Saumya40-codes

Sorry I don't see it - isn't that valid Syntax?

Of what I searched upon, shouldn't there be any arguments with count_values? Also -- within that expression..

(I'm sorry for that claim if its invalid (as initially query was bit complex for me) and I searched upon the syntax for this conclusion)

Nah that's actually valid PromQL - first argument of count values says which label names values to count, -- is twice unary negation. As far as I can see it's valid PromQL - albeit a bit weird PromQL.

Feb 05 '25 14:02 MichaHoffmann

Nah that's actually valid PromQL - first argument of count values says which label names values to count, -- is twice unary negation. As far as I can see it's valid PromQL - albeit a bit weird PromQL.

Thanks! I'll look further into the test then for this

Feb 05 '25 14:02 Saumya40-codes

Nah that's actually valid PromQL - first argument of count values says which label names values to count, -- is twice unary negation. As far as I can see it's valid PromQL - albeit a bit weird PromQL.

Thanks! I'll look further into the test then for this

Of what I saw, certain seed value is having threshold to after which tests are failing, currently we run the test 100 times at a time. Though most of them passes, some starts failing after 3-4 iterations.

for instance here

codec_test.go:119: Iteration 0 - Depth: 15, Ops: 82
            Query: -(--(-{__name__="http_requests_total"} @ 1677740207.369 offset -3m59s + topk(0.9150317078782106, -(bottomk(0.41893488408915897, -changes({__name__="http_requests_total"}[1m] offset -2m46s)) <= bool -vector(0.8552105534915929)))) and sum(-histogram_fraction((0.7540048990312351 / pi()), 0.1111250993483287, max(quantile((-time() - -time()), stddev({__name__="http_requests_total"} @ end() offset 10s))))))

Maybe can we skip the test after it reaches certain number of depth to decrease such over-complex queries?

Feb 06 '25 07:02 Saumya40-codes