promql-engine icon indicating copy to clipboard operation
promql-engine copied to clipboard

tests: Add fuzz test for native histograms

Open Saumya40-codes opened this issue 9 months ago • 4 comments

Resolves #479

  • This PR adds fuzzing for native histograms
  • Also few aggregates functions (max,min) are now fixed on prometheus side, thus related part of code is uncommented

Saumya40-codes avatar Mar 12 '25 11:03 Saumya40-codes

Err I was on different go version (1.24) locally so didnt saw the same error. Fixed now

Saumya40-codes avatar Mar 14 '25 10:03 Saumya40-codes

Test seems to be failing for few cases locally

enginefuzz_test.go:274: load 2m
                                http_request_duration_seconds{pod="nginx-1"} {{schema:0 count:3 sum:14.00 buckets:[1 2]}}+{{schema:0 count:4 buckets:[1 2 1]}}x20 
                                http_request_duration_seconds{pod="nginx-2"} {{schema:0 count:2 sum:14.00 buckets:[2]}}+{{schema:0 count:6 buckets:[2 2 2]}}x20
    enginefuzz_test.go:276: query: --sum by (pod) ({__name__="http_request_duration_seconds"}), start: 0, end: 0, interval: 0s
    enginefuzz_test.go:282: case 5 error mismatch.
        new result: {pod="nginx-1"} => {count:3, sum:14, (0.5,1]:1, (1,2]:2} @[0]
        {pod="nginx-2"} => {count:2, sum:14, (0.5,1]:2} @[0]
        old result: {pod="nginx-1"} => {count:3, sum:14, (0.5,1]:1, (1,2]:2} @[0]
        {pod="nginx-2"} => {count:2, sum:14, (0.5,1]:2} @[0]
    enginefuzz_test.go:274: load 2m
                                http_request_duration_seconds{pod="nginx-1"} {{schema:0 count:3 sum:14.00 buckets:[1 2]}}+{{schema:0 count:4 buckets:[1 2 1]}}x20 
                                http_request_duration_seconds{pod="nginx-2"} {{schema:0 count:2 sum:14.00 buckets:[2]}}+{{schema:0 count:6 buckets:[2 2 2]}}x20
    enginefuzz_test.go:276: query: stddev_over_time({__name__="http_request_duration_seconds"} @ end()[1h:1m] offset 28s), start: 0, end: 0, interval: 0s
    enginefuzz_test.go:282: case 12 error mismatch.
        new result: {pod="nginx-1"} => 0 @[0]
        {pod="nginx-2"} => 0 @[0]
        old result: 
    enginefuzz_test.go:297: failed 2 test cases

Didn't understand why in the first case, 2nd one has to be with no support for histograms in stddev_over_time . Ill try look into 👀

Saumya40-codes avatar May 14 '25 10:05 Saumya40-codes

enginefuzz_test.go:274: load 2m
                                http_request_duration_seconds{pod="nginx-1"} {{schema:0 count:3 sum:14.00 buckets:[1 2]}}+{{schema:0 count:4 buckets:[1 2 1]}}x20 
                                http_request_duration_seconds{pod="nginx-2"} {{schema:0 count:2 sum:14.00 buckets:[2]}}+{{schema:0 count:6 buckets:[2 2 2]}}x20
    enginefuzz_test.go:276: query: --sum by (pod) ({__name__="http_request_duration_seconds"}), start: 0, end: 0, interval: 0s
    enginefuzz_test.go:282: case 5 error mismatch.
        new result: {pod="nginx-1"} => {count:3, sum:14, (0.5,1]:1, (1,2]:2} @[0]
        {pod="nginx-2"} => {count:2, sum:14, (0.5,1]:2} @[0]
        old result: {pod="nginx-1"} => {count:3, sum:14, (0.5,1]:1, (1,2]:2} @[0]
        {pod="nginx-2"} => {count:2, sum:14, (0.5,1]:2} @[0]

Looks like the problem in the first case is that the compare logic doesn't know about histograms. I can try to push a fix for that today.

harry671003 avatar May 14 '25 14:05 harry671003

Looks like the problem in the first case is that the compare logic doesn't know about histograms. I can try to push a fix for that today.

comparer does have method for the same imo

compareHistograms := func(l, r *histogram.FloatHistogram) bool {
          if l == nil && r == nil {
          return true
          }
          
          if l == nil && r != nil {
          return false
          }
          
          return l.Equals(r)           <- this is method on prometheus side
}

For that particular testcase of what i saw, it looks to be a mismatch in the positive spans,

result: [{0 2}] [{0 3}]

expected: [{0 3}] [{0 2}]

Though they both have the share the same meaning overall ig

Saumya40-codes avatar May 14 '25 15:05 Saumya40-codes

#576 and #577 should fix the failing fuzz tests. Please rebase after they are merged.

harry671003 avatar May 27 '25 16:05 harry671003

Let's merge this since we're skipping the fuzz test anyway. We can enable fuzzing after Prometheus is updated.

cc: @GiedriusS @yeya24

harry671003 avatar May 29 '25 15:05 harry671003

#578 will fix the stats for NH.

harry671003 avatar May 29 '25 17:05 harry671003

Thanks It should be passing all cases now ig. Only falling one remaining is last_over_time

enginefuzz_test.go:274: load 2m
                                http_request_duration_seconds{pod="nginx-1"} {{schema:0 count:3 sum:14.00 buckets:[1 2]}}+{{schema:0 count:83 buckets:[1 2 80]}}x20 
                                http_request_duration_seconds{pod="nginx-2"} {{schema:0 count:2 sum:14.00 buckets:[2]}}+{{schema:0 count:164 buckets:[2 2 160]}}x20
    enginefuzz_test.go:276: query: last_over_time({__name__="http_request_duration_seconds"} @ start()[1h:1m] offset 1m16s), start: 0, end: 99000, interval: 0s
    enginefuzz_test.go:282: case 33 error mismatch.
        new result: {__name__="http_request_duration_seconds", pod="nginx-1"} =>
        {count:3, sum:14, (0.5,1]:1, (1,2]:2} @[0]
        {__name__="http_request_duration_seconds", pod="nginx-2"} =>
        {count:2, sum:14, (0.5,1]:2} @[0]
        old result: {__name__="http_request_duration_seconds", pod="nginx-1"} =>
        0 @[0]
        {__name__="http_request_duration_seconds", pod="nginx-2"} =>
        0 @[0]
    enginefuzz_test.go:297: failed 1 test cases

This looks to be bug upstream where it tries to compare timestamp with float values even when it might not exists (defaults to timestamp being zero)

https://github.com/prometheus/prometheus/blob/main/promql/functions.go#L775C1-L780C3

here if timestamp is -ve and there are not float samples (f.T is 0 (default), and h.T is -ve (original) (see: https://github.com/prometheus/prometheus/blob/main/promql/value.go#L92C1-L96C2)

Saumya40-codes avatar May 30 '25 05:05 Saumya40-codes