prometheus icon indicating copy to clipboard operation
prometheus copied to clipboard

Promtool unit test rules support values as a range

Open bosschaert opened this issue 5 years ago • 15 comments

Proposal

Promtool test cases don't always provide exact values, hence expected values with an allowable delta should be supported.

Bug Report

I am using the promtool unit testing support to test my alerts and recording rules. However, probably due to inaccuracies in floating point math, the test result of my rule is not precise.

- record: avg_read_latency_5m
  expr: rate(duration_sum[5m])/rate(duration_count[5m])

And then I have the following test

tests:
  - interval: 10s 
    input_series: 
    - series: 'duration_sum'
      values: '0+0.4x120'
    - series: 'duration_count'
      values: '0+1x120'
    promql_expr_test:
    - expr: avg_read_latency_5m
      eval_time: 10m
      exp_samples:
        - labels: 'avg_read_latency_5m'
          value: 0.4

However, the above test fails with:

  FAILED:
    expr:'avg_read_latency_5m', time:10m0s,
        exp:"{__name__=\"avg_read_latency_5m\"} 4E-01, {__name__=\"avg_read_latency_5m\"} 4E-01",
        got:"{__name__=\"avg_read_latency_5m\"} 3.999999999999991E-01, {__name__=\"avg_read_latency_5m\"} 3.999999999999991E-01"

In that case it would be good to be able to, instead of using 0.4 as expected value, be able to specify something like 0.4 ±3% or something like this.

  • System information:

    Darwin 18.2.0 x86_64

  • Prometheus version:

    promtool, version 2.7.1 (branch: HEAD, revision: 62e591f928ddf6b3468308b7ac1de1c63aa7fcf3) build user: root@f9f82868fc43 build date: 20190131-11:16:59 go version: go1.11.5

bosschaert avatar Mar 13 '19 16:03 bosschaert

In https://github.com/prometheus/docs/issues/1305 the potential bug is that the answer is .9991 rather than .999. This behaviour here is as expected.

brian-brazil avatar Mar 13 '19 16:03 brian-brazil

If the behaviour here is as expected how can I write a test that checks for 0.4, if it fails with 0.3999999999999991 ? Isn't this standard floating point inaccuracy?

Other testing frameworks support deltas for situations like this, e.g. Junit: https://junit.org/junit4/javadoc/latest/org/junit/Assert.html#assertEquals(java.lang.String,%20double,%20double,%20double)

bosschaert avatar Mar 13 '19 16:03 bosschaert

You need to use 0.3999999999999991 in that case.

brian-brazil avatar Mar 13 '19 16:03 brian-brazil

Well, 2.4/6 equals to 0.4, so it seems wrong to specify 0.3999999999999991 even though that is in practice an acceptable value...

bosschaert avatar Mar 13 '19 16:03 bosschaert

To be honest, I wouldn't even test this simple a rule. It's more for alerts, and non-trivial PromQL.

brian-brazil avatar Mar 13 '19 17:03 brian-brazil

But if I can't even declare a clear test for such a simple rule, where does that leave me for complex ones?

bosschaert avatar Mar 13 '19 18:03 bosschaert

Quoting https://github.com/prometheus/docs/issues/1305#issuecomment-472073007: as PromQL is meant to be deterministic, if we want to check within a delta, I think it should be internal to promtool and not specified in the yaml.

codesome avatar Mar 14 '19 12:03 codesome

@codesome that would be fine with me too, as long as I can state 0.4 and don't have to state 0.3999999999999991. BTW note that this is just an example, for illustration purposes. It obviously also applies to more complex rules and alerts.

bosschaert avatar Mar 14 '19 12:03 bosschaert

@brian-brazil would check within a delta make sense for unit testing? Or the current way is the way it should be?

codesome avatar Mar 14 '19 13:03 codesome

It wouldn't make sense, as you'd silently be letting changes happen without the tests failing. We depend on this for some of our floating-point accuracy related tests for example.

brian-brazil avatar Mar 14 '19 13:03 brian-brazil

You need to use 0.3999999999999991 in that case.

Another example:

tests:
  - interval: 1m
    input_series:
      - series: 'total'
        values: '0.0+0.01x30 0.3-0.01x30'
    promql_expr_test:
    - expr: total
      eval_time: 20m
      exp_samples:
        - labels: 'total'
          value: 0.20000000000000004

Worse, I think rate is not working correctly on these.

Tested with:

$ promtool --version
promtool, version 2.24.1 (branch: master, revision: e4487274853c587717006eeda8804e597d120340)
  build user:       mockbuild
  build date:       20210127-02:01:21
  go version:       go1.15.6
  platform:         linux/amd64

nemobis avatar Feb 10 '21 21:02 nemobis

Linking this function require.InDelta from the testing library testify, in case it's viable for alternative to require.Equal:

https://github.com/stretchr/testify/blob/ab6dc3262822ed562480c19876b0257ace761e3e/require/require.go#L776-L784

mtfoley avatar Nov 14 '21 13:11 mtfoley

cc @dgl do we want to revisit the decisions here?

roidelapluie avatar Nov 15 '21 08:11 roidelapluie

Was reminded of this as I'm removing the other use of reflect.DeepEqual in #11033 -- I can't think of a reason that people writing unit tests would want or need to worry about floating point accuracy (but clearly we do want to have the core be accurate, so this would be a promtool unit test only thing).

I think it would make sense to accept values via the expression got == want || math.Nextafter(want, math.Inf(-1)) == got || math.Nextafter(want, math.Inf(1)) == got i.e. allow off by one in terms of floating point precision in either direction.

The main downside is around 2^51 this would lose integer precision (i.e. before 2^53 where float64 loses it anyway). I suspect very few people care about numbers that large in unit tests.

dgl avatar Jul 18 '22 12:07 dgl

There are such unittests, Prometheus has several for example to ensure that floating point inaccuracy is minimised.

On Mon 18 Jul 2022, 13:44 David Leadbeater, @.***> wrote:

Was reminded of this as I'm removing the other use of reflect.DeepEqual in #11033 https://github.com/prometheus/prometheus/pull/11033 -- I can't think of a reason that people writing unit tests would want or need to worry about floating point accuracy (but clearly we do want to have the core be accurate, so this would be a promtool unit test only thing).

I think it would make sense to accept values via the expression got == want || math.Nextafter(want, math.Inf(-1)) == got || math.Nextafter(want, math.Inf(1)) == got i.e. allow off by one in terms of floating point precision in either direction.

The main downside is around 2^51 this would lose integer precision (i.e. before 2^53 where float64 loses it anyway). I suspect very few people care about numbers that large in unit tests.

— Reply to this email directly, view it on GitHub https://github.com/prometheus/prometheus/issues/5352#issuecomment-1187319750, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABWJG5TPK3TMNCWAAK4JYRDVUVGQ5ANCNFSM4G5WPI2Q . You are receiving this because you were mentioned.Message ID: @.***>

brian-brazil avatar Jul 18 '22 13:07 brian-brazil

I'm running into this issue again, but now because (presumably) the tests were written on an x86 architecture but apparently there is a slight difference in the FP handling on my M1 Mac.

Unit Testing:  test_rules.yaml
  FAILED:
    expr: "gateway_errors", time: 10m,
        exp: {__name__="gateway_errors"} 8.333333333333333E-01
        got: {__name__="gateway_errors"} 8.333333333333334E-01

The promtool testing code looks like this:

      - expr: gateway_errors
        eval_time: 10m
        exp_samples:
          - labels: 'gateway_errors'
            value: 8.333333333333333E-01

How would I write this test to work both on the original arch where the 8.333333333333333E-01 value is produced as well as on my M1 Mac where I apparently get 8.333333333333334E-01 ?

bosschaert avatar Jan 10 '23 17:01 bosschaert

FWIW I've been getting around this wrapping my promql_expr_test expressions in round().

IMHO it feels a bit iffy to be testing exact floating-point equality at all, but hey 🤷

Incidentally, the reason I started using round() was that I was seeing nondeterministic results between subsequent promtool test runs on the same machine with the same inputs. When I get some more time I'll make a minimal repro and file an issue.

sengi avatar Mar 15 '23 15:03 sengi

Hello from the bug scrub.

This sounds after all like a legitimate feature request, so we'll be open to PRs adding the feature.

beorn7 avatar Mar 12 '24 11:03 beorn7