prometheus
prometheus copied to clipboard
Promtool unit test rules support values as a range
Proposal
Promtool test cases don't always provide exact values, hence expected values with an allowable delta should be supported.
Bug Report
I am using the promtool unit testing support to test my alerts and recording rules. However, probably due to inaccuracies in floating point math, the test result of my rule is not precise.
- record: avg_read_latency_5m
expr: rate(duration_sum[5m])/rate(duration_count[5m])
And then I have the following test
tests:
- interval: 10s
input_series:
- series: 'duration_sum'
values: '0+0.4x120'
- series: 'duration_count'
values: '0+1x120'
promql_expr_test:
- expr: avg_read_latency_5m
eval_time: 10m
exp_samples:
- labels: 'avg_read_latency_5m'
value: 0.4
However, the above test fails with:
FAILED:
expr:'avg_read_latency_5m', time:10m0s,
exp:"{__name__=\"avg_read_latency_5m\"} 4E-01, {__name__=\"avg_read_latency_5m\"} 4E-01",
got:"{__name__=\"avg_read_latency_5m\"} 3.999999999999991E-01, {__name__=\"avg_read_latency_5m\"} 3.999999999999991E-01"
In that case it would be good to be able to, instead of using 0.4 as expected value, be able to specify something like 0.4 ±3%
or something like this.
-
System information:
Darwin 18.2.0 x86_64
-
Prometheus version:
promtool, version 2.7.1 (branch: HEAD, revision: 62e591f928ddf6b3468308b7ac1de1c63aa7fcf3) build user: root@f9f82868fc43 build date: 20190131-11:16:59 go version: go1.11.5
In https://github.com/prometheus/docs/issues/1305 the potential bug is that the answer is .9991 rather than .999. This behaviour here is as expected.
If the behaviour here is as expected how can I write a test that checks for 0.4, if it fails with 0.3999999999999991 ? Isn't this standard floating point inaccuracy?
Other testing frameworks support deltas for situations like this, e.g. Junit: https://junit.org/junit4/javadoc/latest/org/junit/Assert.html#assertEquals(java.lang.String,%20double,%20double,%20double)
You need to use 0.3999999999999991 in that case.
Well, 2.4/6
equals to 0.4
, so it seems wrong to specify 0.3999999999999991
even though that is in practice an acceptable value...
To be honest, I wouldn't even test this simple a rule. It's more for alerts, and non-trivial PromQL.
But if I can't even declare a clear test for such a simple rule, where does that leave me for complex ones?
Quoting https://github.com/prometheus/docs/issues/1305#issuecomment-472073007: as PromQL is meant to be deterministic, if we want to check within a delta, I think it should be internal to promtool and not specified in the yaml.
@codesome that would be fine with me too, as long as I can state 0.4
and don't have to state 0.3999999999999991
.
BTW note that this is just an example, for illustration purposes. It obviously also applies to more complex rules and alerts.
@brian-brazil would check within a delta make sense for unit testing? Or the current way is the way it should be?
It wouldn't make sense, as you'd silently be letting changes happen without the tests failing. We depend on this for some of our floating-point accuracy related tests for example.
You need to use 0.3999999999999991 in that case.
Another example:
tests:
- interval: 1m
input_series:
- series: 'total'
values: '0.0+0.01x30 0.3-0.01x30'
promql_expr_test:
- expr: total
eval_time: 20m
exp_samples:
- labels: 'total'
value: 0.20000000000000004
Worse, I think rate
is not working correctly on these.
Tested with:
$ promtool --version
promtool, version 2.24.1 (branch: master, revision: e4487274853c587717006eeda8804e597d120340)
build user: mockbuild
build date: 20210127-02:01:21
go version: go1.15.6
platform: linux/amd64
Linking this function require.InDelta
from the testing library testify
, in case it's viable for alternative to require.Equal:
https://github.com/stretchr/testify/blob/ab6dc3262822ed562480c19876b0257ace761e3e/require/require.go#L776-L784
cc @dgl do we want to revisit the decisions here?
Was reminded of this as I'm removing the other use of reflect.DeepEqual
in #11033 -- I can't think of a reason that people writing unit tests would want or need to worry about floating point accuracy (but clearly we do want to have the core be accurate, so this would be a promtool unit test only thing).
I think it would make sense to accept values via the expression got == want || math.Nextafter(want, math.Inf(-1)) == got || math.Nextafter(want, math.Inf(1)) == got
i.e. allow off by one in terms of floating point precision in either direction.
The main downside is around 2^51 this would lose integer precision (i.e. before 2^53 where float64 loses it anyway). I suspect very few people care about numbers that large in unit tests.
There are such unittests, Prometheus has several for example to ensure that floating point inaccuracy is minimised.
On Mon 18 Jul 2022, 13:44 David Leadbeater, @.***> wrote:
Was reminded of this as I'm removing the other use of reflect.DeepEqual in #11033 https://github.com/prometheus/prometheus/pull/11033 -- I can't think of a reason that people writing unit tests would want or need to worry about floating point accuracy (but clearly we do want to have the core be accurate, so this would be a promtool unit test only thing).
I think it would make sense to accept values via the expression got == want || math.Nextafter(want, math.Inf(-1)) == got || math.Nextafter(want, math.Inf(1)) == got i.e. allow off by one in terms of floating point precision in either direction.
The main downside is around 2^51 this would lose integer precision (i.e. before 2^53 where float64 loses it anyway). I suspect very few people care about numbers that large in unit tests.
— Reply to this email directly, view it on GitHub https://github.com/prometheus/prometheus/issues/5352#issuecomment-1187319750, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABWJG5TPK3TMNCWAAK4JYRDVUVGQ5ANCNFSM4G5WPI2Q . You are receiving this because you were mentioned.Message ID: @.***>
I'm running into this issue again, but now because (presumably) the tests were written on an x86 architecture but apparently there is a slight difference in the FP handling on my M1 Mac.
Unit Testing: test_rules.yaml
FAILED:
expr: "gateway_errors", time: 10m,
exp: {__name__="gateway_errors"} 8.333333333333333E-01
got: {__name__="gateway_errors"} 8.333333333333334E-01
The promtool testing code looks like this:
- expr: gateway_errors
eval_time: 10m
exp_samples:
- labels: 'gateway_errors'
value: 8.333333333333333E-01
How would I write this test to work both on the original arch where the 8.333333333333333E-01
value is produced as well as on my M1 Mac where I apparently get 8.333333333333334E-01
?
FWIW I've been getting around this wrapping my promql_expr_test
expressions in round()
.
IMHO it feels a bit iffy to be testing exact floating-point equality at all, but hey 🤷
Incidentally, the reason I started using round()
was that I was seeing nondeterministic results between subsequent promtool
test runs on the same machine with the same inputs. When I get some more time I'll make a minimal repro and file an issue.
Hello from the bug scrub.
This sounds after all like a legitimate feature request, so we'll be open to PRs adding the feature.