prometheus
prometheus copied to clipboard
Flaky TestAsyncRuleEvaluation
Failed here: https://github.com/prometheus/prometheus/actions/runs/7797756253/job/21265037848?pr=13545
=== RUN TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_insufficient_concurrency
manager_test.go:1961:
Error Trace: D:/a/prometheus/prometheus/rules/manager_test.go:1961
Error: "0.0655291" is not less than "0.06"
Test: TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_insufficient_concurrency
--- FAIL: TestAsyncRuleEvaluation (0.18s)
--- FAIL: TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_insufficient_concurrency (0.07s)
Passed on re-run: https://github.com/prometheus/prometheus/actions/runs/7797756253/job/21265977651?pr=13545
Also failed here: https://github.com/prometheus/prometheus/actions/runs/7725797136/job/21060788152?pr=13197
I think creating more rules for evaluation in fixtures/rules_multiple_independent.yaml might fix the flakeyness, since the time saved with concurrent evaluations would become more apparent. wdyt?
I haven't looked at the test at all. Since it's only flaking on Windows, could it be the timer resolution again?
TIL about windows timer resolution being ~15ms!
My hypothesis is that the concurrency does not get to "kick in" with just 6 rules, and so if we increase the number of rules to evaluate, the time savings due to concurrency will get more marked.
check https://github.com/prometheus/prometheus/pull/13553
I just got this on my desktop Linux VM:
$ go test -tags stringlabels ./rules/
--- FAIL: TestAsyncRuleEvaluation/asynchronous_evaluation_with_independent_and_dependent_rules (0.06s)
manager_test.go:2391:
Error Trace: /home/vagrant/src/github.com/grafana/mimir-prometheus/rules/manager_test.go:2391
Error: "0.056689165" is not less than "0.04"
Test: TestAsyncRuleEvaluation/asynchronous_evaluation_with_independent_and_dependent_rules
Huh, that test is flakey too? I think the same technique will help there as well. Updated
Flaked on https://github.com/prometheus/prometheus/actions/runs/8080900355/job/22078272296?pr=13662 (windows)
=== RUN TestAsyncRuleEvaluation/asynchronous_evaluation_with_independent_and_dependent_rules
manager_test.go:1926:
Error Trace: D:/a/prometheus/prometheus/rules/manager_test.go:1926
Error: "0.0411435" is not less than "0.04"
Test: TestAsyncRuleEvaluation/asynchronous_evaluation_with_independent_and_dependent_rules
and
https://github.com/prometheus/prometheus/actions/runs/8153560332/job/22285206086?pr=13706 (windows)
=== RUN TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_sufficient_concurrency
manager_test.go:2000:
Error Trace: D:/a/prometheus/prometheus/rules/manager_test.go:2000
Error: "0.06309" is not less than "0.06"
Test: TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_sufficient_concurrency
Looking at the code again, it seems to rely on a 10ms Sleep being reasonably close to 10ms.
https://github.com/prometheus/prometheus/blob/3b8b57700c469c7cde84e1d8f9d383cb8fe11ab0/rules/manager_test.go#L2075
https://github.com/prometheus/prometheus/blob/3b8b57700c469c7cde84e1d8f9d383cb8fe11ab0/rules/manager_test.go#L1926
This assumption is not safe: https://github.com/golang/go/issues/44343 (but will be safer in Go 1.23).
The timings in flake reports above don't match neatly to the 15ms value on Windows, so I guess something else is going wrong.
Another example: https://github.com/prometheus/prometheus/actions/runs/9076113775/job/24938133905?pr=14098#step:4:13802
=== RUN TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_insufficient_concurrency
manager_test.go:1962:
Error Trace: D:/a/prometheus/prometheus/rules/manager_test.go:1962
Error: "0.07915" is not less than "0.06"
Test: TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_insufficient_concurrency
Unfortunately we're still seeing this on main
=== RUN TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_insufficient_concurrency
manager_test.go:1963:
Error Trace: D:/a/prometheus/prometheus/rules/manager_test.go:1963
Error: "0.8259941" is not less than "0.09"
Test: TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_insufficient_concurrency
=== RUN TestAsyncRuleEvaluation/asynchronous_evaluation_of_all_independent_rules,_sufficient_concurrency
--- FAIL: TestAsyncRuleEvaluation (1.01s)
https://github.com/prometheus/prometheus/actions/runs/9178564530/job/25238754426#step:4:14073
https://github.com/prometheus/prometheus/pull/14300 was merged for this.