PromQL: Dot in regexp should also match newlines (which is less surprising to most users and also allows optimising the `.*` regexp)
What did you do?
I found that a wildcard label matcher such as app=~".*" significantly degrades query performance.
Then I edited the benchmark BenchmarkPostingsForMatchers as follows:

The results are striking!
The query with the wildcard matcher is roughly 100,000 times slower than the one without it.
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",j="foo"-16 5 213092008 ns/op
BenchmarkPostingsForMatchers/Head/n="1",j="foo"-16 593829 1838 ns/op
What did you expect to see?
Label matcher {n="1",i=~".*",j="foo"} is equivalent to label matcher {n="1",j="foo"}, so the evaluation cost should be similar.
Thank you.
This is equivalent in TSDB but not in PromQL, so the optimization could be done in tsdb. I think it would be great.
Waiting for a tsdb maintainer to set P3 on this if they also find this useful.
In PromQL:
- we need to send the matcher to remote reads
- absent(foo{i=~".*",i="a"}) returns {} 1 and absent(foo{i="a"}) returns {i="a"} 1
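To illustrate why the two selectors give different absent() results, here is a rough Go sketch. It is a simplified model, not the actual PromQL code: the `matcher` type and `absentLabels` function are hypothetical, and the rule it encodes (a label is kept only when its name has a single equality matcher) is an assumption chosen to reproduce the outputs quoted above.

```go
package main

import "fmt"

// matcher is a hypothetical, simplified stand-in for a PromQL label matcher.
type matcher struct {
	name, op, value string // op is one of "=", "!=", "=~", "!~"
}

// absentLabels models the assumed rule: a label is kept in the result only
// when its name appears in exactly one matcher and that matcher uses "=".
func absentLabels(ms []matcher) map[string]string {
	out := map[string]string{}
	seen := map[string]bool{}
	for _, m := range ms {
		if m.op == "=" && !seen[m.name] {
			out[m.name] = m.value
		} else {
			// A second matcher on the same name, or a non-equality
			// matcher, makes the label value ambiguous, so drop it.
			delete(out, m.name)
		}
		seen[m.name] = true
	}
	return out
}

func main() {
	withWildcard := []matcher{{"i", "=~", ".*"}, {"i", "=", "a"}}
	withoutWildcard := []matcher{{"i", "=", "a"}}
	fmt.Println(absentLabels(withWildcard))    // no labels survive
	fmt.Println(absentLabels(withoutWildcard)) // i="a" survives
}
```

Under this model, dropping the i=~".*" matcher at the PromQL layer would change the labels attached to the absent() result, which is why the optimization is only safe further down, in TSDB.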
Yeah it would be nice to have. We could also do it for .+.
Didn't @pracucci make such optimizations already?
I have found cef4dd6fff06198c61007211be460fb83cd708c1 and 2f6bf7de4c41015a9adeeb3e544766d61ec865eb.
That was for individual regex matchers
May I pick up this issue?
I would like to start contributing to Prometheus.
This is equivalent in TSDB but not in PromQL, so the optimization could be done in tsdb. I think it would be great.
I think we tried to do this optimisation in the past, but have been told by @brian-brazil that it's not safe to just skip it because of label values with newlines.
Yes, this is not a safe optimisation.
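For anyone wondering what the newline problem looks like in practice, here is a small sketch using Go's standard regexp package. The helper names `matchDefault` and `matchDotAll` are made up for illustration; the `^(?:...)$` anchoring mirrors what is described later in this thread, and the rest is plain Go regexp behaviour.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchDefault anchors the pattern as ^(?:pattern)$. Without the "s" flag,
// "." does not match "\n" in Go's regexp syntax.
func matchDefault(pattern, value string) bool {
	return regexp.MustCompile("^(?:" + pattern + ")$").MatchString(value)
}

// matchDotAll wraps the pattern in (?s:...), so "." also matches "\n".
func matchDotAll(pattern, value string) bool {
	return regexp.MustCompile("^(?s:" + pattern + ")$").MatchString(value)
}

func main() {
	for _, v := range []string{"", "foo", "foo\nbar"} {
		fmt.Printf("%q default=%v dotall=%v\n",
			v, matchDefault(".*", v), matchDotAll(".*", v))
	}
}
```

With the default anchoring, ".*" rejects the value "foo\nbar", so treating it as "matches everything" would change results for label values that contain a newline.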
I remember now indeed :)
It seems that, for now, we should use !="" rather than .* or .+ in most situations.
It seems that, for now, we should use !="" rather than .* or .+ in most situations.
If you're currently using app=~".*", it also matches metrics that have no app label at all. In contrast, app!="" will not match metrics without the app label. The two label selectors are not the same thing.
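A minimal sketch of that difference, assuming a missing label is treated the same as a label with an empty value. The map-based series representation and the functions `matchesRegex` and `matchesNotEqual` are hypothetical and only meant to show the semantics, not how TSDB evaluates matchers.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesRegex models app=~"pattern": a missing label reads as "" from the
// map, and the pattern is anchored as ^(?:pattern)$.
func matchesRegex(series map[string]string, name, pattern string) bool {
	return regexp.MustCompile("^(?:" + pattern + ")$").MatchString(series[name])
}

// matchesNotEqual models app!="value": again a missing label reads as "".
func matchesNotEqual(series map[string]string, name, value string) bool {
	return series[name] != value
}

func main() {
	withApp := map[string]string{"__name__": "up", "app": "api"}
	withoutApp := map[string]string{"__name__": "up"}

	fmt.Println(matchesRegex(withApp, "app", ".*"), matchesNotEqual(withApp, "app", ""))
	fmt.Println(matchesRegex(withoutApp, "app", ".*"), matchesNotEqual(withoutApp, "app", ""))
}
```

For the series without an app label, the regex matcher accepts it while the != "" matcher rejects it, so swapping one for the other changes which series are selected.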
Hrm, we've discussed this before and even sent a PR for this: https://github.com/prometheus/prometheus/pull/6996
This is not a safe optimisation with the current model, but something to consider for Prometheus 3.0. Will add this to the dev summit agenda and report back once we discuss this further!
my 2 cents on this.
A LOT of CPU is wasted on this, because many people use Grafana variables and, when they want to match everything, they use .* or even .+. I get the point about compatibility, but I'm sure no one is relying on .* to match everything except newlines.
I'm sure someone is using label values with newlines. Do they expect .* not to match their label? I'd be surprised.
Honestly, if we could count how much money this wastes in total across all companies in the world, I'm sure you'd change your mind.
It seems that grafana could change the "default auto" to be (?s:.*) and (?s:.+), then we can optimize for these in tsdb.
Another idea would be to change the prometheus default anchor behind a feature flag and optimize for (?s:.*) and (?s:.+) in tsdb as well, but the user would still type =~".*".
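One possible way to recognise these two patterns is to inspect the parsed regexp with Go's regexp/syntax package. This is only a sketch: the function name `classify` and its return strings are made up for illustration, and it says nothing about how TSDB would actually wire this in.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// classify reports whether pattern is a dot-all "match anything" (.*) or
// "match anything non-empty" (.+) expression. With the "s" flag the parser
// emits OpAnyChar for "."; without it, it emits OpAnyCharNotNL, which is
// deliberately not treated as a wildcard here.
func classify(pattern string) string {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "invalid"
	}
	if len(re.Sub) == 1 && re.Sub[0].Op == syntax.OpAnyChar {
		switch re.Op {
		case syntax.OpStar:
			return "match-all"
		case syntax.OpPlus:
			return "match-non-empty"
		}
	}
	return "other"
}

func main() {
	for _, p := range []string{"(?s:.*)", "(?s:.+)", ".*", "foo.*"} {
		fmt.Printf("%-8s -> %s\n", p, classify(p))
	}
}
```

A matcher classified as "match-all" could be skipped entirely, and "match-non-empty" could be handled like a != "" matcher, while a plain .* stays on the slow path because it must still reject values with newlines.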
I'm really wondering how many Prometheus users are aware that .* or .+ doesn't match if the label value contains a newline. I believe the expected behaviour is that it would match "whatever" the label value is. I'm wondering whether what we keep considering a breaking change (having label values with newlines match .*) may actually be a bug fix for the end user of Prometheus, and so we could treat it as such :)
It seems that grafana could change the "default auto" to be (?s:.*) and (?s:.+), then we can optimize for these in tsdb.
For sure
As I said, I am open to change the default anchoring based on a feature flag, and optimize anyway in TSDB for the correct anchoring.
I think this is still worth tackling. @roidelapluie which way do we prefer more:
- A feature flag to match .* and .+ and handle them differently when matching postings?
- A feature flag to change default anchor. For this option just want to double check, now we are using ^(?:)$. Do we change this to (?s:) and no need to keep ^$?
I feel option 1 is easier.
But only option 2 is correct.
But only option 2 is correct.
Yes. Option 1 changes the behavior, but that may be acceptable depending on the use case. If there is no use case for newlines in label values, then option 1 should be sufficient?
I think ignoring newlines matches the expectation of most users of Prometheus.
The plan was always correctness, so I don't see an issue with option 2. What would be wrong with it?
At some point I was playing around with the FastRegexMatcher to optimize those regexes: https://github.com/alanprot/prometheus/commit/bc168724872dddee212415502d26da72401528d7#diff-c6bfdf5ae2a5edbc3df844f48f297e4d2a4f4106662f1a04c8931fbb38f1fdae
Idk if this is something we wanna pursue.
Hello from the Bug Scrub!
I think ignoring newlines matches the expectation of most users of Prometheus.
Sounds like @gouthamve's comment is still relevant. Prometheus 3.0 coordinators, was this issue discussed already? We updated the title as well.
It feels like 9X.X% of users would likely be OK with this optimization, so let's decide what we want for Prometheus 3.0 (personally I think we should break it explicitly)
Note the related #8525.
@marioferh is working on this
Yeah it would be nice to have. We could also do it for .+.
I've also been trying to optimize .+, but I'm not sure how to do it. Any ideas?