Proposal: Add OR operator with parenthesis support for label matchers in PromQL
Motivation
Currently, PromQL has limitations when users need to express complex label matching conditions. While regex patterns can handle simple cases of matching multiple values for a single label (e.g., {status=~"200|204|301"}), they fall short for more complex scenarios involving different labels or combinations of conditions.
Example
Let's say there is a need to query metrics from multiple services that are identified by different labels:
http_requests_total{env="production", cluster="west-1", cell="west-1-1", service="api-v1"}
or http_requests_total{env="production", cluster="west-1", cell="west-1-1", app="api-v2"}
The current PromQL syntax requires writing multiple selectors and joining them using or, which is not very convenient and can lead to verbose queries.
Instead, it could be written more concisely with an OR operator:
http_requests_total{env="production", cluster="west-1", cell="west-1-1", (service="api-v1" or app="api-v2")}
The query translates to postings list math seamlessly, too.
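As a rough illustration of that claim, here is a minimal Python sketch of the postings-list arithmetic. The toy index, the series IDs, and the helper names `intersect` and `union` are all made up for this example; the real Prometheus index is written in Go and is considerably more involved.

```python
# Toy inverted index: (label name, label value) -> sorted list of series IDs.
# All IDs and contents below are invented for illustration.
INDEX = {
    ("env", "production"): [1, 2, 3, 4],
    ("cluster", "west-1"): [1, 2, 4],
    ("service", "api-v1"): [1, 5],
    ("app", "api-v2"): [4, 6],
}


def intersect(a, b):
    """Intersect two sorted postings lists (logical AND)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """Merge two sorted postings lists without duplicates (logical OR)."""
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        if j >= len(b) or (i < len(a) and a[i] < b[j]):
            out.append(a[i])
            i += 1
        elif i >= len(a) or b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:  # same ID in both lists: emit it once
            out.append(a[i])
            i += 1
            j += 1
    return out


# {env="production", cluster="west-1", (service="api-v1" or app="api-v2")}
common = intersect(INDEX[("env", "production")], INDEX[("cluster", "west-1")])
either = union(INDEX[("service", "api-v1")], INDEX[("app", "api-v2")])
result = intersect(common, either)
```

In this sketch the comma maps to an intersection and `or` maps to a union, so the parenthesized group is just one more postings list that gets intersected with the rest.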
Industry Precedent
It could be argued that problems like these can be resolved by using a combination of relabeling and recording rules. However, metric collection can be highly complex in large systems with many teams and services. Additionally, recording rules put an unnecessary load on the system.
Other competing TSDB solutions have support for similar syntax in their metrics languages:
VictoriaMetrics' MetricsQL
http_requests_total{env="production", cluster="west-1", cell="west-1-1", service="api-v1" or env="production", cluster="west-1", cell="west-1-1", app="api-v2"}
Datadog Query Language
http_requests_total{env:production AND cluster:west-1 AND cell:west-1-1 AND (service:api-v1 OR app:api-v2)}
Proposed Syntax
If no parentheses are present, or delimits groups of matchers. Within each group, comma-separated matchers keep their usual AND semantics.
metrics{version="1", service="api" or version="2", app="api"}
Parentheses delimit a group of matchers:
metrics{(service="api" or app="api"), (version_v1="v1.0.0" or version_v2="1.0")}
There can be multiple levels of parentheses, so more complex matchers could be expressed:
sum(rate({
  env="production",
  (
    (__name__="http_server_requests_total", status="200")
    or
    (__name__="grpc_server_requests_total", status_code="OK")
  )
}[5m]))
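The intended semantics of nested groups can be sketched as a small tree evaluator. This is only an illustration in Python: the class names `Eq`, `And`, and `Or` and the function `matches` are hypothetical, only equality matchers are covered, and nothing here reflects how the PromQL parser is actually structured.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Eq:
    """A single equality matcher, e.g. status="200"."""
    name: str
    value: str


@dataclass(frozen=True)
class And:
    """Comma-separated matchers: every child must match."""
    children: tuple


@dataclass(frozen=True)
class Or:
    """Matchers joined by `or`: at least one child must match."""
    children: tuple


Node = Union[Eq, And, Or]


def matches(node: Node, labels: dict) -> bool:
    """Return True if the label set satisfies the matcher tree."""
    if isinstance(node, Eq):
        # As in PromQL, a missing label is treated like an empty value.
        return labels.get(node.name, "") == node.value
    if isinstance(node, And):
        return all(matches(c, labels) for c in node.children)
    if isinstance(node, Or):
        return any(matches(c, labels) for c in node.children)
    raise TypeError(f"unknown node: {node!r}")


# The nested example above, written as a tree.
tree = And((
    Eq("env", "production"),
    Or((
        And((Eq("__name__", "http_server_requests_total"), Eq("status", "200"))),
        And((Eq("__name__", "grpc_server_requests_total"), Eq("status_code", "OK"))),
    )),
))

http_ok = {"__name__": "http_server_requests_total", "env": "production", "status": "200"}
grpc_ok = {"__name__": "grpc_server_requests_total", "env": "production", "status_code": "OK"}
http_500 = {"__name__": "http_server_requests_total", "env": "production", "status": "500"}
```

With these definitions, `matches(tree, http_ok)` and `matches(tree, grpc_ok)` hold, while `matches(tree, http_500)` does not.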
Real-world use cases
1. Legacy System Migration
During modernization efforts, old and new systems often use different labeling conventions:
transaction_count{
(component="billing-legacy", env="prod") or
(service="billing-v2", environment="production") or
(app="billing-modernized", stage="prod")
}
2. Querying metrics scraped with different labels
http_requests_total{env="production", (service="api-v1" or app="api-v2")}
3. Combining metrics with different label schemas
sum(rate({
  env="production",
  (
    (__name__="http_server_requests_total", status="200")
    or
    (__name__="grpc_server_requests_total", status_code="OK")
  )
}[5m]))
Linking a similar issue which also talks about richer label matcher logic: https://github.com/prometheus/prometheus/issues/14824
Hello from the bug-scrub!
We discussed the idea, and I observe that we can separate the question of syntax from the implementation.
I think the current or syntax can express all the examples given, e.g.
sum(rate({
  env="production",
  (
    (__name__="http_server_requests_total", status="200")
    or
    (__name__="grpc_server_requests_total", status_code="OK")
  )
}[5m]))
could be
sum(rate(
(http_server_requests_total{env="production", status="200"}
or
grpc_server_requests_total{env="production", status_code="OK"}
)[5m]))
(The second one is currently an error because [5m] doesn't work on a binary expression.)
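The rewrite above amounts to distributing the shared matchers over each or-branch. A minimal Python sketch of that expansion follows; the tuple-based tree format and the names `to_dnf` and `to_promql` are invented for this comment and are not part of Prometheus.

```python
from itertools import product


def to_dnf(node):
    """Expand a nested matcher tree into a list of AND-groups.

    A node is either a matcher string such as 'status="200"', or a tuple
    ("and", [children]) / ("or", [children]).
    """
    if isinstance(node, str):
        return [[node]]
    op, children = node
    expanded = [to_dnf(c) for c in children]
    if op == "or":
        # Collect the alternatives of every child.
        return [group for alts in expanded for group in alts]
    if op == "and":
        # Cross product: pick one alternative per child and concatenate them.
        return [sum(combo, []) for combo in product(*expanded)]
    raise ValueError(f"unknown operator: {op}")


def to_promql(node):
    """Render the expansion as selectors joined by today's `or` operator."""
    return " or ".join("{" + ", ".join(g) + "}" for g in to_dnf(node))


tree = ("and", [
    'env="production"',
    ("or", [
        ("and", ['__name__="http_server_requests_total"', 'status="200"']),
        ("and", ['__name__="grpc_server_requests_total"', 'status_code="OK"']),
    ]),
])
groups = to_dnf(tree)
```

Each resulting group is a plain list of matchers that an existing selector can express. The cross product can grow quickly when several or-groups are combined, which is one thing a detailed design would have to weigh.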
Currently "vector selectors" work off a single request to lower-level storage, and introducing or would require that we merge multiple storage requests before carrying on with the computation. Alternatively, we could extend storage requests to take a list of matchers which are or-ed.
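To make the first option concrete, here is a rough Python sketch of merging several storage requests. `select` is a hypothetical stand-in for a single storage request (equality matchers only) and `select_any` is an invented name; neither corresponds to an actual Prometheus storage API.

```python
def select(series_db, matchers):
    """Hypothetical stand-in for one storage request: AND over equality matchers."""
    return [
        s for s in series_db
        if all(s.get(k, "") == v for k, v in matchers.items())
    ]


def select_any(series_db, groups):
    """Run one request per or-group and merge, dropping duplicate series."""
    seen, merged = set(), []
    for matchers in groups:
        for s in select(series_db, matchers):
            key = tuple(sorted(s.items()))  # a series is identified by its label set
            if key not in seen:
                seen.add(key)
                merged.append(s)
    return merged


# Invented sample data: the third series carries both labels, so it
# matches both groups and must appear only once in the merged result.
series_db = [
    {"__name__": "http_requests_total", "env": "production", "service": "api-v1"},
    {"__name__": "http_requests_total", "env": "production", "app": "api-v2"},
    {"__name__": "http_requests_total", "env": "production", "service": "api-v1", "app": "api-v2"},
    {"__name__": "http_requests_total", "env": "staging", "service": "api-v1"},
]
groups = [
    {"env": "production", "service": "api-v1"},
    {"env": "production", "app": "api-v2"},
]
merged = select_any(series_db, groups)
```

The deduplication step is the part that a single storage request gets for free today: a series matching more than one group has to come out exactly once.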
Seems like a big project. Next step would be a detailed design.
Can we merge/dupe this with #14824? I think it is very similar. Or let's say both ideas need to be explored in the design doc that we already requested in #14824.