No translation mode in Prometheus->OTLP conversion
What are you trying to achieve?
This proposal aims to enhance the Prometheus-to-OTLP conversion by introducing a configurable “no translation” mode. This mode would keep the job and instance labels instead of dropping them after they are converted into service.name, service.namespace, and service.instance.id.
Motivation
This is a problem for Prometheus-compatible datastores receiving data in OTLP format. The datastores need to perform an extra step to convert service.name and service.instance.id to job and instance labels for existing PromQL queries to work.
For example, Grafana Mimir performs an additional step to convert “service.name” and “service.instance.id” back to “job” and “instance”.
Preserving the original labels makes it possible to:
- maintain compatibility with existing PromQL queries and dashboards,
- avoid extra translation steps (e.g., converting service.name back to job),
- allow Prometheus users to leverage the power of the OTel Collector without forcing a schema migration to semantic conventions.
Additional context
- If a message arrives at the exporter with both the job/instance labels and the service.* attributes, what do we do then?
  - In this case, since the receiver keeps the job and instance labels on top of converting them to service.* attributes, converting them back SHOULD match the original job and instance values.
- The exporter SHOULD work with the existing implementation.
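The consistency expectation above can be sketched as follows. This is only an illustration, not collector code: the function names are hypothetical, the attribute map is a plain dict, and preferring the explicit job/instance labels on a mismatch is an assumed policy. The job derivation follows the OTLP-to-Prometheus rule of joining service.namespace and service.name with a slash.

```python
def derive_job_instance(attrs):
    """Derive Prometheus job/instance from service.* attributes."""
    name = attrs.get("service.name")
    namespace = attrs.get("service.namespace")
    # job is "<namespace>/<name>" when both are set, else just the name.
    job = f"{namespace}/{name}" if namespace and name else name
    return job, attrs.get("service.instance.id")


def resolve_job_instance(attrs):
    """Return (job, instance, conflicts) for a mixed attribute set.

    Assumed policy: explicit job/instance labels win, and any label that
    disagrees with the value derived from service.* is reported.
    """
    derived_job, derived_instance = derive_job_instance(attrs)
    job = attrs.get("job", derived_job)
    instance = attrs.get("instance", derived_instance)
    conflicts = []
    if derived_job is not None and job != derived_job:
        conflicts.append("job")
    if derived_instance is not None and instance != derived_instance:
        conflicts.append("instance")
    return job, instance, conflicts


consistent = resolve_job_instance({
    "job": "frontend",
    "instance": "10.0.0.5:9090",
    "service.name": "frontend",
    "service.instance.id": "10.0.0.5:9090",
})
mismatched = resolve_job_instance({"job": "legacy", "service.name": "frontend"})
```

When the receiver has preserved the labels unmodified, the conflict list stays empty, which is the SHOULD in the bullet above. A non-empty list would indicate that a processor changed one side but not the other.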
I am opening this issue after discussion with @ArthurSens on https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/42425.
Alternatives
Alternatively, we could add another option to "preserve labels" on top of translating them.
There shouldn't be any changes required for Prometheus-based exporters, as they will convert resource metrics into target_info series and promote resource attributes to labels. In this case, they should be able to convert job/instance attributes to job/instance labels without any issues.
This is a problem for Prometheus-compatible datastores receiving data in OTLP format. The datastores need to perform an extra step to convert service.name and service.instance.id to job and instance labels for existing PromQL queries to work.
Why is this a problem? It's a part of the standard OTLP to Prometheus conversion, and that's not going to change any time soon.
maintain compatibility with existing PromQL queries and dashboards,
Can you please elaborate on why compatibility is not currently maintained, without your proposed mode?
@aknuds1 has thought about this as well I think. My 2c: We should probably always preserve all original opentelemetry resource attributes. Adding job and instance should be the optional part, and should probably be duplicated by default to avoid breaking existing users.
cc @open-telemetry/prometheus-interoperability for additional thoughts or ideas
@aknuds1 Let me rephrase and give you an example.
Consider the following setup:
Prometheus remote_write -> OTel Collector -> remote_write datastore
In the above setup, the OTel Collector currently converts Prometheus labels to OTel semantic conventions and then back again. This translation round trip is unnecessary and adds overhead.
A "no translation" mode would optimize this flow by allowing the collector to process, filter, and enrich metrics while keeping the original Prometheus labels intact.
This also makes the collector a true processing gateway, ideal for users coming from the Prometheus ecosystem who prefer to work with familiar labels without enforced semantic conversions. It is a bit awkward to deal with semconvs if neither the input nor the output uses semconv.
Prometheus remote_write -> OTel Collector -> remote_write datastore
Is remote_write datastore here an OTLP server?
In the above setup, the OTel Collector currently converts Prometheus labels to OTel semantic conventions and then back again
This sounds weird to me, I can't really make sense of it. Do you mean that regular Prometheus metric labels are converted to OTel metric attributes and target_info metric labels are converted to OTel resource attributes?
A "no translation" mode would optimize this flow by allowing the collector to process, filter, and enrich metrics while keeping the original Prometheus labels intact.
What would the benefits be if the OTel Collector were to have this "no translation" mode? Would it just be an optimization?
I can say that the drawback on the OTLP backend side (e.g. Prometheus) would be that if you don't have service.instance.id or service.name resource attributes in OTLP payloads, you can't generate target_info (or store resource attributes if the OTLP backend supports it), so it's not a functionally equivalent mode. You lose some of the conversion.
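To illustrate the drawback described above, here is a simplified model of target_info generation on the backend side. It is a sketch under stated assumptions, not the Prometheus implementation: the function name is hypothetical, label sanitization is reduced to replacing dots with underscores, and the only rule modeled is that at least one identifying attribute (service.name or service.instance.id) must be present.

```python
IDENTIFYING = ("service.name", "service.namespace", "service.instance.id")


def build_target_info(resource_attrs):
    """Build a target_info series from resource attributes, or return None.

    Simplified model: without service.name or service.instance.id there is
    nothing to join on, so no series is produced.
    """
    name = resource_attrs.get("service.name")
    namespace = resource_attrs.get("service.namespace")
    instance = resource_attrs.get("service.instance.id")
    if not name and not instance:
        return None
    labels = {}
    if name:
        labels["job"] = f"{namespace}/{name}" if namespace else name
    if instance:
        labels["instance"] = instance
    for key, value in resource_attrs.items():
        if key in IDENTIFYING:
            continue
        # Assumption: dots are the only characters that need sanitizing here.
        labels[key.replace(".", "_")] = value
    return {"name": "target_info", "labels": labels, "value": 1}


without_service = build_target_info({"host.name": "node-1"})
with_service = build_target_info({
    "service.name": "frontend",
    "service.namespace": "shop",
    "service.instance.id": "10.0.0.5:9090",
    "host.name": "node-1",
})
```

In this model, a payload that carries only job and instance as plain attributes yields no target_info, which is why the two modes are not functionally equivalent.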
This also makes the collector a true processing gateway, ideal for users coming from the Prometheus ecosystem who prefer to work with familiar labels without enforced semantic conversions.
Can you please exemplify what the difference would be to users? I don't understand what you mean by enforced semantic conventions.
@aknuds1
Is remote_write datastore here an OTLP server?
No, it’s a remote_write endpoint. The collector uses the remote_write exporter to send data to it.
This sounds weird to me, I can't really make sense of it. Do you mean that regular Prometheus metric labels are converted to OTel metric attributes and target_info metric labels are converted to OTel resource attributes?
Exactly. That’s what I meant.
What would the benefits be if the OTel Collector were to have this "no translation" mode? Would it just be an optimization?
Partly that, yes. But it also helps users who are more familiar with Prometheus conventions.
For example, the Prometheus remote_write receiver currently produces an OTLP document like this:
{
  "name": "http_requests_total",
  "attributes": {
    "service.name": "frontend",
    "service.instance.id": "10.0.0.5:9090",
    "method": "GET"
  },
  "value": 1023
}
Here, the job and instance labels are translated into their service.* equivalents, and the original labels are discarded.
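The difference between the current behavior and the proposed option can be sketched roughly as below. The function and the mode names ("translate", "none", "both") are hypothetical and do not correspond to any existing collector setting. Splitting a job of the form namespace/name on the first slash is also an assumption about how service.namespace is populated.

```python
def translate_labels(labels, mode="translate"):
    """Map Prometheus labels to attributes on the receiver side.

    mode is hypothetical:
      "translate": job/instance become service.* and are dropped (today)
      "none":      labels are passed through unchanged (this proposal)
      "both":      service.* is added and job/instance are kept (alternative)
    """
    if mode == "none":
        return dict(labels)
    attrs = {k: v for k, v in labels.items() if k not in ("job", "instance")}
    job = labels.get("job")
    if job:
        # Assumption: "namespace/name" is split on the first "/".
        namespace, sep, name = job.partition("/")
        if sep:
            attrs["service.namespace"] = namespace
            attrs["service.name"] = name
        else:
            attrs["service.name"] = job
    if labels.get("instance"):
        attrs["service.instance.id"] = labels["instance"]
    if mode == "both":
        for key in ("job", "instance"):
            if key in labels:
                attrs[key] = labels[key]
    return attrs


labels = {"job": "frontend", "instance": "10.0.0.5:9090", "method": "GET"}
translated = translate_labels(labels)            # matches the document above
untranslated = translate_labels(labels, "none")  # job and instance survive
```

With mode "none", a filter or enrichment rule written against job and instance keeps working inside the collector, and the remote_write exporter has nothing to convert back.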
If we provided an option to skip this translation, users could continue working with the Prometheus-style attributes they’re already familiar with. This would make it easier for teams already accustomed to Prometheus to adopt the OpenTelemetry Collector.
Is remote_write datastore here an OTLP server?
Having a “no translation” mode would definitely help if it is an OTLP server as well. For example, Grafana Mimir adds resource attributes to the target_info metric. However, when it comes to the service.* attributes, Mimir converts them back into their original Prometheus-style job and instance labels.
No, it’s a remote_write endpoint. The collector uses the remote_write exporter to send data to it.
So the remote_write endpoint would receive Prometheus metrics? Can you clarify the concrete benefits to the user from your proposed "no translation" mode in the Prometheus remote_write -> OTel Collector -> remote_write datastore scenario?
If we provided an option to skip this translation, users could continue working with the Prometheus-style attributes they’re already familiar with. This would make it easier for teams already accustomed to Prometheus to adopt the OpenTelemetry Collector.
Users don't work with OTLP documents, can you instead explain what the difference would be for users? I assume the users would be using Prometheus metrics converted from OTLP.
Having a “no translation” mode would definitely help if it is an OTLP server as well. For example, Grafana Mimir adds resource attributes to the target_info metric. However, when it comes to the service.* attributes, Mimir converts them back into their original Prometheus-style job and instance labels.
Can you concretize the benefit of your proposal in this case? Mimir and Prometheus have an option to preserve service.instance.id, service.namespace, and service.name in target_info; they are not necessarily dropped. I don't understand what you think should be done differently.