
No translation mode in Prometheus->OTLP conversion

Open VihasMakwana opened this issue 3 months ago • 6 comments

What are you trying to achieve?

This proposal aims to enhance the Prometheus to OTLP conversion by introducing a configurable “no translation” mode. This mode would allow keeping the job and instance labels instead of dropping them after conversion into service.name, service.namespace, and service.instance.id.
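
As a rough sketch of where such a knob could live (for example, on the receiver that performs the conversion): the struct, field name, and tag below are assumptions for discussion only, not an existing or proposed collector API.

// Hypothetical configuration sketch for the proposed mode. The struct, field
// name, and tag are illustrative assumptions, not part of any existing
// receiver or exporter configuration.
type TranslationConfig struct {
	// When true, keep the original Prometheus job and instance labels and do
	// not synthesize service.name, service.namespace, or service.instance.id.
	DisableTranslation bool `mapstructure:"disable_translation"`
}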

Motivation

This is a problem for Prometheus-compatible datastores receiving data in OTLP format. The datastores need to perform an extra step to convert service.name and service.instance.id to job and instance labels for existing PromQL queries to work.

For example, Grafana Mimir also performs an additional step to convert “service.name” and “service.instance.id” back to “job” and “instance”.

Preserving the original labels makes it possible to:

  • maintain compatibility with existing PromQL queries and dashboards,
  • avoid extra translation steps (e.g., converting service.name back to job),
  • allow Prometheus users to leverage the power of the OTel Collector without forcing a schema migration to semantic conventions.

Additional context

  • If a message arrives and the exporter has both the job/instance labels and the service.* attributes, what do we do then?
    • In this case, since the receiver keeps the job and instance labels on top of converting them to service.* attributes, converting them back SHOULD reproduce the original job and instance values (see the sketch after this list).
    • The exporter SHOULD work with the existing implementation.
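
For reference, here is a minimal Go sketch of the job/instance ↔ service.* mapping and the roundtrip property described above, as I understand it from the Prometheus compatibility conventions. The helper names are made up, and the exact handling of service.namespace (splitting/joining on "/") may differ between implementations.

package main

import (
	"fmt"
	"strings"
)

// promToService sketches the Prometheus -> OTLP direction: instance becomes
// service.instance.id, and job becomes service.name (optionally split into
// service.namespace/service.name). The split rule is an assumption here.
func promToService(job, instance string) map[string]string {
	attrs := map[string]string{"service.instance.id": instance}
	if ns, name, found := strings.Cut(job, "/"); found {
		attrs["service.namespace"] = ns
		attrs["service.name"] = name
	} else {
		attrs["service.name"] = job
	}
	return attrs
}

// serviceToProm sketches the OTLP -> Prometheus direction: job is rebuilt as
// "<service.namespace>/<service.name>" (or just service.name when there is no
// namespace) and instance comes from service.instance.id.
func serviceToProm(attrs map[string]string) (job, instance string) {
	job = attrs["service.name"]
	if ns := attrs["service.namespace"]; ns != "" {
		job = ns + "/" + job
	}
	return job, attrs["service.instance.id"]
}

func main() {
	// Roundtrip: converting back SHOULD reproduce the original values.
	attrs := promToService("frontend", "10.0.0.5:9090")
	job, instance := serviceToProm(attrs)
	fmt.Println(job, instance) // frontend 10.0.0.5:9090
}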

I am opening this issue after discussion with @ArthurSens on https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/42425.

Alternatives

We could alternatively add another option to "preserve labels" on top of translating them. There shouldn't be any changes required for Prometheus-based exporters, as they will convert resource metrics into target_info series and promote resource attributes to labels. In this case, they should be able to convert the job/instance attributes to job/instance labels without any issues.
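
To make the alternative concrete, here is a hedged Go fragment of what the resulting attribute set could look like when labels are preserved on top of the usual translation. The values are invented for illustration and nothing here reflects an existing implementation.

// Hypothetical attribute set under the "preserve labels" alternative: the
// usual translation still happens, and the original Prometheus labels are
// kept alongside it.
var preservedAttrs = map[string]string{
	// translated, as today
	"service.name":        "frontend",
	"service.instance.id": "10.0.0.5:9090",
	// preserved originals (the optional new part)
	"job":      "frontend",
	"instance": "10.0.0.5:9090",
}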


VihasMakwana avatar Oct 17 '25 06:10 VihasMakwana

This is a problem for Prometheus-compatible datastores receiving data in OTLP format. The datastores need to perform an extra step to convert service.name and service.instance.id to job and instance labels for existing PromQL queries to work.

Why is this a problem? It's a part of the standard OTLP to Prometheus conversion, and that's not going to change any time soon.

maintain compatibility with existing PromQL queries and dashboards,

Can you please elaborate on why compatibility is not currently maintained, without your proposed mode?

aknuds1 avatar Oct 25 '25 08:10 aknuds1

I think @aknuds1 has thought about this as well. My 2c: we should probably always preserve all original OpenTelemetry resource attributes. Adding job and instance should be the optional part, and they should probably be duplicated by default to avoid breaking existing users.

cc @open-telemetry/prometheus-interoperability for additional thoughts or ideas

dashpole avatar Oct 28 '25 17:10 dashpole

@aknuds1 Let me rephrase and give you an example.

Consider the following setup:

Prometheus remote_write -> OTel Collector -> remote_write datastore

In the above setup, the OTel Collector currently converts Prometheus labels to OTel semantic conventions and then back again. This translation roundtrip is unnecessary and adds overhead.

A "no translation" mode would optimize this flow by allowing the collector to process, filter, and enrich metrics while keeping the original Prometheus labels intact.

This also makes the collector a true processing gateway, ideal for users coming from the Prometheus ecosystem who prefer to work with familiar labels without enforced semantic conversions. It is a bit awkward to deal with semantic conventions when neither the input nor the output uses them.

VihasMakwana avatar Nov 05 '25 13:11 VihasMakwana

Prometheus remote_write -> OTel Collector -> remote_write datastore

Is the remote_write datastore here an OTLP server?

In above setups, the OTel Collector currently converts prometheus labels to OTeL semantic conventions and then back again

This sounds weird to me, I can't really make sense of it. Do you mean that regular Prometheus metric labels are converted to OTel metric attributes and target_info metric labels are converted to OTel resource attributes?

A "no translation" mode would optimize this flow by allowing the collector to process, filter, and enrich metrics while keeping the original Prometheus labels intact.

What would the benefits be if the OTel Collector were to have this "no translation" mode? Would it just be an optimization?

I can say that the drawback on the OTLP backend side (e.g. Prometheus) would be that if you don't have service.instance.id or service.name resource attributes in OTLP payloads, you can't generate target_info (or store resource attributes if the OTLP backend supports it), so it's not a functionally equivalent mode. You lose some of the conversion.

This also makes the collector a true processing gateway, ideal for users coming from the prometheus ecosystem who prefer to work with familiar labels without enforced semantic conversions.

Can you please exemplify what the difference would be to users? I don't understand what you mean by enforced semantic conventions.

aknuds1 avatar Nov 05 '25 14:11 aknuds1

@aknuds1

Is the remote_write datastore here an OTLP server?

No, it’s a remote_write endpoint. The collector uses the remote_write exporter to send data to it.

This sounds weird to me, I can't really make sense of it. Do you mean that regular Prometheus metric labels are converted to OTel metric attributes and target_info metric labels are converted to OTel resource attributes?

Exactly. That’s what I meant.

What would the benefits be if the OTel Collector were to have this "no translation" mode? Would it just be an optimization?

Partly that, yes. But it also helps users who are more familiar with Prometheus conventions.

For example, the Prometheus remote_write receiver currently produces an OTLP document like this:

{
  "name": "http_requests_total",
  "attributes": {
    "service.name": "frontend",
    "service.instance.id": "10.0.0.5:9090",
    "method": "GET"
  },
  "value": 1023
}

Here, the job and instance labels are translated into their service.* equivalents, and the original labels are discarded.

If we provided an option to skip this translation, users could continue working with the Prometheus-style attributes they’re already familiar with. This would make it easier for teams already accustomed to Prometheus to adopt the OpenTelemetry Collector.
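
For comparison, here is a hypothetical sketch (as a Go map, with values copied from the example above) of the same data point if the proposed "no translation" mode were enabled: the original Prometheus labels stay as-is and no service.* attributes are synthesized.

// Hypothetical "no translation" output for the example above: attributes stay
// exactly as the original Prometheus labels.
var untranslatedAttrs = map[string]string{
	"job":      "frontend",
	"instance": "10.0.0.5:9090",
	"method":   "GET",
}
// Metric name (http_requests_total) and value (1023) are unchanged.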


Is the remote_write datastore here an OTLP server?

Having a “no translation” mode would definitely help if it is an OTLP server as well. For example, Grafana Mimir adds resource attributes to the target_info metric. However, when it comes to the service.* attributes, Mimir converts them back into their original Prometheus-style job and instance labels.

VihasMakwana avatar Nov 06 '25 12:11 VihasMakwana

No, it’s a remote_write endpoint. The collector uses the remote_write exporter to send data to it.

So the remote_write endpoint would receive Prometheus metrics? Can you clarify the concrete benefits to the user from your proposed "no translation" mode in the Prometheus remote_write -> OTel Collector -> remote_write datastore scenario?

If we provided an option to skip this translation, users could continue working with the Prometheus-style attributes they’re already familiar with. This would make it easier for teams already accustomed to Prometheus to adopt the OpenTelemetry Collector.

Users don't work with OTLP documents; can you instead explain what the difference would be for users? I assume the users would be using Prometheus metrics converted from OTLP.

Having a “no translation” mode would definitely help if it is an OTLP server as well. For example, Grafana Mimir adds resource attributes to the target_info metric. However, when it comes to the service.* attributes, Mimir converts them back into their original Prometheus-style job and instance labels.

Can you concretize the benefit of your proposal in this case? Mimir and Prometheus have an option to preserve service.instance.id, service.namespace, and service.name in target_info; they are not necessarily dropped. I don't understand what you think should be done differently.

aknuds1 avatar Nov 06 '25 12:11 aknuds1