
Instrumentation scope attributes cause errors in Prometheus exporter

Open jade-guiton-dd opened this issue 6 months ago • 1 comments

Describe the bug

If the Collector emits two internal metric streams that differ only by their instrumentation scope attributes, the Prometheus exporter from the Go SDK (which is the default way of exposing the Collector's internal metrics) stops working.

Steps to reproduce

Run the core Collector distribution with any version from 0.123.0 to 0.125.0, with the following config file:

receivers:
  nop:
processors:
  batch:
exporters:
  nop:
service:
  pipelines:
    logs:
      receivers: [nop]
      processors: [batch]
      exporters: [nop]
    traces:
      receivers: [nop]
      processors: [batch]
      exporters: [nop]

Additionally, enable the feature gate by passing the --feature-gates +telemetry.newPipelineTelemetry command-line option.

Then, access http://localhost:8888/metrics in a browser. You should see an error similar to the following:

An error has occurred while serving metrics:

collected metric "otelcol_processor_batch_metadata_cardinality" { label:{name:"processor"  value:"batch"}  [...]  gauge:{value:1}} was collected before with the same name and label values

Disabling the feature gate resolves the error, and shows the usual listing of internal Collector metrics.
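
To check for this condition from a script instead of a browser, something like the following could be used. This is only a sketch: the helper names `fetch_metrics` and `is_duplicate_series_error` are hypothetical, the URL is the default endpoint mentioned above, and the match relies solely on the error text quoted in this report.

```python
import urllib.error
import urllib.request

# Substring taken from the error message quoted in this report.
DUPLICATE_MARKER = "was collected before with the same name and label values"


def is_duplicate_series_error(body: str) -> bool:
    """Return True if the response body contains the duplicate-series error."""
    return DUPLICATE_MARKER in body


def fetch_metrics(url: str = "http://localhost:8888/metrics", timeout: float = 5.0) -> str:
    """Fetch the internal metrics endpoint and return its body as text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # If the endpoint answers with an error status, the message is in the body.
        return err.read().decode("utf-8", errors="replace")
```

With the Collector running, `is_duplicate_series_error(fetch_metrics())` would be expected to return `True` when the gate is enabled with the config above, and `False` once the gate is disabled.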

Explanation

The batch processor generates a metric named otelcol_processor_batch_metadata_cardinality, with a processor metric attribute containing the component ID. This component ID is the same for the two instances of batch in the above config, which normally causes the metric points generated by both component instances to be aggregated into a single metric stream.

The telemetry.newPipelineTelemetry feature gate injects instrumentation scope attributes into internal metrics based on which component instance emitted the metric. Because the metric points generated by the two batch instances now have differing identifying properties, they are no longer aggregated, and form two different metric streams. This would normally be a good thing, providing more precise information about the behavior of the two pipelines.

The Prometheus exporter converts OpenTelemetry metric streams into Prometheus time series, then serves them through a Prometheus endpoint, on port 8888 by default. However, it currently does not support instrumentation scope attributes, and ignores them during the conversion. As a result, the two metric streams are converted to two time series with identical names and labels, which causes the Prometheus endpoint to return an error instead of the metrics.
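
The aliasing can be illustrated with a small model. This is a simplified sketch, not the Go SDK exporter's actual code: the `Stream` type, the `to_series_key` and `find_collisions` helpers, the scope name, and the `otelcol.signal` scope attribute are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Stream(NamedTuple):
    """Simplified identity of an OpenTelemetry metric stream (hypothetical model)."""
    name: str
    attributes: tuple        # metric point attributes, as (key, value) pairs
    scope_name: str
    scope_attributes: tuple  # instrumentation scope attributes, as (key, value) pairs


def to_series_key(stream: Stream) -> tuple:
    """Model a conversion that keeps the scope name but ignores scope attributes."""
    labels = dict(stream.attributes)
    labels["otel_scope_name"] = stream.scope_name
    return (stream.name, tuple(sorted(labels.items())))


def find_collisions(streams) -> list:
    """Return the series keys produced by more than one distinct stream."""
    counts = Counter(to_series_key(s) for s in set(streams))
    return [key for key, n in counts.items() if n > 1]


# Two batch processor instances with the same component ID in different pipelines.
# The scope name and attribute values below are illustrative placeholders.
logs_batch = Stream(
    "otelcol_processor_batch_metadata_cardinality",
    (("processor", "batch"),),
    "batchprocessor",
    (("otelcol.signal", "logs"),),
)
traces_batch = logs_batch._replace(scope_attributes=(("otelcol.signal", "traces"),))

collisions = find_collisions([logs_batch, traces_batch])
print(len(collisions))
```

In this model the two streams are distinct on the OpenTelemetry side, yet map to the same series key, so one collision is reported. That is the situation the duplicate-collection error above corresponds to.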

Relevant issues / PRs

In the Collector:

  • PR #12617 introduced the telemetry.newPipelineTelemetry feature gate and the code for injecting component-identifying instrumentation scope attributes.
  • Issue #12870 was caused by the default "off" state of the gate not injecting any component attributes, even in internal logs, where said attributes had been present for a few versions as regular log attributes.
  • PR #12856 tried to solve this issue by stabilizing the feature gate, turning on the attribute injection unconditionally.
  • PR #12917 suggested reverting the previous PR after noticing this issue with the Prometheus exporter.
  • PR #12933 instead set the feature gate back to Alpha (off by default), but restricted the gate to only toggle attribute injection for metrics. This means the exporter works in the default configuration, but may still show this error when the gate is explicitly enabled.

In the OpenTelemetry specification:

  • The Prometheus compatibility specification currently requires OpenTelemetry to Prometheus converters to add the instrumentation scope name and version, but not the attributes, as labels on metric points. Attributes are instead exposed as a separate otel_scope_info metric.
  • spec#4223 (open) proposes to update the Prometheus compatibility specification to add instrumentation scope attributes as labels on metric points instead, so that they are treated as identifying properties, as they are in OpenTelemetry. This would avoid the aliasing issue at play here.
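
The effect of the approach proposed in spec#4223 can be sketched as follows. This is a hypothetical model: the `series_labels` helper and the `otel_scope_` prefix with dots replaced by underscores are assumptions made for illustration, and the actual label naming is whatever the specification ends up defining.

```python
def series_labels(point_attrs: dict, scope_name: str, scope_attrs: dict,
                  include_scope_attrs: bool) -> tuple:
    """Build the label set for one time series in a simplified conversion model."""
    labels = dict(point_attrs)
    labels["otel_scope_name"] = scope_name
    if include_scope_attrs:
        for key, value in scope_attrs.items():
            # Assumed naming scheme, for illustration only.
            labels["otel_scope_" + key.replace(".", "_")] = value
    return tuple(sorted(labels.items()))


point_attrs = {"processor": "batch"}
# Illustrative scope attributes for the two batch processor instances.
logs_scope = {"otelcol.signal": "logs"}
traces_scope = {"otelcol.signal": "traces"}

# Scope attributes ignored: both instances yield the same label set.
aliased = (series_labels(point_attrs, "batchprocessor", logs_scope, False)
           == series_labels(point_attrs, "batchprocessor", traces_scope, False))

# Scope attributes added as labels: the label sets stay distinct.
distinct = (series_labels(point_attrs, "batchprocessor", logs_scope, True)
            != series_labels(point_attrs, "batchprocessor", traces_scope, True))
```

Here `aliased` and `distinct` both evaluate to `True`: without scope attributes the two series are indistinguishable, and with them each component instance keeps its own series.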

In the Go SDK:

  • Currently, scope attributes are not exposed in any way.
  • Issue go#5846 (open) tracks the implementation of support for instrumentation scope attributes.
  • PR go#5947 (draft, unmerged) was a PoC for the previous issue, which adds instrumentation scope attributes as labels on the metric points directly, as suggested in spec#4223 above. Based on this PR, it seems that metric streams differing only by their scope schema would encounter a similar issue.

jade-guiton-dd, Apr 29 '25 12:04