opentelemetry-collector
opentelemetry-collector copied to clipboard
Collector metrics names reported with OpenCensus and OpenTelemetry are inconsistent
Currently Collector uses OpenCensus SDK to report its own metrics. Migration to OpenTelemetry metrics SDK started with a feature gate telemetry.useOtelForInternalMetrics. There is a significant difference in the produced metrics.
OpenCensus metrics utilize the prometheus exporter with the default metrics namespace set to otelcol. It means that all the reported metrics get the otelcol prefix, and we have metrics reported as otelcol_exporter_enqueue_failed_log_records, otelcol_process_uptime, etc.
OpenTelemetry metrics SDK doesn't have a capability to specify the namespace on all the metrics. So the metrics would be exporter_enqueue_failed_log_records, process_uptime, etc.
Before proceeding with the migration process, we need to decide what metric names would look like long term. In particular, do we want to keep otelcol prefix or remove it?
-
If we want to keep the prefix, we should probably stop using the namespace configuration on the OpenCensus Prometheus exporter and make sure all the metrics have the
otelcolprefix set explicitly. This should be a no-op change for end users of the default collector's metrics. -
If we want to remove the prefix, it will be a significant breaking change for current OTel Collector users. We should probably run it through a deprecation process using another feature gate with an extended (up to 1 year?) duration.
The "short_name" attribute is intended to be used for the prometheus namespace, but is currently a work-in-progress:
- https://github.com/open-telemetry/opentelemetry-specification/pull/2702
- https://github.com/open-telemetry/opentelemetry-specification/pull/2703
Also, I'm working on a migration OTel for OC -> OTel to help mitigate or prevent breaking changes when migrating:
- https://github.com/open-telemetry/oteps/pull/210
As discussed during a SIG meeting today, we will be able to move otelcol prefix from OC Prometheus exporter namespace to the Scope short_name attribute which is WIP ^. Prometheus exporter will be translating Scope short_name attribute to a prefix, so we will get consistent metrics across OC and OTel SDKs
@dashpole given that we currently expose Collector's metrics in Prometheus format only, do you think it makes sense to set otelcol as prefix on metrics instead of the namespace for now until we have an option to use Scope short_name? At least it'll bring the exposed metrics to a consistent state from Prometheus standpoint and will allow us to migrate to Scope short_name without breaking users with telemetry.useOtelForInternalMetrics enabled.
The OC metrics are not at all present anyway when using telemetry.useOtelForInternalMetrics, since the bridge which is the most important part is not available.
I'm fine prefixing everything we use the OTel API for with otel for now. I'm not sure its worth the effort, but if people actually want to use otel instrumentation right now, we can definitely do that. I 100% agree with not breaking existing telemetry :)
Just one small clarification:
Currently Collector uses OpenCensus SDK to report its own metrics. Migration to OpenTelemetry metrics SDK started with a feature gate telemetry.useOtelForInternalMetrics. There is a significant difference in the produced metrics.
Currently all these metrics are "not available" (not different) if telemetry.useOtelForInternalMetrics enabled. Do we care about the prefix fix right now or wait until the bridge is implemented?
I'd prefer waiting. I don't see the urgency in changing the OTel metric names while the OTel metric SDK is being rewritten.
Closed by #6223 since now all metrics with prometheus otel have also the prefix.