
UTF-8 support in metric and label names

Open • fedetorres93 opened this issue 1 year ago • 4 comments

Adds UTF-8 support for metric and label names.

These changes are based on the work done on the Prometheus common libraries here and here

  • The prometheus-metrics-exposition-formats module will use the new quoting syntax ({"foo"}) if and only if the metric name does not conform to the legacy name format (foo{}).
  • The prometheus-metrics-model module has a new flag, NameValidationScheme, which determines whether names are validated using the legacy or the UTF-8 scheme.
  • Scrapers can announce via content negotiation that they support UTF-8 names by adding escaping=allow-utf-8 to the Accept header. Where UTF-8 is not available, metric providers can be configured to escape names in one of several ways:
    ◦ values: U__ UTF value escaping, for perfect round-tripping
    ◦ underscores: all invalid characters become _
    ◦ dots: dots become _dot_, _ becomes __, and all other invalid characters become ___
  • The escaping scheme can be set as a global default (PrometheusNaming.nameEscapingScheme) or specified per scrape with the escaping= term in the Accept header, whose value can be allow-utf-8 (for UTF-8-compatible scrapers), underscores, dots, or values. This should be a no-op for existing configurations, because existing scrapers do not pass the escaping key in the Accept header, so existing functionality is maintained.
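To make the three escaping schemes concrete, here is a minimal Java sketch that follows the description above. The class `NameEscapingSketch`, the enum `EscapingScheme`, and the methods `escapeName` and `isValidLegacyMetricName` are hypothetical names for illustration, not the client_java API. The exact `values` encoding (after the `U__` prefix, `__` per underscore and `_<hex code point>_` per invalid character) is an assumption modeled on the Prometheus common library and should be checked against it.

```java
// Hypothetical sketch of the escaping schemes described in the PR; not the client_java API.
final class NameEscapingSketch {

    // Mirrors the escaping= values: allow-utf-8, underscores, dots, values.
    enum EscapingScheme { ALLOW_UTF8, UNDERSCORES, DOTS, VALUES }

    // Legacy metric names match [a-zA-Z_:][a-zA-Z0-9_:]*; digits only after the first character.
    static boolean isValidLegacyRune(int cp, int index) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                || cp == '_' || cp == ':'
                || (index > 0 && cp >= '0' && cp <= '9');
    }

    static boolean isValidLegacyMetricName(String name) {
        int[] cps = name.codePoints().toArray();
        if (cps.length == 0) {
            return false;
        }
        for (int i = 0; i < cps.length; i++) {
            if (!isValidLegacyRune(cps[i], i)) {
                return false;
            }
        }
        return true;
    }

    static String escapeName(String name, EscapingScheme scheme) {
        if (name.isEmpty() || scheme == EscapingScheme.ALLOW_UTF8) {
            return name; // UTF-8-capable scrapers receive names unchanged
        }
        int[] cps = name.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        switch (scheme) {
            case UNDERSCORES:
                if (isValidLegacyMetricName(name)) {
                    return name;
                }
                // Every invalid character becomes a single underscore (lossy).
                for (int i = 0; i < cps.length; i++) {
                    if (isValidLegacyRune(cps[i], i)) {
                        out.appendCodePoint(cps[i]);
                    } else {
                        out.append('_');
                    }
                }
                break;
            case DOTS:
                // Underscores are doubled even in legacy-valid names.
                for (int i = 0; i < cps.length; i++) {
                    int cp = cps[i];
                    if (cp == '_') {
                        out.append("__");
                    } else if (cp == '.') {
                        out.append("_dot_");
                    } else if (isValidLegacyRune(cp, i)) {
                        out.appendCodePoint(cp);
                    } else {
                        out.append("___");
                    }
                }
                break;
            case VALUES:
                if (isValidLegacyMetricName(name)) {
                    return name;
                }
                // Assumed encoding: U__ prefix, "__" per underscore, "_<hex>_" per invalid code point.
                out.append("U__");
                for (int i = 0; i < cps.length; i++) {
                    int cp = cps[i];
                    if (cp == '_') {
                        out.append("__");
                    } else if (isValidLegacyRune(cp, i)) {
                        out.appendCodePoint(cp);
                    } else {
                        out.append('_').append(Integer.toHexString(cp)).append('_');
                    }
                }
                break;
            default:
                throw new IllegalArgumentException("unsupported scheme: " + scheme);
        }
        return out.toString();
    }
}
```

Under these assumptions, `my.metric` becomes `my_metric` with underscores, `my_dot_metric` with dots, and `U__my_2e_metric` with values. Only the values form keeps enough information to recover the original name, which is why the description singles it out for round-tripping.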

Work towards https://github.com/prometheus/common/issues/527

fedetorres93, Feb 07 '24 19:02

Hello @fedetorres93, thanks for working on this. I noticed this is still a draft, but I have a few general comments, unrelated to specific code lines:

  • I think we need a new exposition format for this. If a Prometheus scrape requests application/openmetrics-text; version=1.0.0, then we should return a response that's valid OpenMetrics 1.0.0. Otherwise, we will break all scrapes by existing Prometheus servers.
  • The Prometheus Java client is not only used as a standalone library. It is also used for the Prometheus exporter of the OpenTelemetry Java SDK. Therefore, conversion of OpenTelemetry names to Prometheus names must comply with the Prometheus and OpenMetrics Compatibility Spec. There are many components in the OpenTelemetry ecosystem for converting OTel metrics to Prometheus metrics, and we should try to keep names consistent so that users don't get different names depending on how they set up their pipeline. I think it would be good to reach out to OpenTelemetry's Prometheus WG first and define consistent name conversion. The Prometheus WG already started a design doc.

fstab, Feb 08 '24 11:02

> then we should return a response that's valid OpenMetrics 1.0.0. Otherwise we will break all scrapes by existing Prometheus servers.

Part of the content negotiation is a new term, escaping=. If the escaping term is absent or has any value other than allow-utf-8, all responses will be converted to the valid legacy Prometheus format and will not use the new syntax. See https://github.com/prometheus/common/pull/570
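The rule described here, together with the quoting syntax from the PR description, can be sketched in Java as follows. `Utf8ExpositionSketch`, `utf8Allowed`, and `formatSample` are hypothetical helpers, not client_java methods. The sketch assumes names have already been escaped whenever UTF-8 is not allowed, and, as a simplification, it checks every media range in the Accept header rather than only the one that content negotiation actually selects.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; not the client_java API.
final class Utf8ExpositionSketch {

    // True only if the scraper explicitly sent escaping=allow-utf-8.
    // Absent or any other value means: fall back to legacy output.
    static boolean utf8Allowed(String acceptHeader) {
        if (acceptHeader == null) {
            return false;
        }
        for (String mediaRange : acceptHeader.split(",")) {
            for (String param : mediaRange.split(";")) {
                String[] kv = param.trim().split("=", 2);
                if (kv.length == 2
                        && kv[0].trim().equalsIgnoreCase("escaping")
                        && kv[1].trim().equals("allow-utf-8")) {
                    return true;
                }
            }
        }
        return false;
    }

    // Legacy metric names allow ':'; legacy label names do not.
    static boolean isLegacyName(String name, boolean allowColon) {
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'
                    || (allowColon && c == ':') || (i > 0 && c >= '0' && c <= '9');
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Wraps a string in double quotes, escaping backslash, quote, and newline.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Uses {"name",...} only when UTF-8 is allowed and the name is not legacy-valid;
    // otherwise emits the legacy name{...} form.
    static String formatSample(String name, Map<String, String> labels, double value, boolean utf8) {
        boolean quoteName = utf8 && !isLegacyName(name, true);
        StringJoiner parts = new StringJoiner(",");
        if (quoteName) {
            parts.add(quote(name));
        }
        for (Map.Entry<String, String> e : labels.entrySet()) {
            String key = (utf8 && !isLegacyName(e.getKey(), false)) ? quote(e.getKey()) : e.getKey();
            parts.add(key + "=" + quote(e.getValue()));
        }
        String inner = parts.toString();
        String prefix = quoteName ? "" : name;
        String braces = inner.isEmpty() ? "" : "{" + inner + "}";
        return prefix + braces + " " + value;
    }
}
```

With `escaping=allow-utf-8`, a sample named `my.metric` is written as `{"my.metric"} 1.0`; without it, the caller is expected to pass an already escaped name such as `my_metric`, and the output stays in the legacy `my_metric 1.0` form that existing Prometheus servers can parse.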

ywwg, Feb 08 '24 16:02

Hello @fstab, I see that there's still some discussion going on about OpenTelemetry metric name conversion, but I think the PR is ready for an initial review, at least until a consensus is reached.

fedetorres93, Mar 07 '24 20:03

Hey @fstab, I just wanted to check in on this PR. I understand there may still be ongoing discussion about OpenTelemetry metric name conversion, but I was wondering whether you've had a chance to take an initial look.

Looking forward to any feedback you might have.

fedetorres93, Mar 27 '24 20:03