client_java
Implement UTF8 Support
Part of https://github.com/prometheus/prometheus/issues/13095, all client libraries will need to support the new scraping, query, and content negotiation formats.
@fedetorres93
I'll start by implementing UTF-8 support in the Java client library.
@fedetorres93 thanks for volunteering, I really appreciate that!
Is there any general guidance yet on how to implement it, for example how to convert UTF-8 names to Prometheus names for older Prometheus servers, and how to deal with potential name collisions when registering metrics?
It would be good to define the behavior first before implementing it. Ideally the behavior would be consistent across client libraries in all programming languages.
@fstab You can find the proposals @ywwg worked on here and here.
I'm working on adding UTF-8 metric and label name validation, along with support for parsing and formatting the UTF-8 text format. There is still some ongoing discussion about how content negotiation will be implemented on writes and how reads will be handled.
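For illustration, formatting a sample in the UTF-8 text format could look roughly like the sketch below. It assumes the quoted-name syntax from the UTF-8 proposal as we understand it: names that are valid classic Prometheus names are written unquoted, while other names are written in double quotes, and a quoted metric name moves inside the braces. `Utf8NameFormatter`, `formatSample`, and `escape` are hypothetical names and not part of the client_java API.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of client_java). Formats one sample line and
 * quotes metric and label names only when they are not valid classic names.
 * Assumes the quoted-name syntax from the UTF-8 proposal.
 */
public class Utf8NameFormatter {

    // Classic (pre-UTF-8) metric name syntax.
    private static final Pattern LEGACY_METRIC_NAME = Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");
    // Classic (pre-UTF-8) label name syntax.
    private static final Pattern LEGACY_LABEL_NAME = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    /** Escapes backslash, double quote and line feed, as is done for label values. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"': sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    private static String quoted(String s) {
        return "\"" + escape(s) + "\"";
    }

    /** Formats a sample that has a single label. */
    public static String formatSample(String metricName, String labelName, String labelValue, double value) {
        String label = (LEGACY_LABEL_NAME.matcher(labelName).matches() ? labelName : quoted(labelName))
                + "=" + quoted(labelValue);
        if (LEGACY_METRIC_NAME.matcher(metricName).matches()) {
            // Classic metric name: keep the familiar name{labels} layout.
            return metricName + "{" + label + "} " + value;
        }
        // UTF-8 metric name: the quoted name becomes the first item inside the braces.
        return "{" + quoted(metricName) + ", " + label + "} " + value;
    }
}
```

With these assumptions, `formatSample("http.requests", "http.method", "GET", 1.0)` yields `{"http.requests", "http.method"="GET"} 1.0`, while classic names keep the existing `http_requests_total{method="GET"} 1.0` layout, so output for existing metrics does not change.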
Thanks @fedetorres93!
There is already support for dots in metric and label names in client_java. It will be easy to extend this to other characters. The motivation for allowing dots was to support metric/label names defined in the OpenTelemetry semantic conventions.
Currently dots are only exposed in OpenTelemetry format. In Prometheus text format, OpenMetrics text format, and OpenMetrics protobuf format dots are replaced with underscores.
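The current behavior described above amounts to a format-dependent name mapping. The following is a minimal sketch of that idea. `ExposedNames` and its `Format` enum are hypothetical and do not reflect how client_java structures this internally.

```java
/** Hypothetical sketch (not the client_java API) of format-dependent metric names. */
public class ExposedNames {

    /** Exposition formats mentioned in this discussion. */
    public enum Format { OPENTELEMETRY, PROMETHEUS_TEXT, OPENMETRICS_TEXT, OPENMETRICS_PROTOBUF }

    /** Returns the metric or label name as it would be exposed in the given format. */
    public static String exposedName(String name, Format format) {
        if (format == Format.OPENTELEMETRY) {
            // OpenTelemetry format keeps dots as they are.
            return name;
        }
        // Prometheus text, OpenMetrics text and OpenMetrics protobuf replace dots with underscores.
        return name.replace('.', '_');
    }
}
```

Extending this to arbitrary UTF-8 characters would mean adding a format (or format version) in which the name is passed through unchanged, as the OpenTelemetry branch already does.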
I assume for UTF-8 characters in Prometheus format we will define a new OpenMetrics version, right?
I think the following two considerations make sense:
- When converting OpenTelemetry names to Prometheus names follow the rules defined here: https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/. These are the rules that are also implemented in the OpenTelemetry collector. For a user it should not matter whether they scrape Prometheus format, or whether they push OpenTelemetry format and have a collector convert to Prometheus remote write. The resulting metric and attribute names should be the same, therefore Prometheus client libraries should implement the OpenTelemetry standard for converting arbitrary names to Prometheus names.
- Prometheus client libraries have a "fail fast" approach: when you register metrics with conflicting names, registration fails. We don't defer these errors to scrape time. I think we should look at the classic Prometheus names when checking for conflicts, i.e. we should fail if a user registers a metric named `requests.total` and then registers a metric named `requests_total`. While this might theoretically work when exposing new names only, it will fail at scrape time for older Prometheus servers. We should consider this bad practice and prevent it in our client libraries.
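The two points above can be combined: convert each name to its classic Prometheus form, and reject a registration if that classic form is already taken. The sketch below is illustrative only. `NameConflictRegistry`, `toPrometheusName`, and `register` are hypothetical names, and the conversion covers only one assumed rule (replace every character outside `[a-zA-Z0-9_:]` with `_`). The full OpenTelemetry rules linked above also deal with unit and `_total` suffixes and other details that this sketch omits.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: fail-fast registration keyed on classic Prometheus names. */
public class NameConflictRegistry {

    // Maps each classic Prometheus name to the original name that claimed it.
    private final Map<String, String> registered = new HashMap<>();

    /**
     * Simplified conversion (assumption): every character outside [a-zA-Z0-9_:]
     * becomes '_'. Suffix handling from the OpenTelemetry rules is omitted.
     */
    public static String toPrometheusName(String name) {
        return name.replaceAll("[^a-zA-Z0-9_:]", "_");
    }

    /** Throws at registration time if another metric maps to the same classic name. */
    public synchronized void register(String name) {
        String classic = toPrometheusName(name);
        String existing = registered.putIfAbsent(classic, name);
        if (existing != null) {
            throw new IllegalArgumentException("'" + name + "' conflicts with already registered '"
                    + existing + "': both are exposed as '" + classic + "'");
        }
    }
}
```

With this approach, registering `requests.total` and then `requests_total` fails immediately with an `IllegalArgumentException`, instead of only surfacing as a duplicate series when an older Prometheus server scrapes the converted names.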
What do you think? If you feel we should have a small "client library support for UTF-8" proposal with the points above I'm happy to write one.
Thanks for the info @fstab!
I don't think another proposal is necessary, but I appreciate the points you mentioned and will take them into account for the implementation.