Invalid unicode output
If I do this:
PrometheusRegistry registry = new PrometheusRegistry();
Counter.builder()
    .name("test_我喜欢茶")
    .labelNames("test")
    .register(registry)
    .labelValues("test_我喜欢茶")
    .inc();
String accept = OpenMetricsTextFormatWriter.CONTENT_TYPE + "; escaping=allow-utf-8";
ExpositionFormats.init()
    .findWriter(accept)
    .write(System.out, registry.scrape(), EscapingScheme.fromAcceptHeader(accept));
I get this:
# TYPE "test_我喜欢茶" counter
{"test_我喜欢茶_total",test="test_我喜欢茶"} 1.0
# EOF
If I remove the non-ASCII characters (with UTF-8 still allowed), I get the usual output:
# TYPE test counter
test_total{test="test"} 1.0
# EOF
Is this expected?
I don't understand - what did you expect to be different?
I would expect this:
test_我喜欢茶_total{test="test_我喜欢茶"} 1.0
instead of
{"test_我喜欢茶_total",test="test_我喜欢茶"} 1.0
Note the position of the braces and the extra quotation marks.
This is required for Unicode: https://prometheus.io/docs/guides/utf8/#querying
The part you linked is about querying, not about the output format. Does OpenMetrics 1.0 support this?
I can't find it in the spec, so I asked here: https://cloud-native.slack.com/archives/CC6CPDEJV/p1758709641526229
Either the spec has not been updated, or I looked in the wrong place.
The spec is released, and earlier versions will not be updated. There could be an OpenMetrics 1.1 that supports this, but I'm not sure 1.0 does.
Yes, I can confirm that we (Prometheus maintainers) failed to update the spec when this feature was introduced.
Quoting @ywwg from the thread:
During development we updated the exposition formats and parsers for both Prometheus format and OpenMetrics. I remember discussing the question of whether it required a new version, and at the time I was told it was fine to add it without bumping the versions. It seems that the consensus has landed elsewhere, but currently the code does work. So we are going to invent OM 1.1 that adds escaping for UTF-8, which is to say, invent the standard. And then in the code we will fail exposition for UTF-8 if the negotiated version is <1.1.
This refers to the development of the Go version, which was used as the basis for the Java version.
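For illustration, the gating described in the quote could look roughly like the sketch below. Everything in it is hypothetical: OM 1.1 does not exist yet, and `VersionGateSketch`, `supportsUtf8Names`, and `checkExposition` are made-up names, not part of client_java or the Go client. The only borrowed piece is the legacy metric name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*`.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the client_java API.
final class VersionGateSketch {
    // Legacy (pre-UTF-8) metric name charset.
    private static final Pattern LEGACY_NAME =
        Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");

    // Assumption: UTF-8 names become legal starting with a future OM 1.1.
    static boolean supportsUtf8Names(int major, int minor) {
        return major > 1 || (major == 1 && minor >= 1);
    }

    // Fails exposition if a non-legacy name would be written under a
    // negotiated version that predates UTF-8 support.
    static void checkExposition(int major, int minor, List<String> metricNames) {
        if (supportsUtf8Names(major, minor)) {
            return;
        }
        for (String name : metricNames) {
            if (!LEGACY_NAME.matcher(name).matches()) {
                throw new IllegalStateException(
                    "Name '" + name + "' needs UTF-8 support, but negotiated version is "
                        + major + "." + minor);
            }
        }
    }

    public static void main(String[] args) {
        checkExposition(1, 0, List.of("test_total"));          // passes
        checkExposition(1, 1, List.of("test_我喜欢茶_total"));   // passes
        try {
            checkExposition(1, 0, List.of("test_我喜欢茶_total"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether the real implementation would fail hard or fall back to an escaping scheme under OM 1.0 is not settled in this thread.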
So the code above, which should use OM 1.0, should not do what it does, right?
The output is correct, but it is not OM 1.0.
It'll probably end up as being OM 1.1 or something.
To make that clear, I'll add an "experimental" note to the Unicode docs.
test_我喜欢茶_total{test="test_我喜欢茶"} 1.0
This syntax is not possible because some characters that are now valid in UTF-8 names, especially dots, are reserved for Prometheus syntax. See: https://github.com/prometheus/proposals/blob/main/proposals/0028-utf8.md
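To make the two output shapes in this thread concrete, here is a minimal sketch of the quoting rule as I understand it from the proposal: a name that fits the legacy charset is written bare in front of the braces, and any other name is quoted and moved inside them. `QuotedNameSketch` and its methods are illustrative names only, not the client_java writer, and the sketch assumes the label name itself is a legacy name.

```java
import java.util.regex.Pattern;

// Illustrative sketch, not the actual OpenMetricsTextFormatWriter.
final class QuotedNameSketch {
    // Legacy (pre-UTF-8) metric name charset.
    private static final Pattern LEGACY_NAME =
        Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");

    static boolean isLegacyName(String name) {
        return LEGACY_NAME.matcher(name).matches();
    }

    // Escapes backslash, double quote, and newline inside a quoted string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Assumes labelName is a legacy name; only the metric name may need quoting.
    static String sampleLine(String metricName, String labelName,
                             String labelValue, double value) {
        String label = labelName + "=\"" + escape(labelValue) + "\"";
        if (isLegacyName(metricName)) {
            return metricName + "{" + label + "} " + value;
        }
        // Non-legacy name: quoted, inside the braces, before the labels.
        return "{\"" + escape(metricName) + "\"," + label + "} " + value;
    }

    public static void main(String[] args) {
        // prints: test_total{test="test"} 1.0
        System.out.println(sampleLine("test_total", "test", "test", 1.0));
        // prints: {"test_我喜欢茶_total",test="test_我喜欢茶"} 1.0
        System.out.println(sampleLine("test_我喜欢茶_total", "test", "test_我喜欢茶", 1.0));
    }
}
```

Both printed lines match the sample lines shown earlier in this issue.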