client_java

Invalid unicode output

Open jonatan-ivanov opened this issue 5 months ago • 10 comments

If I do this:

PrometheusRegistry registry = new PrometheusRegistry();
Counter.builder()
    .name("test_我喜欢茶")
    .labelNames("test")
    .register(registry)
    .labelValues("test_我喜欢茶").inc();
String accept = OpenMetricsTextFormatWriter.CONTENT_TYPE + "; escaping=allow-utf-8";
ExpositionFormats.init().findWriter(accept).write(System.out, registry.scrape(), EscapingScheme.fromAcceptHeader(accept));

I get this:

# TYPE "test_我喜欢茶" counter
{"test_我喜欢茶_total",test="test_我喜欢茶"} 1.0
# EOF

If I remove the non-ASCII characters (with UTF-8 still allowed), I get the usual output:

# TYPE test counter
test_total{test="test"} 1.0
# EOF

Is this expected?

jonatan-ivanov Sep 19 '25 02:09

I don't understand. What did you expect to be different?

zeitlinger Sep 19 '25 10:09

I would expect this:

test_我喜欢茶_total{test="test_我喜欢茶"} 1.0

instead of

{"test_我喜欢茶_total",test="test_我喜欢茶"} 1.0

Note the position of the {} and the extra quotes ("") around the metric name.

jonatan-ivanov Sep 19 '25 16:09

This syntax is required for Unicode names: https://prometheus.io/docs/guides/utf8/#querying

zeitlinger Sep 23 '25 08:09

The part you linked is about querying, not about the output format. Does OpenMetrics 1.0 support this?

jonatan-ivanov Sep 23 '25 17:09

I can't find it in the spec, so I asked here: https://cloud-native.slack.com/archives/CC6CPDEJV/p1758709641526229

Either the spec has not been updated, or I looked in the wrong place.

zeitlinger Sep 24 '25 10:09

The spec is released, and earlier versions will not be updated. There could be an OpenMetrics 1.1 that supports this, but I'm not sure 1.0 does.

jonatan-ivanov Sep 24 '25 17:09

Yes, I can confirm that we (the Prometheus maintainers) failed to update the spec when this feature was introduced.

Quoting @ywwg from the thread:

During development we updated the exposition formats and parsers for both the Prometheus format and OpenMetrics. I remember discussing the question of whether it required a new version, and at the time I was told it was fine to add it without bumping the versions. It seems that the consensus has landed elsewhere, but currently the code does work. So we are going to invent OM 1.1 that adds escaping for UTF-8, which is to say, invent the standard. And then in the code we will fail exposition for UTF-8 if the negotiated version is <1.1.

This refers to the development of the Go version, which has been used as the basis for the Java version.

zeitlinger Sep 25 '25 08:09

So the code above, which should use OM 1.0, should not do what it does, right?

jonatan-ivanov Sep 25 '25 18:09

The output is correct, but it is not OM 1.0.

It'll probably end up being OM 1.1 or something similar.

To make that clear, I'll add an "experimental" note to the Unicode docs.

zeitlinger Sep 25 '25 19:09

test_我喜欢茶_total{test="test_我喜欢茶"} 1.0

This syntax is not possible because some UTF-8 characters, especially dots, are reserved for Prometheus syntax. See: https://github.com/prometheus/proposals/blob/main/proposals/0028-utf8.md

ywwg Sep 25 '25 19:09
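
For readers trying to follow why the two outputs in this thread differ, here is a minimal sketch of the quoting rule discussed above, assuming the behavior described in the linked UTF-8 proposal: a name that fits the legacy character set is written bare in front of the braces, while any other name is quoted and moved inside the braces as the first element. This is not client_java's actual writer code. The class `Utf8NameSketch` and its methods `quote` and `sampleLine` are hypothetical names made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of client_java.
public class Utf8NameSketch {
    // Legacy (pre-UTF-8) character sets for metric and label names.
    private static final Pattern LEGACY_METRIC = Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");
    private static final Pattern LEGACY_LABEL = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    // Wrap a string in double quotes, escaping backslash, quote, and newline.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Render one sample line in the style shown in this thread.
    static String sampleLine(String name, Map<String, String> labels, String value) {
        boolean legacy = LEGACY_METRIC.matcher(name).matches();
        StringBuilder sb = new StringBuilder();
        boolean braceOpen = false;
        if (legacy) {
            sb.append(name);                        // bare name before the braces
        } else {
            sb.append('{').append(quote(name));     // quoted name inside the braces
            braceOpen = true;
        }
        for (Map.Entry<String, String> e : labels.entrySet()) {
            sb.append(braceOpen ? "," : "{");
            braceOpen = true;
            String key = e.getKey();
            // Label names follow the same idea: quote only when not legacy-valid.
            sb.append(LEGACY_LABEL.matcher(key).matches() ? key : quote(key));
            sb.append('=').append(quote(e.getValue()));
        }
        if (braceOpen) {
            sb.append('}');
        }
        return sb.append(' ').append(value).toString();
    }

    public static void main(String[] args) {
        Map<String, String> ascii = new LinkedHashMap<>();
        ascii.put("test", "test");
        System.out.println(sampleLine("test_total", ascii, "1.0"));

        Map<String, String> utf8 = new LinkedHashMap<>();
        utf8.put("test", "test_我喜欢茶");
        System.out.println(sampleLine("test_我喜欢茶_total", utf8, "1.0"));
    }
}
```

Under these assumptions, the first call yields the "usual" form from the original report and the second yields the quoted form with the name inside the braces. The sketch only covers the sample line, not the `# TYPE` line, content negotiation, or any escaping scheme other than allow-utf-8.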