opentelemetry-go
opentelemetry-go copied to clipboard
[prometheusexporter] SDK uses the wrong signal to determine whether to escape
Description
PR 5755 added switching logic to determine when to escape prometheus names and when not to. We have since decided that relying on model.UTF8Validation is the wrong signal and we should only use an internal flag to gate that behavior. That value has also been deprecated. And, since there are bugs with how the escaping is currently performed, I suggest the SDK should always escape until we have update the spec to allow for the proper no-translation mode (https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/39706#issuecomment-2840485382).
Removing this behavior will restore the SDK to its expected behavior, which is to always escape prometheus metric and label names.
FYI @dmathieu, @dashpole
I suggest the SDK should always escape until we have update the spec to allow for the proper no-translation mode
Can you please give more context as personally I am getting lost in the communication.
Is there a spec issue for it? Can you reference it?
When I am reading https://pkg.go.dev/github.com/prometheus/common/model#NameValidationScheme:
If you wish to stick to the legacy name validation use IsValidLegacyMetricName()
andLabelName.IsValidLegacy()methods instead. This variable is here as an escape hatch for emergency cases, given the recent change fromLegacyValidationtoUTF8Validation, e.g., to delay UTF-8 migrations in time or aid in debugging unforeseen results of the change. In such a case, a temporary assignment toLegacyValidationvalue in theinit()` function in your main.go or so, could be considered.
then it looks like the "no-translation mode" is the officially supported one it maybe we should not escape at all.
since there are bugs with how the escaping is currently performed, I suggest the SDK should always escape until we have update the spec to allow for the proper no-translation mode
What bugs? Is there an issue? If the escaping is buggy then it does not seem to make sense to always escape.
Can you please improve the issue description as guess that you are suggesting to do right thing, but I have a feeling that the justification is unclear. Even if I would somehow follow it, I think that it should be well described for the sake of the users that would like to understand why we are going back and forth.
Possibly related: https://github.com/open-telemetry/opentelemetry-specification/issues/4494#issuecomment-2881606881
I think this bug could be best summarized as "update otel-go SDK to follow the updated otel spec for prometheus conversion" -- I could just close this in favor of: https://github.com/open-telemetry/opentelemetry-go/issues/6668
What bugs? Is there an issue?
the main bug right now is that when otel-go receives "escaping=allow-utf-8", it will still translate metric names to prometheus-compatible but will leave label names untranslated native UTF-8. This has caused some confusion and bug reports which I am unfortunately unable to locate at the moment.
Instead, as per the updated specs, "escaping=allow-utf-8" in Content Negotiation should not by itself trigger any change in output from the endpoint -- the choice about whether to send untranslated UTF-8 or translated prometheus names is up to the configuration on the endpoint's side. (In the opposite case, when "escaping=allow-utf-8" is not present or a different escaping is requested, then the endpoint should do that escaping so that it does not send UTF-8).