feat: otlp allow not translating UTF8 characters
@jmichalek132 as discussed in a few calls this week, let me take over this PR so you can focus on your mentorship project :)
@aknuds1 @ywwg, I'm wondering what we want to accomplish here. There are two things on my mind:
- OTLP users want their metrics as is. No underscores, no units, no nothing, and we want to allow this use case.
- Before 3.0 Prometheus did not accept UTF8 characters by default so we had to translate UTF8 characters to underscores. Now it's a different scenario, UTF8 is accepted by default and we can accept them as they are. (No need to worry about unit suffixes)
Which of those problems are we solving here? Depending on the answer I'm thinking about two different API for the config file:
otlp:
otlp_to_prometheus_translation: NoTranslation (default) | ClassicPrometheusTranslation (UTF8 to underscore + unit suffixes)
otlp:
add_unit_suffixes: false (default) | true #Probably not being worked on in this PR
utf8_to_underscore: false (default) | true
@jmichalek132 as discussed in a few calls this week, let me take over this PR so you can focus on your mentorship project :)
@aknuds1 @ywwg, I'm wondering what we want to accomplish here. There are two things on my mind:
- OTLP users want their metrics as is. No underscores, no units, no nothing, and we want to allow this use case.
- Before 3.0 Prometheus did not accept UTF8 characters by default so we had to translate UTF8 characters to underscores. Now it's a different scenario, UTF8 is accepted by default and we can accept them as they are. (No need to worry about unit suffixes)
Which of those problems are we solving here? Depending on the answer I'm thinking about two different API for the config file:
otlp: otlp_to_prometheus_translation: NoTranslation (default) | ClassicPrometheusTranslation (UTF8 to underscore + unit suffixes)otlp: add_unit_suffixes: false (default) | true #Probably not being worked on in this PR utf8_to_underscore: false (default) | true
For reference linking the consensus from the Prometheus in person dev summit on this topic
https://docs.google.com/document/d/1uurQCi5iVufhYHGlBZ8mJMK_freDFKPG0iYBQqJ9fvA/edit#bookmark=id.x5wujq3bsjjo
CONSENSUS: We recommend keeping the metric names from OTel SDKs as unchanged as possible when converting to Prometheus samples as the default. DISAGREE, but not blocking Fabian Stäber and Julius Volz would like to keep units and _total CONSENSUS: For classic histograms we will still add _bucket, _count and _sum,similar for summaries. CONSENSUS: We intend for OpenMetrics 2.x to support this, targeting 2.0 CONSENSUS: We will have a Prometheus mode which converts . to _ with suffixes and units.
@aknuds1 @ywwg, I'm wondering what we want to accomplish here. There are two things on my mind:
@ArthurSens @jmichalek132 @ywwg given the consensus in the latest dev summit ("We want to explore ways to do type and unit annotations in PromQL"), we've concluded that we need a way to ensure time series are identified through type and unit, right? So we avoid collisions between metrics that are named the same, but have different types/units.
I think the PR needs to define properly which problem is being solved. Jumping into a solution isn't the best approach.
I think the PR needs to define properly which problem is being solved. Jumping into a solution isn't the best approach.
Yes exactly! I was hoping we could come to an agreement before I continue with the PR :)
Yes exactly! I was hoping we could come to an agreement before I continue with the PR :)
Agree, we need to decide on a solution to the problem laid out in OpenTelemetry → Prometheus: type+unit suffixes, right?
The agreed upon solution would have to be well defined, which I think we haven't got to yet.
Agree, we need to decide on a solution to the problem laid out in OpenTelemetry → Prometheus: type+unit suffixes, right?
The agreed upon solution would have to be well defined, which I think we haven't got to yet.
If that's a requirement for the work here, I'm afraid this PR will have to wait until Prometheus 4.0 :/
I don't think we'll manage to implement any of those proposals before 3.0 (scheduled for november this year as far as I could understand). So I was hoping we could find a middle ground: let OTel users have what they want, even if not the best solution.
If we release 3.0 with the current behavior (doing translations), changing the default to no translation would be a breaking change. Could we release 3.0 with no translation, as far as it's well documented in the OTLP guide that name colisions are possible and people should opt-in to ClassicPrometheusTranslation to avoid these situations? Once we have good solution for units/types, we can finally drop the warning in our guides
If the team decides that we really want to wait for proper unit/type handling, I guess I could continue the PR... just changing the default to ClassicPrometheusTranslation
If that's a requirement for the work here, I'm afraid this PR will have to wait until Prometheus 4.0 :/
Could you start a discussion with @gouthamve in CNCF Slack on what his current stance is wrt. dropping OTel metric name translation? I kind of forgot what he last communicated.
My sense from the Otel folks is that the conversion to underscores is a pain point right now, so I'd prefer to find a way to allow the default to be no-translation. If my understanding is correct the issue of same-name-different-type has only been hypothetical so far. I think the benefit to the otel community of having names pass through unchanged outweighs the unproven possibility of collisions.
Perhaps it would be sufficient to add a note in the documentation explaining "by default no translation is done to Otel metric names. If there are multiple metrics with the same name but different types or units, these metrics will collide when queried in Prometheus. In this situation, administrators should set the translation mode to ClassicPrometheusTranslation to avoid the collisions".
As for config choices:
otlp: add_unit_suffixes: false (default) | true #Probably not being worked on in this PR utf8_to_underscore: false (default) | true
if we do this, does add_unit_suffixes always append with an underscore? imho I think it might be fine to use a dot if the user has otherwise not specified underscore conversion, e.g.: "http.status" → "http.status.total". Because of ambiguities like this, I think I'm more in favor of the enum approach:
otlp: otlp_to_prometheus_translation: NoTranslation (default) | ClassicPrometheusTranslation (UTF8 to underscore + unit suffixes)
@ywwg @ArthurSens in the latest dev summit, @dashpole noted the following about @gouthamve's latest thinking:
After the inperson meeting Goutham realized that it’s not as easy as initially thought. We need to account for metric name collision if we don’t handle unit and types somehow.
Doesn't this indicate we need a solution that avoids collisions (as our current solution does, through suffixes)?
After the inperson meeting Goutham realized that it’s not as easy as initially thought. We need to account for metric name collision if we don’t handle unit and types somehow.
Doesn't this indicate we need a solution that avoids collisions (as our current solution does, through suffixes)?
For the perfect scenario, yes! I was just hoping that we could release something that allows no translation before 3.0, even knowing that name collisions is a possibility, while we keep working on a long term solution that will take a few months to release.
Would you be ok if we continue this PR but change the default behavior to keep translating things? NoTranslation would be opt-in, while being very well documented that it can lead to name collision.
For the perfect scenario, yes! I was just hoping that we could release something that allows no translation before 3.0, even knowing that name collisions is a possibility, while we keep working on a long term solution that will take a few months to release.
I'm kind of agnostic, I'm just trying to figure out what is the actual consensus in this sense. I've asked @gouthamve in CNCF Slack. I primarily want to avoid adding changes we end up not wanting. @dashpole's note from the last summit gives the impression we don't want a translation mode that can lead to collisions.
My sense from the Otel folks is that the conversion to underscores is a pain point right now, so I'd prefer to find a way to allow the default to be no-translation. If my understanding is correct the issue of same-name-different-type has only been hypothetical so far. I think the benefit to the otel community of having names pass through unchanged outweighs the unproven possibility of collisions.
Perhaps it would be sufficient to add a note in the documentation explaining "by default no translation is done to Otel metric names. If there are multiple metrics with the same name but different types or units, these metrics will collide when queried in Prometheus. In this situation, administrators should set the translation mode to ClassicPrometheusTranslation to avoid the collisions".
As for config choices:
otlp: add_unit_suffixes: false (default) | true #Probably not being worked on in this PR utf8_to_underscore: false (default) | trueif we do this, does add_unit_suffixes always append with an underscore? imho I think it might be fine to use a dot if the user has otherwise not specified underscore conversion, e.g.: "http.status" → "http.status.total". Because of ambiguities like this, I think I'm more in favor of the enum approach:
otlp: otlp_to_prometheus_translation: NoTranslation (default) | ClassicPrometheusTranslation (UTF8 to underscore + unit suffixes)
I agree with @ywwg suggestions. Also to add my 2 cents,
- It feels like a pity to have all the work done to support UTF-8 in Prometheus and then preventing people from utilizing it by not allowing to turn off the translation.
- The collisions when not adding suffixes can already happen when using remote write exporter in otel collector given it has an option to disable adding the suffixes https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/prometheusremotewriteexporter/README.md but imho this should be very rare.
- Whatever we do here I would prefer if we stick to the same approach in the prometheus exporters in otel collector / otel sdks.
I've updated the PR to not translate UTF8 characters by default. I'm also not touching unit/type suffixes in any way!
I'm happy to change the PR again to allow not adding unit/type suffixes, but I don't want to block the UTF8 translation just because we can't find an agreement on unit/type handling
hi hi, sorry for being late to the party.
Given all the confusion about the conversion spec, and the need for more discussion around it, I think we should keep the default to the translation as it is done today.
I am fine with adding an experimental mode to not translate provided we make it clear that this is experimental and can break in the future. Should we do it via a feature-flag or not needs to be decided.
@aknuds1 @ywwg @npazosmendez this should be ready for a review :)
Note that I'm not changing how we handle unit/type suffixes here, this PR does not introduce any name collision problems that we predicted before. In fact, we avoid name existing name collision where different UTF-8 characters are translated to underscores.
I am fine with adding an experimental mode to not translate provided we make it clear that this is experimental and can break in the future. Should we do it via a feature-flag or not needs to be decided.
context for why I thought we were also going to have a no-translation mode with caveats
Per our discussion, because units are differentiating and prometheus does not support otel unit metadata well, it makes sense for prom's otel endpoint to add suffixes in all available modes. In the future when prom does support unit metadata, we can revisit the enum and add a mode to pass names unaltered.
Folks, I've messed up my commits here 😓. Let's get this one merged first and I'll figure out what to do here later
Lol ok, after merging the chained PR the branch got automatically deleted and messed up this PR. I'll open another one, sorry everyone