opentelemetry-specification Reconsider modifying metric name with OTLP unit

The Prometheus conversion currently requires metric names to be modified with the OTLP unit:

Additionally, the unit MUST be added as a suffix to the metric name, and SHOULD be converted to base units recommended by OpenMetrics when possible

I think this should be reconsidered. I suspect users expect the exported metric name to match the name they defined; e.g. when configuring Grafana, http_server_duration seems expected rather than http_server_duration_ms. Adding UNIT to the help text makes sense though.

Apr 18 '22 23:04 anuraaga

@open-telemetry/wg-prometheus

Apr 18 '22 23:04 anuraaga

The Prometheus metric would ideally be named http_server_duration_seconds, which is close to the expected naming for that metric in the Prometheus ecosystem (http_request_duration_seconds, which is used in the Prometheus metric naming examples).

We could consider making it a "MUST, by default", rather than just a "MUST", to allow users the option of excluding the unit. According to OpenMetrics, if we add a unit comment (which we really should IMO), we MUST add the unit as the metric suffix as well.
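To make the OpenMetrics constraint concrete, here is a minimal sketch of rendering the metadata lines for one metric family. The function name openmetrics_metadata and the help text are hypothetical, and HELP escaping is deliberately ignored; the only rule it tries to show is that a declared unit also has to appear as the name suffix.

```python
def openmetrics_metadata(name: str, metric_type: str, unit: str, help_text: str) -> str:
    """Render TYPE/UNIT/HELP metadata lines for one metric family (illustrative only)."""
    # OpenMetrics: when a unit is declared, the family name must end in "_<unit>".
    if unit and not name.endswith("_" + unit):
        name = f"{name}_{unit}"
    lines = [f"# TYPE {name} {metric_type}"]
    if unit:
        lines.append(f"# UNIT {name} {unit}")
    # Assumption: help_text needs no escaping in this sketch.
    lines.append(f"# HELP {name} {help_text}")
    return "\n".join(lines)


print(openmetrics_metadata(
    "http_server_duration", "histogram", "seconds",
    "Duration of HTTP server requests.",
))
```

Under a "MUST, by default" wording, an exporter option to exclude the unit would have to drop both the suffix and the UNIT line together to stay valid.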

Apr 19 '22 00:04 dashpole

Thanks, I was looking for the text in OpenMetrics but couldn't find it. It looks like the unit should be added then. I wonder whether OTel's default of ms rather than seconds will cause issues for users, though.

Apr 19 '22 02:04 anuraaga

Yeah, a metric ending in _ms would be strange to Prometheus users. Even spelling out _milliseconds would be better than _ms. Ideally, Prometheus exporters would convert metrics recorded in ms to seconds. That is what I intended by "SHOULD be converted to base units recommended by OpenMetrics" in the spec. Do you think that is feasible?

Apr 19 '22 14:04 dashpole

Do you think that is feasible?

We'd have to maintain a table of origin units, target units, and conversion functions. Could probably have a smallish library of common situations that would cover the majority of use cases. Ideally this would be spec'd.
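As a rough illustration of such a table, here is a minimal sketch. The names UNIT_CONVERSIONS and to_prometheus are hypothetical, the set of units is only a small sample, and none of this is spec'd; unknown units are passed through unchanged as an assumed fallback.

```python
from typing import Callable, Dict, Tuple

# Hypothetical table: OTLP (UCUM) unit -> (base unit name, conversion function).
UNIT_CONVERSIONS: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "ms": ("seconds", lambda v: v / 1000),
    "us": ("seconds", lambda v: v / 1_000_000),
    "s": ("seconds", lambda v: v),
    "KiBy": ("bytes", lambda v: v * 1024),
    "By": ("bytes", lambda v: v),
}


def to_prometheus(name: str, unit: str, value: float) -> Tuple[str, float]:
    """Return the suffixed metric name and the value converted to the base unit."""
    entry = UNIT_CONVERSIONS.get(unit)
    if entry is None:
        # Assumed fallback: unknown units leave both name and value untouched.
        return name, value
    target, convert = entry
    suffix = "_" + target
    if not name.endswith(suffix):
        name += suffix
    return name, convert(value)


print(to_prometheus("http_server_duration", "ms", 250.0))
```

This only covers single point values; histograms need their boundaries and sum handled as well.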

Histograms are a bit tricky. Suppose you have an instrument aggregated as an explicit bucket histogram with the default bucket boundaries 0, 5, 10, 25, 50, 75, 100, 250, 500, 1000. Data was originally recorded in milliseconds and we're trying to convert it to seconds. One problem is that the bucket boundaries were chosen for measurements in milliseconds. Do you divide all the bucket boundaries by 1000, i.e. 0, .005, .01, .025, .05, .075, .1, .25, .5, 1? The alternative is to keep the boundaries the same but shuffle around the bucket counts. However, this isn't possible without access to the original measurements. E.g. values > 1000 ms could be mapped into any of the lower buckets, or even stay in the > 1000 bucket.
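The first option (dividing the boundaries) can be sketched as follows. ExplicitHistogram and scale_histogram are hypothetical names, the counts are made-up sample data, and the sum is divided too, on the assumption that it should stay consistent with the rescaled boundaries. The bucket counts are left untouched, which is why this works without the original measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExplicitHistogram:
    bounds: List[float]  # upper bucket boundaries
    counts: List[int]    # one count per bucket, plus a final overflow bucket
    total: float         # sum of all recorded values


def scale_histogram(h: ExplicitHistogram, divisor: float) -> ExplicitHistogram:
    """Convert units by dividing boundaries and sum; counts do not change."""
    return ExplicitHistogram(
        bounds=[b / divisor for b in h.bounds],
        counts=list(h.counts),
        total=h.total / divisor,
    )


# Default boundaries from the example above, with made-up counts, in milliseconds.
ms_hist = ExplicitHistogram(
    bounds=[0, 5, 10, 25, 50, 75, 100, 250, 500, 1000],
    counts=[0, 2, 3, 7, 4, 1, 1, 0, 1, 0, 2],
    total=3456.0,
)
print(scale_histogram(ms_hist, 1000).bounds)
```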

Apr 19 '22 14:04 jack-berg

Scaling the sum and all bucket boundaries seems like the best approach in the histogram case. The only case where I don't think it works correctly is for exponential histograms. IIUC, bucket boundaries are powers of 2, so we can change the scale by 1024 (2^10), but not by exactly 1000.
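A small sketch of why 1024 works and 1000 does not, assuming the OTLP exponential histogram layout in which the bucket base is 2^(2^-scale). Multiplying every value by a factor shifts each bucket index by log2(factor) * 2^scale, and the buckets only line up again if that shift is an integer. The helper names index_shift and is_exact are hypothetical.

```python
import math


def index_shift(factor: float, scale: int) -> float:
    """Bucket-index shift caused by multiplying every value by `factor`.

    Assumes base = 2 ** (2 ** -scale), so log_base(factor) = log2(factor) * 2 ** scale.
    """
    return math.log2(factor) * 2 ** scale


def is_exact(factor: float, scale: int) -> bool:
    """True if rescaled values land exactly on existing bucket boundaries."""
    shift = index_shift(factor, scale)
    return shift == round(shift)


for factor in (1024, 1000):
    print(factor, index_shift(factor, 3), is_exact(factor, 3))
```

Since log2(1000) is irrational, no scale makes the shift for a factor of 1000 an integer, so an exact ms-to-seconds conversion is not possible without re-bucketing.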

Apr 19 '22 14:04 dashpole

It's not clear to me how exponential histograms would be converted to Prometheus in general. Are exponential histograms allowed to be exported via Prometheus? If yes, would the exponential bucket boundaries just be computed and converted to explicit buckets? If yes, can we just scale the boundaries, ignoring the fact that they were originally on an exponential scale?

Apr 19 '22 15:04 jack-berg

One of the Prometheus maintainers is working on support for exponential histograms in Prometheus, but I'm not sure what the plan is or how far along they are. For now, the Prometheus <-> OTLP spec says to drop exponential histogram points, so you are correct that it isn't an issue right now.

Apr 19 '22 15:04 dashpole

Hi, we expect metrics to come with a unit: http_server_duration_seconds, for example. This is what someone using Prometheus is accustomed to today, and we even have a linter to check it: curl --silent http://example.com:9090/metrics/ | promtool check metrics

It's a bit unfortunate that OTel is using milliseconds and Prometheus is using seconds. One of the main reasons we decided to mandate base units is to avoid conversions when doing math.

Given these two points, I think it makes more sense to include the unit in the name: the unit becomes obvious when writing a query, and it is equally obvious when metrics with two different units are being used together.

Nov 22 '22 16:11 gouthamve

Regarding exponential histograms, I think we'll have to just live with the inconsistency between milliseconds and seconds.

Nov 22 '22 16:11 gouthamve

I've written a proposal related to this, which suggests not adding the unit to the metric name when translating from OTel to Prometheus: https://docs.google.com/document/d/16Wo-QHZLcKO0uFx97HPUJ-ZMFXSPviQ7Oc4-bShgcto/edit?usp=sharing

Feedback is welcome. It would be especially helpful if you have any feedback from users one way or another.

I'm hoping to discuss it at the next Prometheus dev summit on 2/22.

Feb 08 '24 16:02 dashpole