micrometer icon indicating copy to clipboard operation
micrometer copied to clipboard

Support min/max in OTLP histograms

Open shakuzen opened this issue 3 years ago • 4 comments
trafficstars

We do not currently ship min/max (though we do ship an infinite bucket which is the max) in OTLP histograms. At this time, we cannot add them because we are waiting for a release of opentelemetry-proto (see https://github.com/open-telemetry/opentelemetry-proto/issues/387). When we do add them, we will need to figure out the right semantics of what min and max will mean. See https://github.com/micrometer-metrics/micrometer/pull/3129#discussion_r850688499_ for some initial thoughts.

Originally posted by @twicksell in https://github.com/micrometer-metrics/micrometer/pull/3129#discussion_r850688499

shakuzen avatar Apr 28 '22 06:04 shakuzen

This should be unblocked now with this release: https://github.com/open-telemetry/opentelemetry-proto-java/releases/tag/v0.17.0

shakuzen avatar May 12 '22 04:05 shakuzen

This should be unblocked now with this release: https://github.com/open-telemetry/opentelemetry-proto-java/releases/tag/v0.17.0

Hi @shakuzen , I agree. I just recently took interest in OpenTelemetry, and the way I see it min/max support in histograms would be a valuable addition.

sspieker avatar Jun 23 '22 09:06 sspieker

The open question is what will the min/max mean. We currently only support cumulative temporality (https://github.com/micrometer-metrics/micrometer/issues/3145 to support delta), which means min/max should be the min/max of all values observed in the interval, which will be since the meter was first registered (generally about when the application started). For long-running applications, this generally becomes less useful over time. Max values in Micrometer in other registries use a time-windowed maximum for this reason.

The OTLP spec on histograms mentions

The max of all values in the histogram.

The aggregation temporality also has implications on the min and max fields. Min and max are more useful for Delta temporality, since the values represented by Cumulative min and max will stabilize as more events are recorded. Additionally, it is possible to convert min and max from Delta to Cumulative, but not from Cumulative to Delta. When converting from Cumulative to Delta, min and max can be dropped, or captured in an alternative representation such as a gauge.

I think from a usefulness perspective, when using cumulative temporality, we should publish our time-windowed max as a separate gauge and not set max on the histogram. If/when we support delta temporality, we could set max on the histogram, but our time-windowed max intentionally does not match the same interval as other metrics published in the step (see https://github.com/micrometer-metrics/micrometer/issues/1993).

That at least gives us an approach to publish a max now with cumulative temporality. I think we will need to figure out how to set the start/stop times accurately for the time-window represented in the publish, though.

Would having the max published as a separate gauge work for your needs?

If anyone is interested in contributing the max as a separate gauge as described above, feel free to send a pull request or let us know any questions you have. Or if anyone has feedback on the proposal or other ideas, please do share that.

We don't track minimum values now, and to solve it for the OTLP registry, we may want to add something more generically to Micrometer for that. How important is the min for users?

shakuzen avatar Jun 24 '22 07:06 shakuzen

You're absolutely right, the things you mentioned are important and should be discussed before proposing a possible solution.

In my opinion, having a max histogram value only makes sense for Delta temporality. The possibility to "convert" Delta max to Cumulative but not the other way around (as mentioned in the spec) is a very strong argument.

Personally, I'd rather not have the max published as a separate gauge, regardless if it's Cumulative or Delta based. Seems to be a workaround that is going to be fixed sooner than later, and "You're able to use max now in this way, but you need to change it soon..." is not something I'd like to tell our (metric) customers at all.

To sum it up: configurable aggregation temporality (#3145) might be the requirement that should be implemented first as it seems to be a pre-condition for this issue (min/max). The ability to configure Delta temporality takes the burden of decision making off your shoulders and puts it on the micrometer-registry-otlp.

Oh and btw, I have no use-case for min right now...

sspieker avatar Jun 24 '22 11:06 sspieker