client_golang histograms: Support exemplars in pure Native Histograms

This depends on https://github.com/prometheus/client_model/issues/61 .

Exemplars can be added to the buckets of conventional histograms but not to native histograms. So far, the way of sending exemplars is by also rendering a conventional histogram. With the issue above fixed, we have a way of sending exemplars independently of conventional buckets. However, without conventional buckets, we don't have a heuristics which exemplars to pick. Native histograms can have so many buckets that adding an examplar to each is infeasible.

This is P2 as explained in the milestone's description.

Sep 06 '22 15:09 beorn7

Idea: keep a schema for the exemplars. At the beginning the exemplar schema would be the same as the buckets schema.

The exemplar schema "buckets" will cover the same range of values as the normal buckets, but at a much lower resolution. We could keep a maximum N exemplars per exemplar schema bucket (M). So max MxN. We'd automatically get selection of sampled exemplars per logarithmic range.

Nov 07 '23 12:11 krajorama

Note that exemplars directly added to native histograms MUST have a timestamp, see https://github.com/prometheus/client_model/issues/61 and https://github.com/prometheus/prometheus/pull/13021 for details.

Nov 09 '23 13:11 beorn7

Since https://github.com/prometheus/client_model/issues/61 is solved, I would love to take this issue. If ok, you can assign to me @beorn7 @carrieedwards

Jan 24 '24 04:01 fatsheep9146

Thank you @fatsheep9146 . I have assigned this issue to you. Feel free to discuss selection heuristics on Slack, if you need inspiration beyond what has been discussed above.

Jan 24 '24 09:01 beorn7

Before I support exemplars in pure native histogram in client golang, should we release the client_model with a new tag v0.6.0 then update the dependency in client golang to use the new client model? @beorn7

Feb 16 '24 15:02 fatsheep9146

True. I'll add the new tag ASAP (hopefully later today).

Feb 16 '24 15:02 beorn7

The dependency in prometheus server should also be updated, I will also do this after the new tag is ready. @beorn7

Feb 16 '24 15:02 fatsheep9146

The tag is in place! https://github.com/prometheus/client_model/releases/tag/v0.6.0

Feb 16 '24 19:02 beorn7

Broad idea for the heuristics:

Let the user configure a number of exemplars n to expose per histogram (default could be around 10).
The first n ObserveWithExemplar calls create an exemplar that we keep for now.
Whenever an ObserveWithExemplar call happens after that, first add the new exemplar to the current set of exemplars, and then delete one according to the following priority list:
1. The oldest exemplar that is older than a certain threshold (maybe 5m, could also be configurable).
2. Find the two exemplars that are closest to each other on an exponential scale. Of those, delete the older one.
A reset of the histogram should also delete all exemplars (so that we never expose an exemplar that doesn't fall into any of the populated buckets).

This should lead to an even spread of exemplars, while not keeping any individual exemplar for overly long.

Feb 21 '24 13:02 beorn7

"Find the two exemplars that are closest to each other on an exponential scale." How should I understand this sentence? Does it mean that I should first take the logarithm of the values in all exemplars, and compare logarithm of newly added exemplar to all of existing exemplars, found the exemplar which has the closest logarithm with newly added exemplar?

For example, there exists two exemplars which has value 4, 6, the histogram schema is 0, the max count of exemplar is 2. When we add a new exemplar which value is 5, we should compare the diff of logarithm with existing exemplar log2(5) - log2(4) = 0.321 log2(6) - log2(5) = 0.263

So we should drop 6, keep 4, 5. Do I understand this correctly?

@beorn7

Feb 23 '24 01:02 fatsheep9146

Yes, sounds about right.

A naive implementation of this could be quite expensive, though, especially if many exemplars are to be kept around. However, for one I assume the number of exemplars will be generally small (around 10?). Also, you can probably play many tricks, like the following:

Keep the logarithm of each exemplar around.
Maintain a list of exemplars sorted by value (so finding the two closest exemplars, it's O(n), not O(n²)).
Maintain a list of exemplars sorted by time (so it's easy to find the oldest exemplar).

Feb 23 '24 11:02 beorn7

client_golang client_golang copied to clipboard

histograms: Support exemplars in pure Native Histograms

client_golang
client_golang copied to clipboard