
Go metrics are tightly coupled with the Go version used

prymitive opened this issue 4 years ago • 12 comments

I've upgraded from Go 1.17.7 to 1.17.8 and noticed that the exported metric labels changed as a result. The le label on some histograms now uses different values than it used to:

--- a/metrics.txt
+++ b/metrics.txt
@@ -105,13 +105,13 @@ go_gc_heap_tiny_allocs_objects_total
 go_gc_pauses_seconds_total_bucket{le="-5e-324"}
 go_gc_pauses_seconds_total_bucket{le="9.999999999999999e-10"}
 go_gc_pauses_seconds_total_bucket{le="9.999999999999999e-09"}
-go_gc_pauses_seconds_total_bucket{le="1.2799999999999998e-07"}
-go_gc_pauses_seconds_total_bucket{le="1.2799999999999998e-06"}
-go_gc_pauses_seconds_total_bucket{le="1.6383999999999998e-05"}
-go_gc_pauses_seconds_total_bucket{le="0.00016383999999999998"}
-go_gc_pauses_seconds_total_bucket{le="0.0020971519999999997"}
-go_gc_pauses_seconds_total_bucket{le="0.020971519999999997"}
-go_gc_pauses_seconds_total_bucket{le="0.26843545599999996"}
+go_gc_pauses_seconds_total_bucket{le="9.999999999999998e-08"}
+go_gc_pauses_seconds_total_bucket{le="1.0239999999999999e-06"}
+go_gc_pauses_seconds_total_bucket{le="1.0239999999999999e-05"}
+go_gc_pauses_seconds_total_bucket{le="0.00010239999999999998"}
+go_gc_pauses_seconds_total_bucket{le="0.0010485759999999998"}
+go_gc_pauses_seconds_total_bucket{le="0.010485759999999998"}
+go_gc_pauses_seconds_total_bucket{le="0.10485759999999998"}
 go_gc_pauses_seconds_total_bucket{le="+Inf"}
 go_gc_pauses_seconds_total_sum NaN
 go_gc_pauses_seconds_total_count
@@ -240,13 +240,13 @@ go_sched_goroutines_goroutines
 go_sched_latencies_seconds_bucket{le="-5e-324"}
 go_sched_latencies_seconds_bucket{le="9.999999999999999e-10"}
 go_sched_latencies_seconds_bucket{le="9.999999999999999e-09"}
-go_sched_latencies_seconds_bucket{le="1.2799999999999998e-07"}
-go_sched_latencies_seconds_bucket{le="1.2799999999999998e-06"}
-go_sched_latencies_seconds_bucket{le="1.6383999999999998e-05"}
-go_sched_latencies_seconds_bucket{le="0.00016383999999999998"}
-go_sched_latencies_seconds_bucket{le="0.0020971519999999997"}
-go_sched_latencies_seconds_bucket{le="0.020971519999999997"}
-go_sched_latencies_seconds_bucket{le="0.26843545599999996"}
+go_sched_latencies_seconds_bucket{le="9.999999999999998e-08"}
+go_sched_latencies_seconds_bucket{le="1.0239999999999999e-06"}
+go_sched_latencies_seconds_bucket{le="1.0239999999999999e-05"}
+go_sched_latencies_seconds_bucket{le="0.00010239999999999998"}
+go_sched_latencies_seconds_bucket{le="0.0010485759999999998"}
+go_sched_latencies_seconds_bucket{le="0.010485759999999998"}
+go_sched_latencies_seconds_bucket{le="0.10485759999999998"}
 go_sched_latencies_seconds_bucket{le="+Inf"}
 go_sched_latencies_seconds_sum NaN
 go_sched_latencies_seconds_count

It's not a huge problem, but it makes it harder to compare metrics between two instances of the same service compiled with different Go versions. This is likely related to #967

prymitive avatar Mar 04 '22 11:03 prymitive

@prymitive Thanks for opening the issue.

The raw metric values of a histogram aren't designed to be compared directly; the expectation is to use PromQL queries to handle the calculations. The buckets are dynamically generated, and this is expected.

I don't see any action items here, do you have any suggestions?

kakkoyun avatar Mar 17 '22 19:03 kakkoyun

If buckets are dynamically generated, then is it possible to get a different set of buckets on each scrape? Is this going to introduce metrics churn if there's enough variance between GC runs?

prymitive avatar Mar 17 '22 21:03 prymitive

I think this metric comes from runtime/metrics, and the Float64Histogram type states

For a given metric name, the value of Buckets is guaranteed not to change between calls until program exit

Which I guess implies the buckets can change from instance to instance. Which I guess means it's not actually appropriate to model as a Prometheus histogram, which expects bucket definitions to be more or less static?

08d2 avatar Mar 26 '22 20:03 08d2
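To make the guarantee concrete: the boundaries of a runtime/metrics Float64Histogram are fixed for the life of a process but are not guaranteed across builds, which is exactly why the exported le labels shifted between Go versions. A minimal sketch of reading one of these histograms (the metric name "/gc/pauses:seconds" is the runtime's name for the data behind go_gc_pauses_seconds_total):

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readPauseHistogram reads the GC pause distribution and returns the
// bucket boundaries and per-bucket counts. The boundaries are fixed for
// the life of this process but may differ between Go versions.
func readPauseHistogram() ([]float64, []uint64) {
	sample := []metrics.Sample{{Name: "/gc/pauses:seconds"}}
	metrics.Read(sample)
	h := sample[0].Value.Float64Histogram()
	// Counts[i] counts samples falling in [Buckets[i], Buckets[i+1]),
	// so there is always one more boundary than there are counts.
	return h.Buckets, h.Counts
}

func main() {
	buckets, counts := readPauseHistogram()
	fmt.Printf("%d boundaries, %d buckets\n", len(buckets), len(counts))
}
```

The exact number of buckets printed depends on the Go version, which is the crux of this issue.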

I think this is indeed problematic. Let's find a solution to this - it would be ideal to incorporate static buckets. This can impact cardinality a lot here.

bwplotka avatar Mar 28 '22 12:03 bwplotka

cc @mknyszek for awareness.

bwplotka avatar Mar 28 '22 12:03 bwplotka

The buckets changed because I discovered they were very incorrect. I don't expect this to happen often (or again, even). See golang/go#50732.

mknyszek avatar Mar 28 '22 15:03 mknyszek

But, yes, the API is specified such that the buckets can change. If that doesn't work for Prometheus, then the bucket-choosing algorithm needs to pick its own buckets on the Prometheus side and aggregate the runtime/metrics buckets into them, with some error.

This is somewhat surprising to me because these histograms (at least on the runtime/metrics side) are intended to be used to look at and compare distributions. You can do that even if the buckets are different.

mknyszek avatar Mar 28 '22 15:03 mknyszek

This is somewhat surprising to me because these histograms (at least on the runtime/metrics side) are intended to be used to look at and compare distributions. You can do that even if the buckets are different.

How can histograms with disparate bucketing be meaningfully aggregated and compared?

08d2 avatar Mar 28 '22 16:03 08d2

There are many ways to do this based on how you want things visualized. Fundamentally histograms approximate some underlying distribution, a histogram is just a reasonably efficient way to represent a distribution. So you can extract and look at the distribution instead of the buckets directly.

For example, you could take two histograms with different bucketing, compute a CDF from both, and plot them on the same chart. If you are willing to make some distributional assumptions, such as a normal or geometric distribution, you can also summarize the histogram with percentiles.

In this sense, histograms are somewhat low-level. The Go runtime exposes a time histogram with a pretty fine granularity (which is currently squashed in the Prometheus code to reduce metric cardinality) as a stepping-stone toward producing more robust analyses.

There have been enough issues about this that perhaps this is just the wrong Prometheus data structure to represent the Go runtime histograms. We could use a summary instead, or some other collection of metrics.

mknyszek avatar Mar 28 '22 17:03 mknyszek

Yeah, I think we already aggregate buckets to have fewer of them, so the easiest thing is to aggregate into stable buckets (A). Moving to summaries is not bad either; it just has some instrumentation overhead, plus it is harder to use later on, since summaries are not aggregatable (B).

We could do A and hope for better histograms in Prometheus (a feature that should land soon).

bwplotka avatar Apr 12 '22 10:04 bwplotka

Summaries are not aggregatable, indeed, but that might not be so bad in the case of GC (where I assume you are mostly interested in individual processes, not necessarily many in aggregate). But Summaries also have static precalculated quantiles, which you cannot change in retrospect. (E.g. they are always the 99th perc, 90th perc, and median over the last 10m. If you later decide you would like to see the 75th percentile over the last 5m, you are lost.)
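The non-aggregatability of quantiles is easy to demonstrate with plain numbers: you cannot recover the median of the combined data from the per-instance medians. A small self-contained illustration (the data values here are made up):

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs without modifying it.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

func main() {
	a := []float64{1, 2, 3}            // instance A: median 2
	b := []float64{10, 20, 30, 40, 50} // instance B: median 30
	merged := append(append([]float64(nil), a...), b...)
	// Averaging the per-instance medians gives 16, but the true median
	// of the merged data is 15; with skewed data the gap is unbounded.
	fmt.Println((median(a)+median(b))/2, median(merged)) // 16 15
}
```

Histogram buckets, by contrast, can simply be summed across instances, which is why they remain attractive despite the cost and inaccuracy discussed here.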

In summary (no pun intended), I find Histograms quite interesting here, but they are also very expensive and inaccurate. The new histograms will help with that.

Maybe you can make all of this configurable for now (I guess making these expensive histograms optional was considered anyway). Or just decide what the least bad solution is.

beorn7 avatar Apr 12 '22 11:04 beorn7

Hello 👋 Looks like there was no activity on this issue for the last 3 months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗 If there is no activity in the next 4 weeks, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Oct 16 '22 05:10 stale[bot]