go icon indicating copy to clipboard operation
go copied to clipboard

x/telemetry: add support for histogram counters and charts

Open findleyr opened this issue 2 months ago • 1 comments

The telemetry documentation at https://go.dev/doc/telemetry#basic-counters describes how, by convention, telemetry counters can represent a histogram. For example, we currently record latency in gopls using predefined bucket names like "<10ms", "<50ms", "<1s", but in the telemetry infrastructure these are treated as unstructured strings. On the other hand, https://research.swtch.com/telemetry-design#configuration suggests histograms as a predefined chart type.

We've decided that histograms should be supported natively as both counter and chart types. Specifically, histogram counters should record scalar observations, bucketing should be handled by the counter package, and the telemetry chart rendering should be knowledgeable of units.

Here's a hypothetical new API:

package counter

// NewHistogram creates a new histogram counter for a scalar type.
//
// Bucket names are computed in terms of the provided unit.
// For example, given a unit time.Millisecond and buckets
// []time.Duration{1*time.Millisecond, 10*time.Millisecond, 100*time.Millisecond},
// the bucket names will be {0,1,10,100}.
//
// The provided unit is also used as the basis for the default bucketing.
func NewHistogram[T Integer| Float](name string, unit T) *Histogram[T]

type Histogram[T Integer | Float] struct{ /* ... */}

// SetBuckets configures the the histogram to bucket observations into the following
// half-open intervals:
//  - [0, bounds[0])
//  - [bounds[i], bounds[i+1]), for each i < len(bounds) - 1
//  - [bounds[len(bounds-1)], ∞)
//
// SetBuckets must be called before the first call to [Record].
func (h *Histogram[T]) SetBuckets(bounds []T)

// Record increments the bucket counter corresponding to the observed value.
// The given value must be non-negative.
func (h *Histogram[T]) Record(value T)

While I'm including a SetBuckets method for completeness, I'm not up to date on the latest theory for bucketing. My hope is that the default bucketing should do the right thing in most cases.

This would be used like so:

completionLatency := counter.NewHistogram("gopls/completion/latencyms", 1*time.Millisecond)
completionLatency.Record(42*time.Millisecond)

// ...

cacheHitRatio := counter.NewHistogram("build/cache/hitpercent", 0.01)
cacheHitRatio.Record(0.12)

In the telemetry chart configuration, we'd support the histogram chart type, and collected buckets would have to be explicitly enumerated, as in https://research.swtch.com/telemetry-design#configuration. I'd also suggest including a "unit" field in the chart configuration.

For example

counter: gopls/completion/latencyms:{0,10,20,50,100,200,500,1000,10000}
title: Completion Latency
description: Measured latency of successful completion requests.
type: histogram
unit: milliseconds
program: golang.org/x/tools/gopls

Open Questions

  • How can we keep buckets consistent between instrumented counter and chart?
  • What should we use as the default bucketing?
  • Do we need to support floating point units, or can we restrict to integral units?
  • Should we avoid generics, and just provide constructors for the common types of histogram we expect (latency, ratio, etc.)
  • Relatedly, should the chartconfig unit field be an enum of supported types, so that for example 1000ms can be simplified as 1s?

CC @golang/telemetry @rsc @adonovan

findleyr avatar Jun 26 '24 22:06 findleyr