prometheusremotewrite: End-to-end Prometheus Native Histogram Support

Reimirno opened this issue 1 year ago.

TLDR

The Telegraf prometheusremotewrite data format parser should parse a Prometheus native histogram into one single Telegraf metric (instead of multiple Telegraf metrics), and its serializer should be able to serialize that metric back into a Prometheus native histogram. A high-level design is at the end.

Use Case

We are on a Prometheus stack and are planning to use Telegraf in our data ingestion path (for some aggregation). This is a simplified view of our design:

Pods ---(get scraped)---> Agent (Prometheus Agent/Grafana Agent) ---(remote write)---> Telegraf ---(remote write)---> TSDB ....

Some of our metrics are native histograms, a histogram model newly introduced by Prometheus. Rather than being emitted as several series (_sum, _count, and many _bucket series with le labels), a native histogram is encoded as a protobuf struct and emitted as a single time series. This not only guarantees atomicity, which resolves the write-batching problem classic histograms have when their separate series are written independently, but also offers higher resolution and better query accuracy at a lower cost.

I built a simple proof-of-concept Telegraf ingest and output (aggregation logic not added yet) and put it in our ingestion path. Native histograms are only available in the protobuf exposition format, so the prometheusremotewrite data format seems the right choice. The relevant configuration is:

[[inputs.http_listener_v2]]
      alias = "prom-ingest"
      service_address = ":9201"
      paths = ["/receive"]
      methods = ["GET", "OPTIONS", "POST", "PUT"]
      data_format = "prometheusremotewrite"

[[outputs.http]]
      alias = "prom-write"
      url = "%(write_url)s"
      timeout = "10s"
      data_format = "prometheusremotewrite"

      [outputs.http.headers]
         Content-Type = "application/x-protobuf"
         Content-Encoding = "snappy"
         X-Prometheus-Remote-Write-Version = "2.0.0"

Expected behavior

http_listener_v2 with the prometheusremotewrite format: a native histogram should be ingested and parsed into one single Telegraf metric, without breaking its atomicity.
http output with the prometheusremotewrite format: if a native histogram was ingested, it should be written out as a native histogram.

Exactly how a native histogram should be represented as one single Telegraf metric deserves its own design, so that the representation is:

easy to write aggregators (Starlark etc.) for
(even better) amenable to existing processors
(even better) reusable for OpenMetrics exponential histogram support
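One property that would help aggregators: on the wire, native histogram buckets are sparse and delta-encoded (spans of (offset, length) plus per-bucket deltas), which is awkward to manipulate field by field. A minimal Go sketch of expanding them into absolute counts keyed by bucket index, assuming integer (not float) bucket counts. The `span` type is a local stand-in for the remote write protobuf's bucket span, and both `expandBuckets` and the `bucket_<index>` field naming are hypothetical illustrations, not an agreed Telegraf representation:

```go
package main

import (
	"fmt"
	"sort"
)

// span is a local stand-in for the remote write protobuf's bucket span.
type span struct {
	Offset int32  // first span: absolute start index; later spans: gap after the previous span
	Length uint32 // number of consecutive buckets in the span
}

// expandBuckets (hypothetical) converts the sparse, delta-encoded buckets of
// an integer native histogram into absolute counts keyed by bucket index.
func expandBuckets(spans []span, deltas []int64) map[int32]int64 {
	counts := make(map[int32]int64, len(deltas))
	var idx int32   // current bucket index
	var count int64 // running absolute count; deltas accumulate across spans
	d := 0
	for _, s := range spans {
		idx += s.Offset // also correct for the first span, since idx starts at 0
		for j := uint32(0); j < s.Length && d < len(deltas); j++ {
			count += deltas[d]
			counts[idx] = count
			idx++
			d++
		}
	}
	return counts
}

func main() {
	counts := expandBuckets([]span{{Offset: 0, Length: 2}, {Offset: 2, Length: 1}}, []int64{3, -1, 4})

	// Flatten into Telegraf-style fields; the "bucket_<index>" naming is hypothetical.
	fields := make(map[string]interface{}, len(counts))
	keys := make([]string, 0, len(counts))
	for idx, c := range counts {
		k := fmt.Sprintf("bucket_%d", idx)
		fields[k] = c
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, fields[k])
	}
}
```

With absolute counts keyed by index, merging two histograms that share the same schema and zero threshold reduces to summing matching fields, which is the kind of logic a Starlark aggregator could express; histograms with different schemas would first need to be brought to a common schema.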

Actual behavior

Currently, support for ingesting native histograms is implemented in this PR: https://github.com/influxdata/telegraf/pull/14952. However, it causes the parser to break a native histogram down into many Telegraf metrics (_sum, _count and many _bucket), as if it were a classic histogram. When written out by the http output, these are serialized into several separate Prometheus metrics instead of one native histogram. This means all the benefits of native histograms (atomicity, reduced cardinality, better performance) are lost.

Additional info

Proposal: We need to change how the prometheusremotewrite parser handles a Prometheus native histogram so that it is parsed into one single Telegraf metric. We also need to change the prometheusremotewrite serializer so that it converts such a Telegraf metric back into a Prometheus native histogram.
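On the serializer side, the key step would be turning absolute per-index bucket counts back into the spans and delta-encoded counts that the remote write protobuf carries. A minimal Go sketch, assuming integer bucket counts and that bucket indices absent from the metric mean empty buckets. The `span` type is a local stand-in for the protobuf's bucket span, and `compressBuckets` is a hypothetical name, not existing Telegraf code:

```go
package main

import (
	"fmt"
	"sort"
)

// span is a local stand-in for the remote write protobuf's bucket span.
type span struct {
	Offset int32  // first span: absolute start index; later spans: gap after the previous span
	Length uint32 // number of consecutive buckets in the span
}

// compressBuckets (hypothetical) turns absolute bucket counts keyed by bucket
// index into spans plus delta-encoded counts. Indices missing from the map
// become gaps, and each gap starts a new span.
func compressBuckets(counts map[int32]int64) ([]span, []int64) {
	if len(counts) == 0 {
		return nil, nil
	}
	idxs := make([]int, 0, len(counts))
	for i := range counts {
		idxs = append(idxs, int(i))
	}
	sort.Ints(idxs)

	var spans []span
	var deltas []int64
	var prevCount int64
	next := int32(0) // index one past the last emitted bucket
	for k, raw := range idxs {
		idx := int32(raw)
		if k == 0 || idx != next {
			// The first offset is absolute (idx - 0); later offsets are gaps.
			spans = append(spans, span{Offset: idx - next})
		}
		spans[len(spans)-1].Length++
		deltas = append(deltas, counts[idx]-prevCount)
		prevCount = counts[idx]
		next = idx + 1
	}
	return spans, deltas
}

func main() {
	spans, deltas := compressBuckets(map[int32]int64{0: 3, 1: 2, 4: 6})
	fmt.Println(spans, deltas) // [{0 2} {2 1}] [3 -1 4]
}
```

Keeping this encoding step inside the serializer would let the Telegraf-side representation stay simple while the wire format remains a single native histogram series.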

A high-level design:

Oct 31 '24 21:10 Reimirno