telegraf icon indicating copy to clipboard operation
telegraf copied to clipboard

feat(parser.prometheusremotewrite): Add dense metric version to better support histograms

Open Reimirno opened this issue 11 months ago • 8 comments

Summary

Split v1/v2 metric parser version for prometheusremotewrite format, emulating that in prometheus format.

V2 is the original, existing behavior.Code is directly lifted from the original plugins/parsers/prometheusremotewrite/parser.go.

This PR implements V1, which emulates the dense behavior of prometheus format v1. It:

  1. converts prom metric name to measurement name
  2. uses a generic key value for value field
  3. converts a prom native histogram into one single telegraf metric, instead of multiple (as in V2)

Note that prometheusremotewrite V1 DOES NOT convert classic histogram/summary into one single telegraf metric as in the case of prometheus V1, because in a remote write request, a classic histogram is already broken up into various sub-metrics (counters/gauges), and they could even be in different remote write batches. However, it does conserves the dense behavior for native histogram.

--

This PR partially inplemented the design in #16120 and partially supercedes PR #16121, as a follow up to resolve this comment and design discussion in slack channel with @srebhan.

A PR for prometheusremotewrite serializer can be found at #16494

Checklist

  • [x] No AI generated code was used in this PR

Related issues

resolves #16120

Reimirno avatar Feb 09 '25 05:02 Reimirno

Thanks for the helpful comments! I will iterate on them.

Reimirno avatar Feb 23 '25 04:02 Reimirno

I think I addressed all the comments. Feel free to reopen if I left out anything.

Reimirno avatar Feb 24 '25 06:02 Reimirno

Some comments on the folder-based test cases:

  • input file format: Prom RW with native histogram needs to be in protobuf format and I don't find an easy and readable way to provide that kind of input as a file for the test input to consume. (I went down the prototext rabbit hole only to find there're some version mismatch of the proto unmarshall librarys - remember prometheus is using quite a unique proto library) So I just opted for json files for input and parse json into a prometheus remote write request struct. (This is also why I end up not using "github.com/influxdata/telegraf/testutil/plugin_input"
  • As for output, it could be json or influx lines or whatever. It doesn't matter. Used json just for consistency.

Reimirno avatar Feb 24 '25 06:02 Reimirno

Some comments on the folder-based test cases:

  • input file format: Prom RW with native histogram needs to be in protobuf format and I don't find an easy and readable way to provide that kind of input as a file for the test input to consume. (I went down the prototext rabbit hole only to find there're some version mismatch of the proto unmarshall librarys - remember prometheus is using quite a unique proto library) So I just opted for json files for input and parse json into a prometheus remote write request struct. (This is also why I end up not using "github.com/influxdata/telegraf/testutil/plugin_input"

You can usually also encode the data as JSON (see e.g. the GNMI unit-tests). If that's not an option, the data can be stored as binary data in the testcase directories...

  • As for output, it could be json or influx lines or whatever. It doesn't matter. Used json just for consistency.

The GNMI test linked above might be a nice example of how to use the testutil machinery to ease testing. The big benefit of folder-based test-cases is that you can easily add tests with data provided by the user. In e.g. GNMI we do have an option to output the raw (protobuf) input data making reproducing issues a copy-and-paste operation.

srebhan avatar Feb 24 '25 14:02 srebhan

Thank you so much for the feedback; I would take a look soon and iterate on this again.

yulong-db avatar Feb 25 '25 04:02 yulong-db

By using

  1. jsonpb to parse json input files
  2. influx.Parser to parse influxdb line format expected output files

We are able to greatly simplify the test setup logic. Thanks for your wonderful suggestions!

Reimirno avatar Mar 04 '25 06:03 Reimirno

I think the broken integration test (on output.sql) is unrelated to my changes and already fixed in master by https://github.com/influxdata/telegraf/commit/edb28dea0715ab20c6f32ccd954db8601ba82c7a should I rebase or do we don't care?

Reimirno avatar Mar 05 '25 06:03 Reimirno

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. Downloads for additional architectures and packages are available below.

:partying_face: This pull request decreases the Telegraf binary size by -2.69 % for linux amd64 (new size: 284.0 MB, nightly size 291.8 MB)

:package: Click here to get additional PR build artifacts

Artifact URLs

DEB RPM TAR GZ ZIP
amd64.deb aarch64.rpm darwin_amd64.tar.gz windows_amd64.zip
arm64.deb armel.rpm darwin_arm64.tar.gz windows_arm64.zip
armel.deb armv6hl.rpm freebsd_amd64.tar.gz windows_i386.zip
armhf.deb i386.rpm freebsd_armv7.tar.gz
i386.deb ppc64le.rpm freebsd_i386.tar.gz
mips.deb riscv64.rpm linux_amd64.tar.gz
mipsel.deb s390x.rpm linux_arm64.tar.gz
ppc64el.deb x86_64.rpm linux_armel.tar.gz
riscv64.deb linux_armhf.tar.gz
s390x.deb linux_i386.tar.gz
linux_mips.tar.gz
linux_mipsel.tar.gz
linux_ppc64le.tar.gz
linux_riscv64.tar.gz
linux_s390x.tar.gz

telegraf-tiger[bot] avatar Mar 05 '25 06:03 telegraf-tiger[bot]

@Reimirno just ignore this for now, we know it's fixed in master...

srebhan avatar Mar 06 '25 11:03 srebhan