internal metrics mixed together if no alias defined on output plugins

Open shric opened this issue 6 months ago • 4 comments

Relevant telegraf.conf

[[outputs.http]]
url="http://one.example.com/write"

[[outputs.http]]
url="http://two.example.com/write"

Logs from Telegraf

Aug 23 05:05:35 foo.example.com telegraf[796261]: 2024-08-23T10:05:35Z D! [agent] Attempting connection to [outputs.http]
Aug 23 05:05:35 foo.example.com telegraf[796261]: 2024-08-23T10:05:35Z D! [agent] Successfully connected to outputs.http
Aug 23 05:05:35 foo.example.com telegraf[796261]: 2024-08-23T10:05:35Z D! [agent] Attempting connection to [outputs.prometheus_client]
Aug 23 05:05:35 foo.example.com telegraf[796261]: 2024-08-23T10:05:35Z I! [outputs.prometheus_client] Listening on https://0.0.0.0:9273/metrics
Aug 23 05:05:35 foo.example.com telegraf[796261]: 2024-08-23T10:05:35Z D! [agent] Successfully connected to outputs.prometheus_client
Aug 23 05:05:35 foo.example.com telegraf[796261]: 2024-08-23T10:05:35Z D! [agent] Starting service inputs
Aug 23 05:05:35 foo.example.com telegraf[796261]: 2024-08-23T10:05:35Z I! [inputs.socket_listener] Listening on tcp://0.0.0.0:8094
Aug 23 05:05:42 foo.example.com telegraf[796261]: 2024-08-23T10:05:42Z D! [outputs.http] Wrote batch of 8 metrics in 666.983697ms
Aug 23 05:05:42 foo.example.com telegraf[796261]: 2024-08-23T10:05:42Z D! [outputs.http] Buffer fullness: 0 / 100000 metrics
Aug 23 05:05:45 foo.example.com telegraf[796261]: 2024-08-23T10:05:45Z D! [outputs.http] Buffer fullness: 8 / 100000 metrics
Aug 23 05:05:45 foo.example.com telegraf[796261]: 2024-08-23T10:05:45Z E! [agent] Error writing to outputs.http: Post "https://two.example.com/write": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Aug 23 05:05:47 foo.example.com telegraf[796261]: 2024-08-23T10:05:47Z D! [outputs.http] Buffer fullness: 0 / 100000 metrics
Aug 23 05:05:51 foo.example.com telegraf[796261]: 2024-08-23T10:05:51Z D! [outputs.http] Buffer fullness: 8 / 100000 metrics
Aug 23 05:05:51 foo.example.com telegraf[796261]: 2024-08-23T10:05:51Z E! [agent] Error writing to outputs.http: Post "https://two.example.com/write": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

System info

telegraf-1.31.3-1.x86_64, Alma Linux 8.8

Docker

No response

Steps to reproduce

  1. Add more than one http output, as in the telegraf.conf above.
  2. Enable Telegraf internal metrics.
  3. Make one of the outputs an unreachable endpoint, so that the internal buffer fills on one output and not the other (this makes the bug visible in the metrics).
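
The steps above can be sketched as a single config. This is a minimal sketch, not the exact file from the affected host: it assumes internal metrics are enabled through the `inputs.internal` plugin (the report does not show how they were enabled), and it assumes `two.example.com` is the endpoint made unreachable.

```toml
# Step 2 (assumption): collect Telegraf's own internal metrics
# via the internal input plugin.
[[inputs.internal]]

# Step 1: two instances of the same output plugin, neither with an alias.
[[outputs.http]]
  url = "http://one.example.com/write"

[[outputs.http]]
  # Step 3 (assumption): this endpoint is the unreachable one,
  # so only this instance's buffer fills.
  url = "http://two.example.com/write"
```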

Expected behavior

Separate internal_write_buffer_size metrics for each outputs.http instance. An additional url tag, for example, could disambiguate.

Actual behavior

Metrics such as internal_write_buffer_size{output="http"} randomly report either 0 (for the reachable output) or a nonzero value (for the unreachable output), because both outputs.http instances write to the same series.

Additional info

If you set, for example, alias="one" and alias="two" on the two outputs in the above config, you do get two unique metrics. However, it seems like a bug for telegraf to produce a single useless metric that flaps ambiguously between multiple values when aliases aren't defined.
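
For reference, the alias workaround would look roughly like this. It is a sketch based on the config above; the alias values "one" and "two" are only the examples used in this report, and the exact tags that then appear on the internal metrics are not shown here.

```toml
# Workaround: give each instance of the same output plugin its own alias
# so the internal metrics for the two instances are reported separately.
[[outputs.http]]
  alias = "one"
  url = "http://one.example.com/write"

[[outputs.http]]
  alias = "two"
  url = "http://two.example.com/write"
```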

shric • Aug 23 '24 11:08