telegraf forwarding empty metrics

Open johnseekins opened this issue 3 years ago • 4 comments

Relevant telegraf.conf:

[global_tags]
  dc = "1"

[agent]
  hostname = ""
  omit_hostname = false
  interval = "60s"
  round_interval = true
  metric_batch_size = 10000
  metric_buffer_limit = 100000
  collection_jitter = "10s"
  flush_interval = "10s"
  flush_jitter = "5s"
  precision = ""
  logfile = "/var/log/telegraf/telegraf.log"
  debug = false
  quiet = false

[[inputs.prometheus]]
  ## An array of urls to scrape metrics from.
  urls = ["http://localhost:3000/metrics"]
  metric_version = 2
  name_override = "grafana"
  tagexclude = ["url"]
  [inputs.prometheus.tags]
    metrictype = "applicationlevel"

[[outputs.kafka]]
  brokers = ["<broker list>"]
  compression_codec = 1
  data_format = "json"
  required_acks = 1
  routing_tag = "host"
  tagexclude = ["metrictype"]
  topic = "metrics"
  [outputs.kafka.tagpass]
    metrictype = ["applicationlevel"]

System info:

telegraf 1.17.2 and 1.19.3, Ubuntu 20.04

Steps to reproduce:

Collecting prometheus-style data from a Grafana instance, I consistently get some empty values collected and forwarded to Kafka. While they don't hurt anything, it seems like they should be dropped.

For example, this metric was dropped on the other side of Kafka:

{"fields":{},"tags":{"dc":"1","handler":"/","host":"grafana","method":"get","quantile":"0.99","statuscode":"200"},"name":"grafana_","timestamp":1631129284}

You can see there are no actual fields attached to the metric.
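
As a stopgap on the consuming side, records like this can be filtered out after they are read from Kafka. The following is a minimal sketch, assuming the consumer receives each metric as one JSON string in the format shown above; the function name `has_fields` is hypothetical, and the Kafka client code is left out.

```python
import json


def has_fields(raw: str) -> bool:
    """Return True if a JSON-serialized metric carries at least one field.

    `has_fields` is a hypothetical helper name, not part of telegraf
    or any Kafka client library.
    """
    metric = json.loads(raw)
    # A missing "fields" key and an empty "fields" object are treated alike.
    return bool(metric.get("fields"))


# The empty metric from this report.
empty_metric = (
    '{"fields":{},"tags":{"dc":"1","handler":"/","host":"grafana",'
    '"method":"get","quantile":"0.99","statuscode":"200"},'
    '"name":"grafana_","timestamp":1631129284}'
)

# Keep only the records that actually carry data.
kept = [m for m in [empty_metric] if has_fields(m)]
```

This only hides the symptom downstream; the empty metric still travels through Kafka.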

Expected behavior:

Empty metrics aren't forwarded to downstream systems; instead they are dropped or filtered in telegraf, or not collected in the first place.

Additional info:

This also occurs in data from Benthos (https://benthos.dev/) and occasionally with statsd data.

In the case of Grafana, it seems to be histogram stats that have a NaN value. I would assume something similar is happening in other places, where a system is initially submitting a NaN and telegraf is (reasonably) not keeping that field, but not actually dropping the entire metric.
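
To illustrate the behavior I am assuming, here is a small Python model. It is not telegraf's code: the `sanitize` function and the `value` field name are hypothetical. It removes NaN fields, which is what I believe happens today, and then also drops the metric when no fields remain, which is the behavior I would expect.

```python
import math
from typing import Optional


def sanitize(metric: dict) -> Optional[dict]:
    """Strip NaN fields; return None if the metric ends up with no fields.

    Hypothetical model of the expected behavior, not telegraf's implementation.
    """
    fields = {
        key: value
        for key, value in metric["fields"].items()
        # Only floats can be NaN; other field types pass through unchanged.
        if not (isinstance(value, float) and math.isnan(value))
    }
    if not fields:
        # Every field was NaN, so nothing useful is left to forward.
        return None
    return {**metric, "fields": fields}


# "value" is a made-up field name used only for this illustration.
nan_only = {"name": "grafana_", "tags": {"quantile": "0.99"},
            "fields": {"value": float("nan")}}
mixed = {"name": "grafana_", "tags": {},
         "fields": {"value": float("nan"), "count": 3}}

dropped = sanitize(nan_only)
cleaned = sanitize(mixed)
```

Stopping after the NaN removal, without the final emptiness check, would produce exactly the `"fields":{}` record shown above.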

johnseekins avatar Sep 08 '21 19:09 johnseekins

next steps: review the now closed PR #9875 and see if it resolves the issue

powersj avatar Apr 25 '22 20:04 powersj

Hey @johnseekins! Dropping metrics with empty fields in general seems a bit extreme, as it might make sense for some outputs to work with tags only, so I think this is an output-specific thing. That said, would you be OK with a drop_metrics_without_fields option (or something with a better name)?

srebhan avatar Sep 08 '22 17:09 srebhan

It really looks like https://github.com/influxdata/telegraf/pull/9875 would have solved my issue. I think in the case of Grafana, the problem was with NaNs in histograms. Too bad it was closed.

The problem for me is that I no longer work at the company where we were seeing this, so I can't really test any solutions.

johnseekins avatar Sep 08 '22 17:09 johnseekins

@johnseekins do you mind describing the NaN issue (maybe in another issue)? I can certainly take a look if you have some ability to test or provide test data...

srebhan avatar Sep 08 '22 17:09 srebhan

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!

telegraf-tiger[bot] avatar Mar 23 '23 18:03 telegraf-tiger[bot]