telegraf forwarding empty metrics
Relevant telegraf.conf:
[global_tags]
  dc = "1"

[agent]
  hostname = ""
  omit_hostname = false
  interval = "60s"
  round_interval = true
  metric_batch_size = 10000
  metric_buffer_limit = 100000
  collection_jitter = "10s"
  flush_interval = "10s"
  flush_jitter = "5s"
  precision = ""
  logfile = "/var/log/telegraf/telegraf.log"
  debug = false
  quiet = false

[[inputs.prometheus]]
  ## An array of urls to scrape metrics from.
  urls = ["http://localhost:3000/metrics"]
  metric_version = 2
  name_override = "grafana"
  tagexclude = ["url"]
  [inputs.prometheus.tags]
    metrictype = "applicationlevel"

[[outputs.kafka]]
  brokers = ["<broker list>"]
  compression_codec = 1
  data_format = "json"
  required_acks = 1
  routing_tag = "host"
  tagexclude = ["metrictype"]
  topic = "metrics"
  [outputs.kafka.tagpass]
    metrictype = ["applicationlevel"]
System info:
telegraf 1.17.2 and 1.19.3 on Ubuntu 20.04
Steps to reproduce:
Collecting Prometheus-style data from a Grafana instance, I consistently see metrics with no fields collected and forwarded to Kafka. While they don't hurt anything, it seems like they should be dropped.
For example, this metric was dropped on the other side of Kafka:
{"fields":{},"tags":{"dc":"1","handler":"/","host":"grafana","method":"get","quantile":"0.99","statuscode":"200"},"name":"grafana_","timestamp":1631129284}
You can see there are no actual fields attached to the metric.
Expected behavior:
Metrics with no fields aren't forwarded to downstream systems and are instead dropped or filtered in telegraf, or ideally never collected in the first place.
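One possible interim workaround (a sketch only, not part of the original report, and untested against this config) is the starlark processor, which can drop any metric whose field set ends up empty before it reaches the outputs:

[[processors.starlark]]
  ## Sketch of a workaround: drop metrics that carry tags but no fields.
  source = '''
def apply(metric):
    # metric.fields is dict-like; an empty field set means nothing
    # survived parsing (e.g. only NaN values were submitted).
    if len(metric.fields.keys()) == 0:
        return None  # returning None drops the metric entirely
    return metric
'''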
Additional info:
This also occurs in data from Benthos (https://benthos.dev/) and occasionally with statsd data.
In the case of Grafana, it seems to be histogram stats that have a NaN value. I would assume something similar is happening in other places: a system initially submits a NaN, telegraf (reasonably) doesn't keep that field, but it doesn't actually drop the now-empty metric.
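For illustration only (a hypothetical metric name, not taken from the actual Grafana output), a summary quantile with no recent observations can be exposed as NaN in the Prometheus text format:

http_request_duration_seconds{handler="/",method="get",statuscode="200",quantile="0.99"} NaN

Once the NaN value is discarded during parsing, only the tags remain, which matches the empty-field metric shown above.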
Next steps: review the now-closed PR #9875 and see if it resolves the issue.
Hey @johnseekins! Dropping metrics with empty fields in general seems a bit extreme, as it might make sense for some outputs to send only tags, so I think this is an output-specific thing.
That being said, would you be OK with having a drop_metrics_without_fields (or something with a better name) option?
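If such an option were added, usage might look something like the following (purely hypothetical; drop_metrics_without_fields is only a proposed name and does not exist in any released telegraf):

[[outputs.kafka]]
  brokers = ["<broker list>"]
  data_format = "json"
  topic = "metrics"
  ## Hypothetical, proposed option (not implemented): skip serializing
  ## metrics that carry tags but no fields.
  drop_metrics_without_fields = true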
It really looks like https://github.com/influxdata/telegraf/pull/9875 would have solved my issue. I think in the case of Grafana, the problem was with NaNs in histograms. Too bad it was closed.
The problem is that I no longer work at the company where we were seeing this, so I can't really test any solutions.
@johnseekins do you mind describing the NaN issue (maybe in another issue)? I can certainly take a look if you have some ability to test or provide test data...
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!