telegraf
Make "Merge" plugin work with aggregator results
Feature Request
Proposal:
The Merge plugin only works with raw input data, every 1 minute; it doesn't play nicely with aggregator outputs.
Current behavior:
[agent]
collection_jitter = "0s"
[[inputs.cpu]]
percpu = false
totalcpu = true
collect_cpu_time = false
report_active = false
fieldpass = ["usage_idle"]
[[aggregators.quantile]]
period = "5s"
drop_original = true
quantiles = [0.95]
order = 1
[[aggregators.basicstats]]
drop_original = true
period = "5s"
stats = ["count", "rate", "sum"]
order = 2
[[aggregators.merge]]
drop_original = true
order = 3
fieldpass = ["usage_idle_*"]
Output:
cpu,cpu=cpu-total usage_idle_095=82.67260012136957 1637893260000000000
cpu,cpu=cpu-total usage_idle_count=5,usage_idle_sum=402.1946738461496,usage_idle_rate=0.28685386306114324 1637893260000000000
cpu,cpu=cpu-total usage_idle_count=5,usage_idle_sum=390.2688202755629,usage_idle_rate=-2.64097419767619 1637893265000000000
cpu,cpu=cpu-total usage_idle_095=83.12142252112594 1637893265000000000
As you can see, the lines have the same measurement, tags, and timestamp but were not merged.
From the debug log it looks like merge works on a different time window than quantile or basicstats.
Desired behavior:
Merge aggregate results.
Hi,
To ensure I understand your request, you were expecting the output to only show these two lines?
cpu,cpu=cpu-total usage_idle_095=82.67260012136957 1637893260000000000
cpu,cpu=cpu-total usage_idle_095=83.12142252112594 1637893265000000000
Thanks!
Hi,
Can you clarify what you were looking for? Otherwise I am going to close this issue.
Thanks!
@powersj Hi, I need to combine usage_idle_095, usage_idle_count, usage_idle_sum, and usage_idle_rate into a single line.
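To make the requested result concrete, here is a minimal Python sketch of the merge being asked for. It is not Telegraf code: the `merge_metrics` helper and the tuple representation of a metric are assumptions made for illustration. It groups metrics by measurement, tag set, and timestamp and unions their fields, using the four lines from the output above.

```python
from collections import defaultdict

def merge_metrics(metrics):
    """Union the fields of metrics that share measurement, tags, and timestamp."""
    grouped = defaultdict(dict)
    for name, tags, fields, timestamp in metrics:
        # Sorting the tags makes the key independent of tag order.
        key = (name, tuple(sorted(tags.items())), timestamp)
        grouped[key].update(fields)
    return [
        (name, dict(tags), fields, timestamp)
        for (name, tags, timestamp), fields in grouped.items()
    ]

# The four lines from the issue, as (measurement, tags, fields, timestamp).
lines = [
    ("cpu", {"cpu": "cpu-total"}, {"usage_idle_095": 82.67260012136957}, 1637893260000000000),
    ("cpu", {"cpu": "cpu-total"},
     {"usage_idle_count": 5, "usage_idle_sum": 402.1946738461496,
      "usage_idle_rate": 0.28685386306114324}, 1637893260000000000),
    ("cpu", {"cpu": "cpu-total"},
     {"usage_idle_count": 5, "usage_idle_sum": 390.2688202755629,
      "usage_idle_rate": -2.64097419767619}, 1637893265000000000),
    ("cpu", {"cpu": "cpu-total"}, {"usage_idle_095": 83.12142252112594}, 1637893265000000000),
]

merged = merge_metrics(lines)
for metric in merged:
    print(metric)
```

With this input the four lines collapse into two metrics, one per timestamp, each carrying all four usage_idle_* fields.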
I have not played with aggregators like this before, but I do not believe aggregators can be combined as you are assuming. I added some debug output to the three aggregators you are trying to use and found:
- The basicstats and quantile aggregators are receiving the same data
- The merge aggregator is not receiving any metrics in the first place
Per the aggregator docs, aggregators run against the metrics collected in their time period, and their output is not passed on to other aggregators. If you run with the --debug option you will see additional output from the aggregators about the time ranges in which they look for metrics:
2021-12-20T14:54:51Z D! [aggregators.merge] Updated aggregation range [2021-12-20 07:54:30 -0700 MST, 2021-12-20 07:55:00 -0700 MST]
2021-12-20T14:54:51Z D! [aggregators.quantile] Updated aggregation range [2021-12-20 07:54:50 -0700 MST, 2021-12-20 07:54:55 -0700 MST]
2021-12-20T14:54:51Z D! [aggregators.basicstats] Updated aggregation range [2021-12-20 07:54:50 -0700 MST, 2021-12-20 07:54:55 -0700 MST]
2021-12-20T14:54:55Z D! [aggregators.basicstats] Updated aggregation range [2021-12-20 07:54:55 -0700 MST, 2021-12-20 07:55:00 -0700 MST]
2021-12-20T14:54:55Z D! [aggregators.quantile] Updated aggregation range [2021-12-20 07:54:55 -0700 MST, 2021-12-20 07:55:00 -0700 MST]
2021-12-20T14:55:00Z D! [aggregators.quantile] Updated aggregation range [2021-12-20 07:55:00 -0700 MST, 2021-12-20 07:55:05 -0700 MST]
2021-12-20T14:55:00Z D! [aggregators.basicstats] Updated aggregation range [2021-12-20 07:55:00 -0700 MST, 2021-12-20 07:55:05 -0700 MST]
2021-12-20T14:55:00Z D! [aggregators.merge] Updated aggregation range [2021-12-20 07:55:00 -0700 MST, 2021-12-20 07:55:30 -0700 MST]
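The ranges in this log are consistent with each aggregator truncating the current time to a multiple of its own period. The Python sketch below reproduces the first two ranges under that assumption. The `aggregation_window` helper is hypothetical, and the 30-second merge period is inferred from the log ranges rather than set in the config above.

```python
from datetime import datetime, timedelta, timezone

def aggregation_window(now, period_s):
    """Return the (since, until) window of length period_s that contains now.

    Assumption: windows are aligned by truncating the Unix time to a
    multiple of the period, which matches the ranges printed in the log.
    """
    epoch = int(now.timestamp())
    start = epoch - epoch % period_s
    since = datetime.fromtimestamp(start, tz=timezone.utc)
    return since, since + timedelta(seconds=period_s)

# Timestamp of the first three log lines, in UTC.
now = datetime(2021, 12, 20, 14, 54, 51, tzinfo=timezone.utc)

merge_window = aggregation_window(now, 30)    # period inferred from the log
quantile_window = aggregation_window(now, 5)  # period = "5s" in the config

print("merge   ", merge_window[0].time(), "to", merge_window[1].time())
print("quantile", quantile_window[0].time(), "to", quantile_window[1].time())
```

The merge window spans six of the 5-second windows, so even if aggregator output were fed to merge, the two kinds of plugin would flush on different schedules.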
In the output below, you will see that the hashes calculated by the basicstats and quantile aggregators are the same: the same data is passed into both, not the previous aggregator's data. Additionally, the merge aggregator receives nothing:
2021-12-20T14:46:29Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ryzen", Flush Interval:10s
merge received 0 metrics
cpu map[cpu:cpu-total host:ryzen] map[usage_idle:98.3016920526716] 1640011600000000000
basicstats metric hash: 15405109440427227228
cpu map[cpu:cpu-total host:ryzen] map[usage_idle:98.3016920526716] 1640011600000000000
quantile metric hash: 15405109440427227228
cpu,cpu=cpu-total,host=ryzen usage_idle_095=98.3016920526716 1640011605000000000
cpu,cpu=cpu-total,host=ryzen usage_idle_count=1,usage_idle_sum=98.3016920526716 1640011605000000000
oh man! I am running into the same issue. It would be really nice to be able to merge the metrics generated by basicstats. I am trying to calculate a mean of multiple metrics and was hoping to combine all the averages into one metric. It could be:
mean,host=hostname,tag3=foo3,tag4=bar4 usage_system_mean=0.8224519209236311,used_percent_mean=19.366610129780497 1669834650000000000
usage_system_mean is the mean calculated by basicstats from the cpu input metric.
used_percent_mean is the mean calculated by basicstats from the mem input metric.
Instead it is showing up like this:
mean,host=hostname,tag3=foo3,tag4=bar4 used_percent_mean=19.323841042330674 1669835580000000000
mean,host=hostname,tag3=foo3,tag4=bar4 usage_system_mean=0.8446296752873947 1669835580000000000
Running into the same issue here. Debugging my config for hours, thinking I did something wrong. Now I see it just doesn't work with basicstats.... This feature is really wanted!
Also have a need for this. My use-case is for the Redis plugin - it currently spits out a hash with name "redis" for global information and a hash per database for "redis_keyspace" (e.g. number of keys in the DB). What I want to do is sum the number of keys across all databases (via the basicstats aggregator) and then merge that into the global information hash so that there's a single hash for a particular timestamp (via the merge aggregator).
I'm assuming the aggregators run in parallel right now, whereas for this to work they would have to run in serial (i.e. like processors via "order").
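The difference between the two models can be sketched in a few lines of Python. This is a toy model, not Telegraf's implementation. The `received` helper is hypothetical, and it assumes that a fieldpass glob drops non-matching fields and that a metric left with no fields never reaches the aggregator. The field values are taken from the debug output earlier in the thread.

```python
from fnmatch import fnmatch

def received(metrics, fieldpass):
    """Return the metrics an aggregator would see after a fieldpass filter."""
    seen = []
    for fields in metrics:
        kept = {k: v for k, v in fields.items() if fnmatch(k, fieldpass)}
        if kept:  # a metric with no remaining fields is dropped entirely
            seen.append(kept)
    return seen

# Raw input metric and the basicstats output from the debug run above.
raw = [{"usage_idle": 98.3016920526716}]
basicstats_out = [{"usage_idle_count": 1, "usage_idle_sum": 98.3016920526716}]

# Parallel model: merge is offered only the raw input metrics.
merge_in_parallel = received(raw, "usage_idle_*")

# Serial model (hypothetical): merge is offered the previous aggregator's output.
merge_in_serial = received(basicstats_out, "usage_idle_*")

print("parallel:", merge_in_parallel)
print("serial:  ", merge_in_serial)
```

In the parallel model the pattern usage_idle_* matches nothing in the raw metric, which would be consistent with the "merge received 0 metrics" line in the debug output.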
EDIT: managed to do it in a different way if anyone else has this issue:
[[processors.rename]]
[[processors.rename.replace]]
measurement = "redis_keyspace"
dest = "redis"
[[aggregators.basicstats]]
period = "30s"
drop_original = true
stats = ["sum"]