
Make "Merge" plugin work with aggregator results

Open lambdaq opened this issue 3 years ago • 5 comments

Feature Request

Proposal:

The merge aggregator only works on raw input data, over its own aggregation period; it doesn't play nicely with the output of other aggregators.

Current behavior:

[agent]
  collection_jitter = "0s"

[[inputs.cpu]]
  percpu = false
  totalcpu = true
  collect_cpu_time = false
  report_active = false
  fieldpass = ["usage_idle"]

[[aggregators.quantile]]
  period = "5s"
  drop_original = true
  quantiles = [0.95]
  order = 1

[[aggregators.basicstats]]
  drop_original = true
  period = "5s"
  stats = ["count", "rate", "sum"]
  order = 2

[[aggregators.merge]]
  drop_original = true
  order = 3
  fieldpass = ["usage_idle_*"]

Output:

cpu,cpu=cpu-total usage_idle_095=82.67260012136957 1637893260000000000
cpu,cpu=cpu-total usage_idle_count=5,usage_idle_sum=402.1946738461496,usage_idle_rate=0.28685386306114324 1637893260000000000
cpu,cpu=cpu-total usage_idle_count=5,usage_idle_sum=390.2688202755629,usage_idle_rate=-2.64097419767619 1637893265000000000
cpu,cpu=cpu-total usage_idle_095=83.12142252112594 1637893265000000000

As you can see, these lines have the same measurement, tags, and timestamp, but they were not merged.

From the debug log, it looks like the merge aggregator works on a different time window than quantile and basicstats.
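For reference, the grouping rule the merge aggregator applies (combine the fields of metrics that share measurement, tag set, and timestamp) can be sketched in a few lines of Python. This is a simplified model, not Telegraf's Go implementation: the hypothetical `parse_line` only handles line protocol without escaped spaces or commas, and field values are kept as raw strings. Applied to the four output lines above, it produces the two merged lines the reporter expected.

```python
# Simplified model of the merge aggregator's grouping rule (not Telegraf code).
OUTPUT_LINES = [
    "cpu,cpu=cpu-total usage_idle_095=82.67260012136957 1637893260000000000",
    "cpu,cpu=cpu-total usage_idle_count=5,usage_idle_sum=402.1946738461496,usage_idle_rate=0.28685386306114324 1637893260000000000",
    "cpu,cpu=cpu-total usage_idle_count=5,usage_idle_sum=390.2688202755629,usage_idle_rate=-2.64097419767619 1637893265000000000",
    "cpu,cpu=cpu-total usage_idle_095=83.12142252112594 1637893265000000000",
]

def parse_line(line):
    """Split simplified line protocol (no escaped spaces or commas) into parts."""
    series, fields, ts = line.split(" ")
    name, *tags = series.split(",")
    field_map = dict(kv.split("=", 1) for kv in fields.split(","))
    return name, tuple(sorted(tags)), int(ts), field_map

def merge_lines(lines):
    """Union the fields of lines sharing measurement, tag set, and timestamp."""
    groups = {}  # insertion-ordered: first occurrence of each key wins the slot
    for line in lines:
        name, tags, ts, fields = parse_line(line)
        groups.setdefault((name, tags, ts), {}).update(fields)
    merged = []
    for (name, tags, ts), fields in groups.items():
        series = ",".join([name, *tags])
        field_str = ",".join(f"{k}={v}" for k, v in fields.items())
        merged.append(f"{series} {field_str} {ts}")
    return merged

if __name__ == "__main__":
    for line in merge_lines(OUTPUT_LINES):
        print(line)
```

Each resulting line carries usage_idle_095, usage_idle_count, usage_idle_sum, and usage_idle_rate for one timestamp.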

Desired behavior:

The merge aggregator should also merge the results produced by other aggregators.

lambdaq, Nov 26 '21 02:11

Hi,

To ensure I understand your request, you were expecting the output to only show these two lines?

cpu,cpu=cpu-total usage_idle_095=82.67260012136957 1637893260000000000
cpu,cpu=cpu-total usage_idle_095=83.12142252112594 1637893265000000000

Thanks!

powersj, Dec 01 '21 15:12

Hi,

Can you clarify what you are looking for? Otherwise, I am going to close this issue.

Thanks!

powersj, Dec 09 '21 19:12

@powersj Hi, I need to combine usage_idle_095, usage_idle_count, usage_idle_sum, and usage_idle_rate into a single line.

lambdaq, Dec 20 '21 05:12

I have not played with aggregators like this before, but I do not believe aggregators can be combined as you are assuming. I added some debug output to the three aggregators you are trying to use and found:

  • The basicstats and quantile aggregators are receiving the same data
  • The merge aggregator is not receiving any metrics in the first place

Per the aggregator docs, aggregators run against the metrics collected in their time period, and metrics are not passed between aggregators. If you run with the --debug option, you will see additional output from the aggregators about the time ranges in which they are looking for metrics:

2021-12-20T14:54:51Z D! [aggregators.merge] Updated aggregation range [2021-12-20 07:54:30 -0700 MST, 2021-12-20 07:55:00 -0700 MST]
2021-12-20T14:54:51Z D! [aggregators.quantile] Updated aggregation range [2021-12-20 07:54:50 -0700 MST, 2021-12-20 07:54:55 -0700 MST]
2021-12-20T14:54:51Z D! [aggregators.basicstats] Updated aggregation range [2021-12-20 07:54:50 -0700 MST, 2021-12-20 07:54:55 -0700 MST]
2021-12-20T14:54:55Z D! [aggregators.basicstats] Updated aggregation range [2021-12-20 07:54:55 -0700 MST, 2021-12-20 07:55:00 -0700 MST]
2021-12-20T14:54:55Z D! [aggregators.quantile] Updated aggregation range [2021-12-20 07:54:55 -0700 MST, 2021-12-20 07:55:00 -0700 MST]
2021-12-20T14:55:00Z D! [aggregators.quantile] Updated aggregation range [2021-12-20 07:55:00 -0700 MST, 2021-12-20 07:55:05 -0700 MST]
2021-12-20T14:55:00Z D! [aggregators.basicstats] Updated aggregation range [2021-12-20 07:55:00 -0700 MST, 2021-12-20 07:55:05 -0700 MST]
2021-12-20T14:55:00Z D! [aggregators.merge] Updated aggregation range [2021-12-20 07:55:00 -0700 MST, 2021-12-20 07:55:30 -0700 MST]
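The ranges in this log are consistent with each aggregator aligning its window to a whole multiple of its own period: 5s for quantile and basicstats, and 30s for merge, which sets no period in the config. The Python sketch below reproduces the 14:54:51Z entries under that alignment assumption; `aggregation_window` is a hypothetical helper, not a Telegraf function. It works in UTC, and the log's MST ranges are the same instants shifted by seven hours.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def aggregation_window(now, period):
    """Return the (start, end) window of length `period` containing `now`,
    with `start` aligned to a whole multiple of `period` since the Unix epoch.
    The alignment rule is an assumption inferred from the debug log."""
    start = EPOCH + ((now - EPOCH) // period) * period
    return start, start + period

if __name__ == "__main__":
    now = datetime(2021, 12, 20, 14, 54, 51, tzinfo=timezone.utc)  # 07:54:51 MST
    for name, seconds in [("quantile", 5), ("basicstats", 5), ("merge", 30)]:
        start, end = aggregation_window(now, timedelta(seconds=seconds))
        print(f"{name}: [{start:%H:%M:%S}, {end:%H:%M:%S}] UTC")
```

This prints [14:54:50, 14:54:55] for quantile and basicstats and [14:54:30, 14:55:00] for merge, i.e. 07:54:50 to 07:54:55 and 07:54:30 to 07:55:00 MST, matching the log lines above.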

In the output below, the hash calculated by the basicstats and quantile aggregators is the same: the same raw data is passed into both, not the previous aggregator's output. Additionally, the merge aggregator receives nothing:

2021-12-20T14:46:29Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ryzen", Flush Interval:10s
merge received 0 metrics
cpu map[cpu:cpu-total host:ryzen] map[usage_idle:98.3016920526716] 1640011600000000000
basicstats metric hash: 15405109440427227228
cpu map[cpu:cpu-total host:ryzen] map[usage_idle:98.3016920526716] 1640011600000000000
quantile metric hash: 15405109440427227228
cpu,cpu=cpu-total,host=ryzen usage_idle_095=98.3016920526716 1640011605000000000
cpu,cpu=cpu-total,host=ryzen usage_idle_count=1,usage_idle_sum=98.3016920526716 1640011605000000000
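One way to read these observations is that every aggregator receives the same raw metrics and that aggregator results are never fed into other aggregators. The Python sketch below models that under two assumptions not stated in the thread: each aggregator applies its own `fieldpass` to the raw input, and Python's `fnmatchcase` behaves like Telegraf's glob matching for these patterns. Under that model, merge's `fieldpass = ["usage_idle_*"]` does not match the raw `usage_idle` field, which would explain why it received 0 metrics. `fan_out` and `apply_fieldpass` are hypothetical names.

```python
from fnmatch import fnmatchcase

# The single raw metric from the debug output above.
RAW = [{
    "name": "cpu",
    "tags": {"cpu": "cpu-total", "host": "ryzen"},
    "fields": {"usage_idle": 98.3016920526716},
    "time": 1640011600000000000,
}]

def apply_fieldpass(metrics, patterns):
    """Keep only fields matching one of the glob patterns; drop empty metrics."""
    kept = []
    for metric in metrics:
        fields = {k: v for k, v in metric["fields"].items()
                  if any(fnmatchcase(k, p) for p in patterns)}
        if fields:
            kept.append({**metric, "fields": fields})
    return kept

def fan_out(raw, fieldpass_by_aggregator):
    """Give every aggregator the same raw input, filtered by its own fieldpass.
    No aggregator ever receives another aggregator's output."""
    received = {}
    for name, patterns in fieldpass_by_aggregator.items():
        received[name] = apply_fieldpass(raw, patterns) if patterns else list(raw)
    return received

if __name__ == "__main__":
    received = fan_out(RAW, {
        "quantile": None,            # no fieldpass in the config
        "basicstats": None,          # no fieldpass in the config
        "merge": ["usage_idle_*"],   # does not match the raw "usage_idle" field
    })
    for name, metrics in received.items():
        print(f"{name} received {len(metrics)} metrics")
```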

powersj, Dec 20 '21 14:12

Oh man! I am running into the same issue. It would be really nice to be able to merge the metrics generated by basicstats. I am trying to calculate the mean of multiple metrics and was hoping to combine all the averages into one metric. It could look like this:

mean,host=hostname,tag3=foo3,tag4=bar4 usage_system_mean=0.8224519209236311,used_percent_mean=19.366610129780497 1669834650000000000

usage_system_mean is the mean calculated by basicstats from the cpu input metric. used_percent_mean is the mean calculated by basicstats from the mem input metric.

Instead it is showing up like this:

mean,host=hostname,tag3=foo3,tag4=bar4 used_percent_mean=19.323841042330674 1669835580000000000
mean,host=hostname,tag3=foo3,tag4=bar4 usage_system_mean=0.8446296752873947 1669835580000000000

nsandhu-godaddy, Nov 30 '22 19:11

Running into the same issue here. I spent hours debugging my config, thinking I did something wrong. Now I see it just doesn't work with basicstats. This feature is really wanted!

renevdm, Feb 11 '23 15:02

Also have a need for this. My use-case is for the Redis plugin - it currently spits out a hash with name "redis" for global information and a hash per database for "redis_keyspace" (e.g. number of keys in the DB). What I want to do is sum the number of keys across all databases (via the basicstats aggregator) and then merge that into the global information hash so that there's a single hash for a particular timestamp (via the merge aggregator).

I'm assuming the aggregators run in parallel right now, whereas for this to work they would have to run serially (i.e. like processors via "order").

EDIT: I managed to do it a different way, in case anyone else has this issue:

[[processors.rename]]
  [[processors.rename.replace]]
    measurement = "redis_keyspace"
    dest = "redis"
[[aggregators.basicstats]]
  period = "30s"
  drop_original = true
  stats = ["sum"]
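The workaround can be modeled roughly as follows: after the rename processor, keyspace metrics carry the redis measurement name, and basicstats sums each field per series (measurement plus tag set), naming the result `<field>_sum` as in the usage_idle_sum output earlier in the thread. This is a simplified Python sketch, not Telegraf code. The field names and the `server` tag are hypothetical, and it assumes the renamed metrics end up with the same tag set as the global redis metric; metrics with different tags would remain separate series.

```python
from collections import defaultdict

def rename_measurement(metrics, old, new):
    """Model of processors.rename with a single measurement replacement."""
    return [{**m, "name": new} if m["name"] == old else m for m in metrics]

def basicstats_sum(metrics):
    """Sum every field per series (measurement + tag set) within one period."""
    totals = defaultdict(lambda: defaultdict(float))
    for m in metrics:
        series = (m["name"], tuple(sorted(m["tags"].items())))
        for field, value in m["fields"].items():
            totals[series][f"{field}_sum"] += value
    return {series: dict(fields) for series, fields in totals.items()}

# Hypothetical metrics from one collection; field names and tags are illustrative.
METRICS = [
    {"name": "redis", "tags": {"server": "localhost"}, "fields": {"clients": 2}},
    {"name": "redis_keyspace", "tags": {"server": "localhost"}, "fields": {"keys": 10}},
    {"name": "redis_keyspace", "tags": {"server": "localhost"}, "fields": {"keys": 5}},
]

if __name__ == "__main__":
    renamed = rename_measurement(METRICS, "redis_keyspace", "redis")
    for series, fields in basicstats_sum(renamed).items():
        print(series, fields)
```

Each metric appears once in this single-window sketch; if a field is collected several times within one 30s period, sum adds those samples together as well.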

DimitriosLisenko, Mar 21 '23 14:03