telegraf Add ability to override precision for merge processor

We have been trying to get the merge processor to work for a while. When you use the JSON output, everything is sampled down to the second (by default). This makes it difficult to understand why metrics aren't merged.

The code is forced to merge metrics at the nanosecond level. However, for streaming telemetry (JTI/GNMI), the agent receives documents that need to be merged. Being able to override the precision would let us keep the current timestamp where needed while still merging metrics that do not share an exact nanosecond timestamp.

https://github.com/influxdata/telegraf/blob/d72e517544d9067edd25701693e317b479ecea06/plugins/aggregators/merge/merge.go#L33-L41

I guess a workaround would be to change the precision of the event timestamp with Starlark.

Sep 14 '22 16:09 smalenfant

Hi, sorry to hear you are having problems.

I have some clarifying questions. Are you trying to merge metrics that have different time stamps? Can you please post your config and some metrics you would like to merge so we can better understand the issue?

Thanks

Sep 19 '22 18:09 MyaLongmire

@MyaLongmire I'm trying to merge metrics that are really close together (within a second). I was able to use a workaround that removes some of the timestamp precision, and this actually worked. However, this would obviously not work when I would have the metrics to the nearest second. It seems like I could also fix that with Starlark.

The problem with this solution is that I also lose the timestamp precision I need for differential calculations downstream.

[[processors.starlark]]
  order = 2
  source = '''
load('time.star', 'time')
def apply(metric):
  metric.time = time.from_timestamp(int(metric.time/1000/1000/1000)).unix_nano
  return metric
'''

[[aggregators.merge]]
  order = 100
  # precision = "1s"
  drop_original = false
  grace = "5s"

Here's an example where fields pertaining to a single interface arrive at different times across the wire, and I would like to merge them together:

{"fields":{"in-errors":0,"in-octets":1138439976,"out-octets":1062616280},"name":"interfaces","tags":{"device":"router1","host":"e61014f3fe28","interface-name":"TenGigE0/0/0/0/0"},"timestamp":1663265138823}
{"fields":{"admin-status":"UP","mtu":9100,"oper-status":"UP"},"name":"interfaces","tags":{"device":"router1","host":"e61014f3fe28","interface-name":"TenGigE0/0/0/0/0"},"timestamp":1663265138989}

There are A LOT of metrics arriving at about the same time; they are sometimes spaced out by a few hundred milliseconds:

September 15, 2022 6:05:38.823 PM
September 15, 2022 6:05:38.989 PM

Sep 19 '22 19:09 smalenfant

@MyaLongmire it seems like the merge aggregator doesn't take into account the period and/or grace aggregator settings.

Sep 22 '22 11:09 Hipska

If someone wants to put up a PR for this, we would want to see a new 'margin' config option that takes a config time duration. The contributor would also need to copy some of the comparison code out of the series grouper and into the aggregator itself. The hash function currently takes the timestamp into consideration; that would need to be removed, and metrics whose hashes are equal would then be compared based on the margin.

Feb 07 '24 19:02 powersj

@smalenfant please test PR #15319, available as soon as CI finishes the tests, and let me know if this fixes your issue! Please use the new round_timestamp_to option to manipulate the timestamps of the metrics.

May 07 '24 16:05 srebhan

I totally missed this, will test. Thank you!

Jul 12 '24 18:07 smalenfant
