EC2 Processor runs every 10 seconds

Open 6sossomons opened this issue 3 years ago • 5 comments

When running tests on a single node, the 10s telegraf run and EC2 API query don't seem to have an issue. But once other factors such as the ones listed below come into play, telegraf's continuous querying of the AWS API for EC2 tags becomes overbearing:

  • chef/puppet running on the same interval
  • high volume of AWS resources
  • high volume of ANYTHING else

Is there a cache method or some other "check once and done" type of method (in lieu of hard-coding) for this type of processor?

6sossomons avatar Jan 25 '22 15:01 6sossomons

There isn't a cache option available for the EC2 processor at the moment, so it looks like this will have to be a new feature. Looking at the other available processors, the ifname processor (https://github.com/influxdata/telegraf/tree/master/plugins/processors/ifname) has some caching logic and provides a config option to specify how long the cache will last. I'd imagine this processor would want to use similar logic.

Pull requests are always welcome if you think you might be able to extend this processor to add caching.

sspaink avatar Jan 25 '22 17:01 sspaink

WARNING -- I am a beginner at Go at best, so I am probably missing something that is painfully obvious to anyone at novice level or better...

What ifname is doing seems to be a bit more heavyweight than what's needed here.

All I think really needs to be done is add a new config item for "cache time," then, around line 219 or so (https://github.com/influxdata/telegraf/blob/7e652fdd005dcdf85ac260f497b998915f4361db/plugins/processors/aws/ec2/ec2.go#L219), we check:

  • if r.cacheTime is >0,
  • if r.cachedDto is non-empty,
  • if now minus r.lastDtoRequest is fewer than r.cacheTime seconds (unix time), then

use the value of r.cachedDto instead of dto.

If either of the last two checks is false, we make the call to ec2:DescribeTags as usual, and if no errors occur, we set r.cachedDto to the value of dto and set r.lastDtoRequest to now.

Add cache_time to our toml files, and those three values to the struct on line 23... and done.

jrimmer-housecallpro avatar Mar 17 '22 21:03 jrimmer-housecallpro

@jrimmer-housecallpro your logic seems correct to me, would you like to create a pull request with these changes so it can be tested and reviewed?

sspaink avatar Apr 06 '22 21:04 sspaink

I would like to -- finding the time is the trick.

jrimmer-housecallpro avatar Apr 07 '22 22:04 jrimmer-housecallpro

Just want to mention that without a cache feature this plugin is useless in production environments.

aslobodskoy avatar Jan 04 '23 14:01 aslobodskoy

This is in master (https://github.com/influxdata/telegraf/commit/2fed77e02ad963c7aeb150cfa282bddfd715b927); it just doesn't link to this thread.

certara-mchamberland avatar Jun 29 '23 21:06 certara-mchamberland

Thanks

Fixed with https://github.com/influxdata/telegraf/pull/13075

powersj avatar Jun 29 '23 22:06 powersj