telegraf
EC2 Processor runs every 10 seconds
When testing on a single node, Telegraf's 10-second run and the accompanying EC2 API query don't seem to cause any issues. But once other factors come into play, such as the ones listed below, Telegraf's continuous querying of the AWS API for EC2 tags becomes overbearing:
1. Chef/Puppet running at the same interval
2. A high volume of AWS resources
3. A high volume of anything else
Is there a cache method or some other "check once and done" type of method (in lieu of hard-coding) for this type of processor?
There isn't a cache option available for the EC2 processor at the moment, so it looks like this will have to be a new feature. Looking at the other available processors, the ifname processor (https://github.com/influxdata/telegraf/tree/master/plugins/processors/ifname) has some caching logic and provides a config option to specify how long the cache lasts. I'd imagine this processor would want to use similar logic.
Pull requests are always welcome if you think you might be able to extend this processor to add caching.
WARNING -- I am a beginner at Go at best, so I am probably missing something that is painfully obvious to anyone at novice level or better...
What ifname is doing seems to be a bit more heavyweight than what's needed here.
All I think really needs to be done is add a new config item for "cache time," then around line 219 or so (https://github.com/influxdata/telegraf/blob/7e652fdd005dcdf85ac260f497b998915f4361db/plugins/processors/aws/ec2/ec2.go#L219), we check:
- if r.cacheTime is > 0,
- if r.cachedDto is non-empty,
- if now minus r.lastDtoRequest is less than r.cacheTime seconds (unix time),

then use the value of r.cachedDto instead of dto.
If either of the second two checks is false, we make the call to ec2:DescribeTags as usual, but if no errors occur, set r.cachedDto to the value of dto and set r.lastDtoRequest to now.
Add cache_time to our TOML files, and those three values to the struct on line 23... and done.
@jrimmer-housecallpro your logic seems correct to me, would you like to create a pull request with these changes so it can be tested and reviewed?
I would like to -- finding the time is the trick.
Just want to mention that without a cache feature, this plugin is useless in production environments.
This is in master: https://github.com/influxdata/telegraf/commit/2fed77e02ad963c7aeb150cfa282bddfd715b927. It just doesn't link to this thread.
Thanks
Fixed with https://github.com/influxdata/telegraf/pull/13075