yet-another-cloudwatch-exporter
[FEATURE] Support to expose historical data points
Is there an existing issue for this?
- [X] I have searched the existing issues
Feature description
Context
In the context of jobs, YACE allows scraping CloudWatch metrics, and the frequency and amount of data points are controlled by (a small sketch of how they combine follows the list):
- period (int): statistic period in seconds
- length (int): how far back to request data in seconds
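As I understand it, these two settings simply define the time window YACE asks CloudWatch for. A minimal sketch of that arithmetic, with the example values used later in this issue (the variable names are mine, not YACE's):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Values from the example config below: 1-minute period, 5 minutes of history.
	period := 60 * time.Second
	length := 300 * time.Second

	// The query window ends "now" and reaches back `length` seconds.
	end := time.Now()
	start := end.Add(-length)

	// CloudWatch aggregates one datapoint per `period` inside the window,
	// so this request can return up to length/period datapoints (5 here).
	fmt.Printf("window %s -> %s, up to %d datapoints\n",
		start.Format(time.RFC3339), end.Format(time.RFC3339), int(length/period))
}
```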
All the examples in the examples folder show how to scrape different types of metrics, and in them period and length always happen to be the same. I'm currently working with a client who adopted YACE to scrape metrics, and we configured it to get data points with a granularity of 1 minute and 5 minutes' worth of data.
Take the following yace-config.yaml as a concrete example:
```yaml
apiVersion: v1alpha1
sts-region: eu-west-2
discovery:
  jobs:
    - type: alb
      regions:
        - eu-west-2
      period: 60
      length: 300
      addCloudwatchTimestamp: true
      dimensionNameRequirements:
        - LoadBalancer
      metrics:
        - name: TargetResponseTime
          statistics: [Average]
```
Expected Behaviour
We ran YACE, via docker-compose, using the configuration above and were expecting to see the following:
- As YACE is by default configured to scrape every 5 minutes, the metrics in Prometheus were delayed by 5 minutes: AS EXPECTED
- Prometheus was configured to scrape YACE every minute (again the default configuration), so we were expecting to see 5 data points in Prometheus shortly after YACE made them available on the /metrics endpoint. However, only the last data point was presented: NOT EXPECTED
I spent a fair amount of time trying to understand whether we had misconfigured YACE, but after enabling the debug logs, looking at the source code, and running YACE locally, I realised that's how it is currently implemented. See https://github.com/nerdswords/yet-another-cloudwatch-exporter/blob/c7807a770bb427f8ddb2c7becac51185fb3e8230/pkg/clients/cloudwatch/v1/client.go#L120-L128
I believe this is NOT a BUG but a design decision to only include the latest data points and not any historic data as Prometheus will reject any samples that are too old.
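For illustration, the behaviour I observed boils down to something like the following. This is a simplified sketch of the idea only, not the actual code from client.go; the type and function names are mine:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// datapoint is a stand-in for the values YACE gets back from CloudWatch.
type datapoint struct {
	timestamp time.Time
	value     float64
}

// latestOnly mirrors the current behaviour as I understand it: sort the
// datapoints returned for the window and keep just the most recent one.
func latestOnly(points []datapoint) []datapoint {
	if len(points) == 0 {
		return nil
	}
	sort.Slice(points, func(i, j int) bool {
		return points[i].timestamp.Before(points[j].timestamp)
	})
	return points[len(points)-1:]
}

func main() {
	now := time.Now()
	// Five 1-minute datapoints, as in the period=60 / length=300 example.
	var points []datapoint
	for i := 4; i >= 0; i-- {
		points = append(points, datapoint{now.Add(-time.Duration(i) * time.Minute), float64(i)})
	}
	fmt.Println("exposed:", latestOnly(points)) // only the newest sample survives
}
```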
Having said that, I believe it might be really useful to be able to include historic data points if the period is small and the length does not go too far back (making sure Prometheus is still happy to ingest all the samples).
New Feature
Implementing the approach above could be a bit controversial, and it would introduce breaking changes (or extra behaviour whenever length is bigger than the period setting). Therefore, I'm proposing to introduce a new feature flag, e.g. -enable-feature=allow-multiple-datapoints or -enable-feature=allow-historical-datapoints, whereby all the data points are included. So, in the example above, rather than getting only the latest data point, the /metrics page would include the 5 fetched from the CloudWatch API.
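Very roughly, and reusing one of the hypothetical flag names suggested above, the change I have in mind looks like this. It is a sketch of the intent under my own assumptions, not a patch against the actual exporter code:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

type datapoint struct {
	timestamp time.Time
	value     float64
}

// allowHistorical is a hypothetical flag; the real exporter would wire this
// into its existing -enable-feature handling instead.
var allowHistorical = flag.Bool("allow-historical-datapoints", false,
	"expose every datapoint returned for the window, not just the latest")

// selectDatapoints keeps the current latest-only behaviour by default and
// only exposes the full window (assumed sorted oldest to newest) when the
// feature flag is enabled.
func selectDatapoints(points []datapoint) []datapoint {
	if len(points) == 0 {
		return nil
	}
	if *allowHistorical {
		return points // all 5 datapoints in the period=60 / length=300 example
	}
	return points[len(points)-1:] // today's behaviour: newest datapoint only
}

func main() {
	flag.Parse()
	now := time.Now()
	points := []datapoint{
		{now.Add(-4 * time.Minute), 0.21},
		{now.Add(-3 * time.Minute), 0.19},
		{now.Add(-2 * time.Minute), 0.25},
		{now.Add(-1 * time.Minute), 0.22},
		{now, 0.20},
	}
	fmt.Println(selectDatapoints(points))
}
```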
What might the configuration look like?
No extra configuration would be required, as I'm suggesting controlling this new behaviour globally via a new feature flag. That said, I'm open to suggestions, as it could be useful to have finer-grained control by including an extra config setting at the job level.
Anything else?
I'll include more details (images and logs) later on. I'm happy to give this a go myself, although it's been a while since I've done any serious development and I don't have much experience with golang 😅