Flagger Datadog Provider Multiplies Interval by 10x
Problem
When I set a metric interval to 1m, Flagger checks metrics from 10 minutes ago instead of 1 minute ago.
When I set a metric interval to 3m, Flagger checks metrics from 30 minutes ago instead of 3 minutes ago.
Why This Is a Problem
- My canary analysis uses old data instead of recent data
- I can't catch problems quickly because Flagger is looking at old metrics
The Bug in the Code
I found the problem in Flagger's source code. In the file pkg/metrics/providers/datadog.go, there's this line:
datadogFromDeltaMultiplierOnMetricInterval = 10 // ← This is the bug!
This constant is multiplied by my interval setting, so when I set 1m, the query window becomes 10m.
Here's the code that does it:
// This takes my interval (like "1m") and multiplies it by 10
dd.fromDelta = int64(datadogFromDeltaMultiplierOnMetricInterval * md.Seconds())
The fix is simple: change that 10 to 1 (or remove the multiplication entirely).
Temporary Fix
Right now, I have to set my interval to 1/10th of what I want:
- Want 1 minute? Set 6s
- Want 3 minutes? Set 18s
But this is confusing and hard to remember.
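For reference, this is roughly what the workaround looks like in a Canary analysis spec (the metric name and threshold are illustrative, not from my actual config):

```yaml
analysis:
  metrics:
    - name: request-success-rate
      # Desired window: 1m. Because the Datadog provider multiplies
      # this by 10, it has to be set to 6s to actually get 1m.
      interval: 6s
      thresholdRange:
        min: 99
```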
My Setup
- Flagger version: 1.41.0
- Using Datadog provider
- Running on GKE with Istio
As far as I understand, this is due to a change introduced in https://github.com/fluxcd/flagger/pull/1763 which was solving https://github.com/fluxcd/flagger/issues/1762.
Why it was done this way in the first place is another question: before https://github.com/fluxcd/flagger/pull/1763, the provider simply took the newest value of the time series, so the queried range was irrelevant to the evaluated value anyway. Since that change, you at least have the possibility to define which data point of the time series you want to look at.