
Using external metrics from the Kubernetes API server

Open jonnylangefeld opened this issue 3 years ago • 2 comments

Describe the feature

We are currently using flagger with Datadog metrics. Even though we don't have a very high amount of Canary resources with too frequent Datadog requests (we calculated 1860 requests per hour), flagger hits the Datadog rate limit:

Warning  Synced  18m (x11 over 8h)      flagger  Metric query failed for sli: error response: {"errors": ["Rate limit of 6000 requests in 3600 seconds reached. Please try again later."]}: %!w(<nil>)

One would assume that Datadog's officially supported Kubernetes external metrics feature would have the same issue, but the secret sauce there is batched metrics requests as described here.

Proposed solution

Kubernetes exposes external metrics via API endpoints, such as this:

╰─ kubectl get --raw  "/apis/external.metrics.k8s.io/v1beta1/namespaces/ingress/gcp.pubsub.subscription.num_undelivered_messages" | jq
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "gcp.pubsub.subscription.num_undelivered_messages",
      "metricLabels": {
        "project_id": "123",
        "subscription_id": "abc"
      },
      "timestamp": "2022-07-14T20:24:26Z",
      "value": "0"
    }
  ]
}

Would it be possible to let flagger query these endpoints the same way a HorizontalPodAutoscaler does?
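To illustrate, here is a minimal sketch of what consuming such a response could look like. The structs/field names come straight from the `ExternalMetricValueList` payload shown above; the function name `parse_external_metrics` is hypothetical, and a real integration would use a typed Kubernetes client rather than hand-rolled JSON parsing:

```python
import json

# Example response body from
# /apis/external.metrics.k8s.io/v1beta1/namespaces/<ns>/<metric>,
# copied from the kubectl output above.
RAW = """{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "gcp.pubsub.subscription.num_undelivered_messages",
      "metricLabels": {"project_id": "123", "subscription_id": "abc"},
      "timestamp": "2022-07-14T20:24:26Z",
      "value": "0"
    }
  ]
}"""


def parse_external_metrics(body: str) -> dict:
    """Map each metricName in an ExternalMetricValueList to its value.

    Hypothetical helper: flagger would do something like this after
    hitting the external metrics endpoint, instead of querying the
    metrics provider (e.g. Datadog) directly per canary.
    """
    doc = json.loads(body)
    if doc.get("kind") != "ExternalMetricValueList":
        raise ValueError(f"unexpected kind: {doc.get('kind')}")
    return {item["metricName"]: item["value"] for item in doc["items"]}
```

Because the metrics adapter (here, the Datadog Cluster Agent) sits in between and batches upstream requests, every canary reading this endpoint shares the adapter's cached, rate-limit-friendly queries.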

Is there another, less ideal way to solve this problem?

  • Implement our own proxy that does batch queries just like the Datadog Cluster Agent.
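The core idea behind such a proxy is simple: coalesce many per-canary queries into a few upstream calls. A rough sketch, assuming the upstream API accepts multiple queries per request (Datadog's v1 timeseries query endpoint accepts comma-separated queries; the batch size of 35 here is illustrative, not a documented limit):

```python
from itertools import islice


def batch_queries(queries, batch_size=35):
    """Group individual metric queries into batches.

    N canaries then cost ceil(N / batch_size) upstream API calls
    instead of N, which is roughly what the Datadog Cluster Agent
    does for external metrics. Each yielded chunk would be joined
    (e.g. with commas) into a single upstream request.
    """
    it = iter(queries)
    while chunk := list(islice(it, batch_size)):
        yield chunk
```

With the numbers from this issue (1860 requests/hour against a 6000-per-hour limit), batching even 10 queries per request would drop usage to ~186 requests/hour, comfortably under the limit.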

jonnylangefeld avatar Jul 14 '22 20:07 jonnylangefeld

Hello, did you find a way around this? I'm facing the same issue.

MrChausson avatar Jan 16 '25 14:01 MrChausson