AWS ALB Ingress with Flagger

Open nrutigs opened this issue 5 years ago • 17 comments

Are there any plans to develop ALB Ingress support for Flagger, or would it be possible to contribute it?

There appears to be support in ALB annotations for custom weights

  • https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/#actions
    alb.ingress.kubernetes.io/actions.forward-multiple-tg: >
      {"Type":"forward","ForwardConfig":{"TargetGroups":[{"ServiceName":"service-1","ServicePort":"80","Weight":20},{"ServiceName":"service-2","ServicePort":"80","Weight":20},{"TargetGroupArn":"arn-of-your-non-k8s-target-group","Weight":60}],"TargetGroupStickinessConfig":{"Enabled":true,"DurationSeconds":200}}}
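A minimal sketch of how the weighted forward action above could be generated for a two-way primary/canary split. The function name `forward_action` and the service names `podinfo-primary` and `podinfo-canary` are hypothetical, chosen for illustration; only the JSON shape is taken from the annotation shown above.

```python
import json


def forward_action(primary_svc: str, canary_svc: str, canary_weight: int, port: str = "80") -> str:
    """Build the JSON value for an alb.ingress.kubernetes.io/actions.<name> annotation.

    Hypothetical helper: splits traffic between two Kubernetes services so that
    the two weights always add up to 100.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    action = {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"ServiceName": primary_svc, "ServicePort": port, "Weight": 100 - canary_weight},
                {"ServiceName": canary_svc, "ServicePort": port, "Weight": canary_weight},
            ]
        },
    }
    # Compact separators keep the annotation value on one short line.
    return json.dumps(action, separators=(",", ":"))


# Example: send 20% of traffic to the canary service (names are placeholders).
print(forward_action("podinfo-primary", "podinfo-canary", 20))
```

Shifting traffic would then amount to rewriting this one annotation value with a new weight at each step.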

As well as support for matching rules

  • https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/#traffic-routing
    alb.ingress.kubernetes.io/conditions.rule-path3: >
      [{"Field":"http-header","HttpHeaderConfig":{"HttpHeaderName": "HeaderName", "Values":["HeaderValue1", "HeaderValue2"]}}]
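For header-based routing, the condition annotation above could be produced the same way. This is only a sketch; `header_condition` is a hypothetical helper, and the header name and values are placeholders rather than anything Flagger defines.

```python
import json


def header_condition(header_name: str, values: list[str]) -> str:
    """Build the JSON value for an alb.ingress.kubernetes.io/conditions.<name> annotation.

    Hypothetical helper: matches requests whose header equals any of the given values.
    """
    if not values:
        raise ValueError("at least one header value is required")
    condition = [
        {
            "Field": "http-header",
            "HttpHeaderConfig": {"HttpHeaderName": header_name, "Values": values},
        }
    ]
    return json.dumps(condition)


# Example: route requests that carry a placeholder header to a specific rule.
print(header_condition("HeaderName", ["HeaderValue1", "HeaderValue2"]))
```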

Thanks!

nrutigs avatar Jul 31 '20 01:07 nrutigs

Flagger knows about Kubernetes services, so it can set that part in the annotation, but what's TargetGroupArn?

stefanprodan avatar Jul 31 '20 13:07 stefanprodan

To my knowledge it's something outside of the cluster that you'd like the ALB to split traffic to as well.

nrutigs avatar Jul 31 '20 18:07 nrutigs

Hey @stefanprodan, is there any update on this? Is there any other information I can provide to get an answer? It's definitely something I'd look to contribute if it's possible :)

nrutigs avatar Aug 04 '20 16:08 nrutigs

@nrutigs one key feature of Flagger is its builtin metrics, such as success rate and latency. These are implemented with Prometheus queries. Does the ALB ingress expose a metrics endpoint for Prometheus?

stefanprodan avatar Aug 04 '20 17:08 stefanprodan

So the ingress does expose a metrics endpoint - https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/v1.1.4/cmd/main.go#L153 https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/v1.1.4/cmd/main.go#L104-L107

However, I think these are for internal monitoring of the ingress controller and not the ingresses themselves. For that you likely need to access CloudWatch instead, but it seems like Flagger already has that as a provider that it could use?

nrutigs avatar Aug 04 '20 17:08 nrutigs

The builtin metrics are for Prometheus. I guess we can say in the docs that for ALB people should use CloudWatch. We should provide two metric templates for error rate and latency: https://docs.flagger.app/usage/metrics#amazon-cloudwatch
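As a rough sketch of what an error-rate query for such a template might contain, the snippet below builds a CloudWatch GetMetricData-style query list from the standard `AWS/ApplicationELB` metrics `HTTPCode_Target_5XX_Count` and `RequestCount`. The helper name `alb_error_rate_query`, the query ids, and the dimension values are assumptions for illustration, and whether Flagger would accept exactly this JSON in a metric template has not been verified here.

```python
import json


def alb_error_rate_query(load_balancer: str, target_group: str, period: int = 60) -> list[dict]:
    """Build a GetMetricData-style query list for the 5xx error rate of one target group.

    Hypothetical helper: the dimension values are placeholders supplied by the caller.
    """
    dimensions = [
        {"Name": "LoadBalancer", "Value": load_balancer},
        {"Name": "TargetGroup", "Value": target_group},
    ]

    def metric_stat(query_id: str, metric_name: str) -> dict:
        # Input series only; ReturnData is False so just the expression is returned.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return [
        # Metric math: percentage of requests that ended in a target 5xx response.
        {"Id": "e1", "Expression": "100 * errors / requests", "Label": "ErrorRate", "ReturnData": True},
        metric_stat("errors", "HTTPCode_Target_5XX_Count"),
        metric_stat("requests", "RequestCount"),
    ]


# Placeholder dimension values; real ones come from the ALB and its target group.
print(json.dumps(alb_error_rate_query("app/my-alb/PLACEHOLDER", "targetgroup/my-tg/PLACEHOLDER"), indent=2))
```

A latency template could follow the same pattern with the `TargetResponseTime` metric and a percentile or average statistic.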

stefanprodan avatar Aug 04 '20 18:08 stefanprodan

Wicked! Are there any other blockers to using ALB that you can see? Otherwise it's definitely something I'll try to start working on contributing.

nrutigs avatar Aug 04 '20 18:08 nrutigs

My main concern is around maintenance because you can't have an e2e test suite for ALB+CloudWatch on Kubernetes Kind, like we have for any other ingress controller. @nrutigs if you are willing to work on this I'll be happy to review a PR.

stefanprodan avatar Aug 05 '20 05:08 stefanprodan

Hmm that definitely is an issue. Maybe there's a solution using eksctl and some bash scripting but obviously it might be harder to fit that into your CI.

Thanks for the responses for now! I'll probably annoy you in #flagger slack in the near future if I can get through the right approvals to do this at work.

nrutigs avatar Aug 05 '20 05:08 nrutigs

@nrutigs running eksctl in CI could do it, but it's a lot of work: clusters must be created on the fly and removed after a test run. The e2e test framework must ensure ALB+CloudWatch are ready, which could mean waiting 30m or more for the cluster to be created and for the ALB instance to become ready.

My impression is that ALB metrics are not sent to CloudWatch in real time. Flagger needs the metrics data to be "fresh"; for example, Prometheus has a 5sec delay. If CloudWatch is several minutes behind, then the analysis will fail, since Flagger will not be able to determine if the canary is conformant or not. One workaround would be to increase the analysis interval to 10 minutes or more, but I'm not sure if this could work; you'll need to try it out.
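The freshness concern can be illustrated with a deliberately simplified model: a check that looks back over one analysis interval only sees data if the interval is longer than the metric delay. The function `window_has_data` is hypothetical, and the 300-second CloudWatch delay is an illustrative assumption, not a measured value.

```python
def window_has_data(interval_s: int, delay_s: int) -> bool:
    """Return True if a look-back window of interval_s seconds can contain any datapoint.

    Simplified assumption: datapoints become visible delay_s seconds after the
    traffic they describe, so only data older than delay_s is available.
    """
    return interval_s > delay_s


# Prometheus-like case: 1 minute interval, 5 second delay.
print(window_has_data(60, 5))     # True
# Assumed CloudWatch-like case: 1 minute interval, 5 minute delay.
print(window_has_data(60, 300))   # False
# Workaround from above: stretch the interval to 10 minutes.
print(window_has_data(600, 300))  # True
```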

Another important aspect of testing on AWS is around who will support the cost of spinning up EKS clusters on each commit.

stefanprodan avatar Aug 05 '20 05:08 stefanprodan

Could a possible workaround be ALB -> Nginx ingress (with Prometheus exports) -> Flagger-monitored Canary deployment? Flagger would then monitor the Prometheus metrics from Nginx Ingress rather than the ALB. Then it's just up to the user to define ALB -> Nginx Ingress routing.

akuzni2 avatar Apr 29 '22 14:04 akuzni2

@akuzni2 Flagger already supports NGINX ingress, if an ALB sits in front of it, then it’s irrelevant to the canary analysis and routing. Docs here: https://docs.flagger.app/tutorials/nginx-progressive-delivery

stefanprodan avatar Apr 30 '22 08:04 stefanprodan

A use case that we would like this for is to have Canary or Blue/Green deployments of ingress-nginx itself, so we can auto-update it and also do the rollout in a more controlled manner, guided by the metrics.

rafaelgaspar avatar Aug 01 '23 13:08 rafaelgaspar

Here are the ALB ingress controller's docs for setting up weighted traffic splitting, which Flagger could use to support it officially:

https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.6/guide/use_cases/blue_green/

mhr3 avatar Sep 03 '23 13:09 mhr3

Any updates on this? There must be a bunch of people who would love this support (myself included!)

cemenson avatar Sep 27 '23 03:09 cemenson

FYI, Argo Rollouts from the Argo project (similar to Flagger) has support for AWS ALB, if anyone is comparing the two: https://argoproj.github.io/rollouts/

Chili-Man avatar Jan 08 '24 15:01 Chili-Man