Canary pods not getting terminated after canary promotion when used with Keda

Open cmodi-cogito opened this issue 1 year ago • 4 comments

Describe the bug

I have implemented canary deployment with Keda as per this doc - https://docs.flagger.app/tutorials/keda-scaledobject

It works fine except it's not removing canary pods after promotion.

As per the doc, Flagger should remove the annotation autoscaling.keda.sh/paused-replicas: "0" from the ScaledObject during canary analysis so the canary pods can scale up, and it should add it back after promotion, which would terminate the canary pods. However, it does not add this annotation back to the ScaledObject.

I'm not sure why or how this annotation gets deleted, or why it's not added back after canary promotion.
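To make the expected behavior concrete, here is a minimal Python sketch of the check described above: the annotation should be absent while canary analysis is running and set to "0" otherwise. The helper names and the example ScaledObject name (podinfo-so) are hypothetical, and the rule encoded here is only the behavior as described in this issue, not Flagger's actual implementation.

```python
from typing import Optional

# Annotation key quoted in this issue.
PAUSED_KEY = "autoscaling.keda.sh/paused-replicas"


def paused_replicas(scaled_object: dict) -> Optional[str]:
    """Return the paused-replicas annotation value, or None if it is absent."""
    metadata = scaled_object.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    return annotations.get(PAUSED_KEY)


def matches_expectation(scaled_object: dict, analysis_running: bool) -> bool:
    """Assumed rule (from the issue text): annotation absent during
    canary analysis, present with value "0" at any other time."""
    value = paused_replicas(scaled_object)
    if analysis_running:
        return value is None
    return value == "0"


# Hypothetical manifest, shaped like the parsed JSON of a ScaledObject.
after_promotion = {
    "kind": "ScaledObject",
    "metadata": {"name": "podinfo-so", "annotations": {"dummy": "yes"}},
}

# The symptom reported here: promotion is done but the annotation is missing.
print(matches_expectation(after_promotion, analysis_running=False))
```

The dict could be obtained by parsing the output of kubectl get scaledobject <name> -o json with json.loads.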

NOTE: It works fine if I use the native HPA for autoscaling instead of Keda.

To Reproduce

I followed exactly the same steps described in the doc - https://docs.flagger.app/tutorials/keda-scaledobject

I also tried manually adding two annotations, dummy: "yes" and autoscaling.keda.sh/paused-replicas: "0", to the Keda ScaledObject resource. Only the autoscaling.keda.sh/paused-replicas: "0" annotation was deleted, almost instantly: adding it triggered pod termination, and once it was deleted the canary pods came back again.
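The experiment above amounts to comparing the ScaledObject's annotations before and after something else touches the resource. A small Python sketch of that comparison follows; the function name and the two snapshots are illustrative assumptions that mirror the values in this report, not captured output.

```python
def annotation_changes(before: dict, after: dict) -> dict:
    """Compare two annotation maps and report which keys were removed,
    added, or had their value changed."""
    removed = sorted(k for k in before if k not in after)
    added = sorted(k for k in after if k not in before)
    changed = sorted(
        k for k in before if k in after and before[k] != after[k]
    )
    return {"removed": removed, "added": added, "changed": changed}


# Snapshot right after manually adding both annotations.
before = {
    "dummy": "yes",
    "autoscaling.keda.sh/paused-replicas": "0",
}
# Snapshot a moment later, as described in this report.
after = {"dummy": "yes"}

print(annotation_changes(before, after))
```

If only the Keda annotation ever shows up under "removed" while unrelated keys such as dummy survive, that points to a controller that targets this specific annotation rather than one that overwrites the whole annotation map.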

Expected behavior

It should scale down canary pods to 0 after canary promotion.

Additional context

  • Flagger version: 1.41.0
  • Keda version: 2.10.1
  • Kubernetes version: 1.30
  • Service Mesh provider: Istio
  • Ingress provider: Kong

cmodi-cogito, Apr 11 '25 15:04

@aryan9600 @stefanprodan Any thoughts on this? Perhaps we are doing something wrong? Any other information needed to debug this? Thanks!

frankjkelly, May 01 '25 13:05

FYI we are on Istio 1.18.0, which is pretty old, and we use Flux (wondering if that might override some of the annotations that Flagger puts on the ScaledObject resource for the canary deployment?)

frankjkelly, May 01 '25 16:05

Update: We think we have figured out the problem. In our lower environments, to save costs, we deploy kube-ns-suspender (https://github.com/kube-ns-suspender/kube-ns-suspender?tab=readme-ov-file#scaledobjects), whose README says of ScaledObjects: "Unsuspending will remove this annotation." So our prime suspect is this piece of software interfering with how Flagger manages the ScaledObject.

frankjkelly, May 02 '25 18:05

FYI confirmed that the issue was with kube-ns-suspender preventing the canary from scaling down.

frankjkelly, May 12 '25 14:05