Flagger and Flux gitops workflow in the case of cluster rebuilds

Open spandan541 opened this issue 2 years ago • 4 comments

Hi Team,

I have tried looking for a similar issue addressing the same problem but could not find one.

Describe the bug

My team deploys all manifests with Flux and prefers not to use kubectl. We are keen to use Flagger alongside Flux for our canary releases, but we have hit a peculiar problem because our clusters need to be rebuilt very often.

Steps

  1. The Deployment in the GitOps repo initially has image tag v1
  2. The Canary CR is initialised in the cluster with manual gating enabled
  3. The Deployment in the GitOps repo is updated to image tag v2
  4. Flux reconciles and updates the image tag v1 -> v2
  5. Flagger starts the promotion process. With manual gating enabled, suppose the promotion is paused at a canary weight of 20.
  6. At this point the cluster needs to be rebuilt for some reason
  7. When the cluster is rebuilt, Flux applies the Deployment with image tag v2 even though the promotion was not complete (this is the problem!)
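
To make the steps above concrete, here is a minimal toy model in Python. It is only a sketch, not Flagger or Flux code: the `Cluster` class and the `flux_reconcile`, `flagger_reconcile` and `simulate` functions are hypothetical names made up for illustration. It also assumes that, when a Canary is initialised on a fresh cluster, Flagger bootstraps the primary from whatever image the target Deployment currently has.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Toy in-cluster state (hypothetical). All of it is lost on a rebuild."""
    target_image: Optional[str] = None   # Deployment applied by Flux
    primary_image: Optional[str] = None  # primary workload managed by Flagger
    canary_weight: int = 0               # traffic share sent to the canary


def flux_reconcile(cluster: Cluster, git_image: str) -> None:
    # Flux only knows what is in Git, so it applies that image tag.
    cluster.target_image = git_image


def flagger_reconcile(cluster: Cluster, step_weight: int = 20) -> None:
    if cluster.primary_image is None:
        # Assumption: on initialisation the primary is copied from the target.
        cluster.primary_image = cluster.target_image
        cluster.canary_weight = 0
    elif cluster.target_image != cluster.primary_image and cluster.canary_weight == 0:
        # New revision detected: shift the first step of traffic,
        # then wait at the manual gate (the gate never opens in this model).
        cluster.canary_weight = step_weight


def simulate():
    git_image = "v1"
    cluster = Cluster()
    flux_reconcile(cluster, git_image)   # step 1
    flagger_reconcile(cluster)           # step 2: Canary initialised with v1

    git_image = "v2"                     # step 3
    flux_reconcile(cluster, git_image)   # step 4
    flagger_reconcile(cluster)           # step 5: paused at weight 20
    before = (cluster.primary_image, cluster.canary_weight)

    cluster = Cluster()                  # step 6: rebuild wipes in-cluster state
    flux_reconcile(cluster, git_image)   # step 7: Git still says v2
    flagger_reconcile(cluster)           # primary is bootstrapped from v2
    after = (cluster.primary_image, cluster.canary_weight)
    return before, after


if __name__ == "__main__":
    before, after = simulate()
    print("before rebuild:", before)
    print("after rebuild: ", after)
```

In this model the paused promotion exists only in the cluster, while Git already holds v2, so after the rebuild nothing records that v2 was still under analysis.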

Expected behavior

The limitation is that neither Flagger nor Flux knows that the promotion process was interrupted by a cluster rebuild, i.e. no state of the promotion process is saved.

Possible solutions:

  • Allow two target Deployments (primary and canary) to be provided under Canary.spec, so that no change of image tag is necessary
  • Have Flagger somehow save the state of the promotion across rebuilds, so that even if Flux creates a Deployment with v2, zero traffic is sent to it

Please do let me know if there are better ways to solve this corner case. Any help would be greatly appreciated! Thanks in advance.

Additional context

  • Flagger version: 1.31.0
  • Kubernetes version: 1.27
  • Service Mesh provider: Istio

spandan541, Jan 09 '24 16:01

Any ideas anyone? @stefanprodan @aryan9600

spandan541, Jan 29 '24 12:01

You could try Flagger version 1.36.1 to see whether it solves this problem.

LiZhenCheng9527, Mar 06 '24 02:03

Unfortunately, that doesn't solve the problem.

spandan541, May 26 '24 14:05