Flagger and Flux gitops workflow in the case of cluster rebuilds
Hi Team,
I have tried looking for a similar issue addressing the same problem but could not find one.
Describe the bug
My team deploys all manifests using Flux, and we prefer not to apply them through kubectl. We are keen on using Flagger for our canary releases along with Flux, but we have hit a peculiar challenge because our clusters need to be rebuilt very often.
Steps
- Deployment in GitOps repo initially has image tag v1
- Canary CR is initialised in the cluster with manual gating enabled
- Deployment in GitOps repo is updated to image tag v2
- Flux reconciles and updates image tag v1 -> v2
- Flagger starts the promotion process. Say with manual gating enabled, the promotion process is waiting at a weight of 20.
- At this point, for some reason, the cluster needs to be rebuilt
- When the cluster is rebuilt, Flux applies the Deployment with image tag v2 even though promotion was not complete (This is the problem!)
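For reference, the setup in the steps above could look roughly like the following Canary CR. This is only a sketch: the name my-app, the namespace apps, the port, the interval and threshold values, and the gate URL are placeholders rather than our actual configuration. It assumes manual gating is done with a confirm-traffic-increase webhook pointing at a Flagger loadtester gate.

```yaml
# Sketch only: "my-app", "apps", the port and the gate URL are hypothetical placeholders.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: apps
spec:
  provider: istio
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # the Deployment that Flux applies from Git (v1, then v2)
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 60
    stepWeight: 20          # traffic moves in steps of 20, so the rollout can wait at weight 20
    webhooks:
      - name: traffic-gate
        type: confirm-traffic-increase   # manual gate: Flagger waits for approval before raising the weight
        url: http://flagger-loadtester.apps/gate/check
```

As far as I understand, the progress of the analysis (phase, current canary weight) is only recorded in the status of this Canary object inside the cluster, so it is not in Git and does not survive a rebuild.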
Expected behavior
Flagger and Flux should recognise that the promotion process was interrupted by the cluster rebuild. Currently this is a limitation: no state of the promotion process is saved, so neither tool knows that the rollout was still in progress.
Possible solutions:
- Provide 2 target deployments (primary and canary) under Canary.spec so no change in image tag is necessary
- Flagger somehow saves the state of the promotion across rebuilds so even if Flux creates a Deployment with v2, zero traffic is sent to it
Please do let me know if there are better ways to solve this corner case. Any help would be greatly appreciated! Thanks in advance.
Additional context
- Flagger version: 1.31.0
- Kubernetes version: 1.27
- Service Mesh provider: Istio
Any ideas anyone? @stefanprodan @aryan9600
You can use version 1.36.1 of Flagger to see if it solves this problem.
Unfortunately, that doesn't solve the problem.