argo-rollouts
Canary pod downtime when switching between setCanaryScale: replicas and matchTrafficWeight: true
Checklist:
- [ ] I've included steps to reproduce the bug.
- [ ] I've included the version of argo rollouts.
Describe the bug
Hi team, I ran into a problem when switching setCanaryScale from replicas to matchTrafficWeight: true. My steps are below:
steps:
- setCanaryScale:
    replicas: 1
- pause: {}
- setCanaryScale:
    matchTrafficWeight: true
- setWeight: 10
- pause: { duration: 120s }
- setWeight: 15
...
The promotion flow is as follows:
- There is 1 canary pod.
- Pause.
- Switch to matchTrafficWeight: true. At this point, the old pod from step 1 is terminated immediately and 1 new pod is created. Because no canary pod is alive to serve canary traffic during that window, canary traffic experiences downtime (the stable flow is unaffected).
How can I prevent this? Thanks, all.
To Reproduce
Expected behavior
When I promote to switch to setCanaryScale: matchTrafficWeight: true, the old canary pod should stay alive until the new canary pod is created.
Screenshots
Version v1.6.6
Logs
# Paste the logs from the rollout controller
# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts
# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
Hi @laivu266, you can add an initial setWeight: 1 step before using matchTrafficWeight: true. This will not delete the old canary pod.
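For illustration, here is a rough sketch of how that suggestion might look when applied to the steps from the report. The exact position of the setWeight: 1 step is an assumption; the comment above only says it should come before matchTrafficWeight: true. Steps elided in the original report are left out.

```yaml
steps:
- setCanaryScale:
    replicas: 1
- pause: {}
# Assumed placement: set a small non-zero weight before the canary scale
# starts following the traffic weight, as suggested in the comment above.
- setWeight: 1
- setCanaryScale:
    matchTrafficWeight: true
- setWeight: 10
- pause: { duration: 120s }
- setWeight: 15
# remaining steps unchanged
```

Note that with this ordering the canary would presumably start receiving about 1% of traffic as soon as the setWeight: 1 step runs, so it is worth checking that this is acceptable for your rollout.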