Canary pod downtime when switching between setCanaryScale: replicas and matchTrafficWeight: true

Open • laivu266 opened this issue 10 months ago • 1 comment

Checklist:

  • [ ] I've included steps to reproduce the bug.
  • [ ] I've included the version of argo rollouts.

Describe the bug Hi team, I ran into a problem when switching setCanaryScale from replicas to matchTrafficWeight: true. My steps are below:

steps:
    - setCanaryScale:
        replicas: 1
    - pause: {}
    - setCanaryScale:
        matchTrafficWeight: true
    - setWeight: 10
    - pause: { duration: 120s }
    - setWeight: 15
...

The promotion flow is as follows:

  1. There is 1 canary pod.
  2. Pause.
  3. Switch to matchTrafficWeight: true. At this point, the old pod from step 1 is terminated immediately and 1 new pod is created. Because no canary pod is alive to serve canary traffic in the meantime, canary traffic sees downtime (the stable flow is unaffected).

How can I prevent this case? Thanks all

To Reproduce

Expected behavior When I promote to switch to setCanaryScale: matchTrafficWeight: true, the old canary pod should stay alive until the new canary pod has been created.

Screenshots

Version v1.6.6

Logs

# Paste the logs from the rollout controller

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

laivu266 • Apr 04 '24 02:04

Hi @laivu266, you can add an initial setWeight: 1 step before using matchTrafficWeight: true. That way the old canary pod is not deleted.

deathsurgeon1 • May 19 '24 10:05
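
For illustration, here is a minimal sketch of how the suggestion in the reply might be applied to the steps from the issue. The exact position of the setWeight: 1 step is an assumption: the reply only says it should come before matchTrafficWeight: true. This has not been verified against v1.6.6. The likely reasoning is that with matchTrafficWeight: true the canary replica count follows the traffic weight, so while the weight is still 0 the canary would be scaled to zero until the next setWeight step.

```yaml
steps:
    - setCanaryScale:
        replicas: 1
    - pause: {}
    # Assumed placement: give the canary a non-zero weight first, so that
    # matching scale to traffic weight should not drop the canary to 0 pods.
    - setWeight: 1
    - setCanaryScale:
        matchTrafficWeight: true
    - setWeight: 10
    - pause: { duration: 120s }
    - setWeight: 15
```

Note that in this sketch the canary starts receiving about 1% of traffic one step earlier than in the original configuration, which may or may not be acceptable depending on what the first pause is used for.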