argo-rollouts Canary deployment causing 503s after reaching 100% setWeight

Open · deathsurgeon1 opened this issue 9 months ago · 1 comment

Checklist:

[ ] I've included steps to reproduce the bug.
[ ] I've included the version of argo rollouts.

Describe the bug

I am using the Rollout configuration below for canary deployments in our service. The deployment works fine until the last set of canary pods is marked healthy. I am also using `dynamicStableScale: true`, so as soon as canary pods come up, the corresponding number of pods is removed from the stable ReplicaSet, which is what we require.

However, as soon as the last batch of canary pods comes up, the rollout starts deleting the remaining stable pods. At exactly that moment I see 503s, and once those pods are deleted the 503s stop. The Rollout configuration is below:

kind: Rollout
apiVersion: argoproj.io/v1alpha1
metadata:
  labels:
    env: prf-use2-eks
  name: rollout
spec:
  replicas: null
  strategy:
    canary:
      dynamicStableScale: true
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}
      canaryMetadata:
        labels:
          state: canary
      maxSurge: 10
      maxUnavailable: 0
      steps:
        - setCanaryScale:
            replicas: 3
        - pause: {}
        - setWeight: 15
        - pause:
            duration: 60s
        - setCanaryScale:
            matchTrafficWeight: true
        - setWeight: 25
        - pause:
            duration: 60s
        - setWeight: 35
        - pause:
            duration: 120s
        - setWeight: 50
        - pause:
            duration: 120s
        - setWeight: 75
        - pause:
            duration: 120s
        - setWeight: 100
        - pause:
            duration: 240s
      trafficRouting:
        alb:
          ingress: rpas-rollout-ingress
          rootService: rpas-root-service
          servicePort: 443
        istio:
          virtualService:
            name: rpas-rollout-vsvc
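To make the scaling pattern described above more concrete, here is a rough Python model of how many canary and stable pods exist at each `setWeight` step with `dynamicStableScale` enabled. It is a simplified sketch, not the controller's actual algorithm. It assumes a hypothetical total of 10 replicas (the Rollout sets `replicas: null`, so the real count comes from elsewhere), assumes the weighted share of pods is rounded up, and ignores the initial `setCanaryScale: replicas: 3` step as well as `maxSurge`.

```python
import math

# Hypothetical total replica count (assumption): the Rollout in this issue
# sets `replicas: null`, so the real value is managed elsewhere.
REPLICAS = 10

# setWeight values taken from the steps in the Rollout above.
WEIGHTS = [15, 25, 35, 50, 75, 100]


def weight_to_replicas(replicas: int, weight: int) -> int:
    """Simplified assumption: the weighted share of pods is rounded up."""
    return math.ceil(replicas * weight / 100)


def plan(replicas: int, weights: list[int]) -> list[tuple[int, int, int]]:
    """Return (weight, canary_pods, stable_pods) for each setWeight step.

    Models matchTrafficWeight as canary = weight% of replicas and
    dynamicStableScale as stable = (100 - weight)% of replicas.
    """
    rows = []
    for w in weights:
        canary = weight_to_replicas(replicas, w)
        stable = weight_to_replicas(replicas, 100 - w)
        rows.append((w, canary, stable))
    return rows


if __name__ == "__main__":
    for weight, canary, stable in plan(REPLICAS, WEIGHTS):
        print(f"setWeight {weight:>3}: canary={canary:>2} stable={stable:>2}")
```

Under these assumptions, stable shrinks gradually (9, 8, 7, 5, 3 pods) and the step to `setWeight: 100` is where the last stable pods are scaled to zero, which is the point at which the 503s appear.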

To Reproduce

Expected behavior

Screenshots

Version

Logs

# Paste the logs from the rollout controller

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

May 19 '24 10:05 deathsurgeon1
