argo-rollouts
Canary deployment causing 503s after reaching 100% setWeight
Checklist:
- [ ] I've included steps to reproduce the bug.
- [ ] I've included the version of argo rollouts.
Describe the bug
I am using the deployment configuration below for canary deployments of our service. The rollout works fine until the last set of canary pods is marked healthy. I am also using dynamicStableScale = true, so as soon as canary pods come up, the corresponding number of pods is removed from the stable deployment, which is what we require.
However, as soon as the last batch of canary pods comes up, the rollout starts deleting the last remaining stable pods. At this exact moment I see 503s, and once those pods are deleted the 503s stop. The deployment configuration is shared below:
kind: Rollout
apiVersion: argoproj.io/v1alpha1
metadata:
  labels:
    env: prf-use2-eks
  name: rollout
spec:
  replicas: null
  strategy:
    canary:
      dynamicStableScale: true
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: { }
      canaryMetadata:
        labels:
          state: canary
      maxSurge: 10
      maxUnavailable: 0
      steps:
        - setCanaryScale:
            replicas: 3
        - pause: {}
        - setWeight: 15
        - pause:
            duration: 60s
        - setCanaryScale:
            matchTrafficWeight: true
        - setWeight: 25
        - pause:
            duration: 60s
        - setWeight: 35
        - pause:
            duration: 120s
        - setWeight: 50
        - pause:
            duration: 120s
        - setWeight: 75
        - pause:
            duration: 120s
        - setWeight: 100
        - pause:
            duration: 240s
      trafficRouting:
        alb:
          ingress: rpas-rollout-ingress
          rootService: rpas-root-service
          servicePort: 443
        istio:
          virtualService:
            name: rpas-rollout-vsvc
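
To make the timing of the 503s easier to reason about, here is a rough sketch of how the canary/stable pod split might evolve across the weight-driven steps. It is an illustration under assumptions, not the controller's actual code. The total of 20 replicas is hypothetical (the spec above leaves replicas as null), the helper name `replica_split` is made up for this example, and the rule that both sides round up is an assumption about how dynamicStableScale sizes the stable set.

```python
def replica_split(total_replicas: int, canary_weight: int) -> tuple[int, int]:
    """Return (canary, stable) pod counts for a canary traffic weight in percent.

    Assumption: each side is sized as ceil(total * share / 100), where the
    stable share is (100 - canary_weight). Integer arithmetic is used for the
    ceiling so there are no float rounding surprises.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    canary = (total_replicas * canary_weight + 99) // 100
    stable = (total_replicas * (100 - canary_weight) + 99) // 100
    return canary, stable


if __name__ == "__main__":
    TOTAL = 20  # hypothetical replica count, not taken from the report
    # Weights of the steps that follow matchTrafficWeight: true in the spec.
    for weight in (25, 35, 50, 75, 100):
        canary, stable = replica_split(TOTAL, weight)
        print(f"setWeight {weight:>3}: canary={canary:>2} stable={stable:>2}")
```

Under these assumptions the stable count only reaches zero at setWeight: 100, which is the step where the 503s are observed in this report.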
To Reproduce
Expected behavior
Screenshots
Version
Logs
# Paste the logs from the rollout controller
# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts
# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.