argo-rollouts
Support maxSurge for blueGreen
Summary
Add support for specifying a maxSurge value when using the blueGreen strategy. It currently appears to be supported only by the canary strategy.
Use Cases
When restarting a rollout, we are faced with two options: either restart one pod at a time (default maxUnavailable), or specify a maxUnavailable > 0. Restarting pods one at a time can take a long time. If we specify a maxUnavailable value, we can restart more than one pod at a time, but we run the risk of dropping below our desired pod count during the recycle.
The preferred scenario would be to allow the ReplicaSet to scale beyond the current level, spin up new pods, then terminate the old pods once the new ones are Ready. This would allow a very quick restart while maintaining desired pod capacity.
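To make the trade-off concrete, here is a minimal sketch of a blueGreen Rollout that sets maxUnavailable, assuming that field governs restarts as described above. The resource and service names are hypothetical, and the selector and pod template are omitted for brevity.

```yaml
# Sketch only: names are hypothetical; selector and template omitted.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout
spec:
  replicas: 10
  strategy:
    blueGreen:
      activeService: example-service
      # Up to 2 pods may be unavailable at once during a restart, so
      # capacity can dip to 8 of 10 pods while they are recycled.
      maxUnavailable: 2
```

A larger maxUnavailable speeds up the restart, but the temporary capacity dip grows with it. That dip is the gap a maxSurge option would close.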
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
You can define a canary strategy that might replicate the desired behavior, provided you have a Virtual Service that routes traffic to different Services based on match criteria.
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: example-service-preview
      maxSurge: 1
      stableService: example-service
      steps:
      - setWeight: 1
I agree that would be good to have, as we would need something more like rolling-update behavior: ramp up e.g. 50% new pods, let them become ready, then terminate 50% of the old pods. Then do the same for the second half, never going below our required pod count.
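For comparison, this is roughly how a plain Kubernetes Deployment expresses that surge-first behavior. It is an illustrative sketch, not an Argo Rollouts configuration, and the app name and image are hypothetical.

```yaml
# Illustrative Kubernetes Deployment; names and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: example-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 50%        # allow up to 5 extra pods above the desired 10
      maxUnavailable: 0    # never drop below the desired pod count
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: example/app:2.0
```

With maxUnavailable set to 0, old pods are removed only as new pods become Ready, so the number of available pods should stay at or above the desired count throughout.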
The BlueGreen strategy already creates the additional pods needed to run the newer version of the application. Only once all the new pods are ready does the service switch traffic to them. Until then, all the old pods are required to keep serving traffic with the older version of the application.
If traffic mixes between the newer version and the older version, then it effectively becomes a canary deployment.
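A minimal blueGreen fragment that matches this description might look like the following. The service names are hypothetical and reuse those from the canary example earlier in the thread.

```yaml
# Sketch only: service names are hypothetical.
spec:
  replicas: 10
  strategy:
    blueGreen:
      # Service that receives production traffic; it is switched to the
      # new ReplicaSet only after the new pods are ready.
      activeService: example-service
      # Optional service pointing at the new ReplicaSet before the switch.
      previewService: example-service-preview
```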