Support maxSurge for blueGreen

Open sdlevi27 opened this issue 7 months ago • 3 comments

Summary

Add support for specifying a maxSurge value when using the blueGreen strategy. It currently appears to be supported only in canary deployments.

Use Cases

When restarting a rollout, we have two options: restart one pod at a time (the default maxUnavailable), or specify a maxUnavailable greater than 0. Restarting one pod at a time can take a long time. If we specify a maxUnavailable value, we can restart more than one pod at a time, but we risk dropping below our desired pod count during the recycle.

The preferred scenario would be to allow the ReplicaSet to scale beyond the current level, spin up new pods, then terminate the old pods once the new ones are Ready. This would allow a very quick restart while maintaining desired pod capacity.
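To make the trade-off concrete, here is a minimal sketch of the pod-count arithmetic, assuming the usual rolling-update semantics where maxUnavailable bounds how far ready pods may fall below the desired count and maxSurge bounds how far total pods may exceed it. The function name `rollout_bounds` is hypothetical and is not part of Argo Rollouts.

```python
def rollout_bounds(replicas: int, max_unavailable: int, max_surge: int) -> tuple[int, int]:
    """Return (minimum ready pods, maximum total pods) during a restart.

    Assumes rolling-update style semantics; this is an illustration,
    not the controller's actual implementation.
    """
    if max_unavailable == 0 and max_surge == 0:
        raise ValueError("at least one of max_unavailable or max_surge must be > 0")
    min_ready = replicas - max_unavailable  # capacity floor during the restart
    max_total = replicas + max_surge        # pod-count ceiling during the restart
    return min_ready, max_total


# Restarting 3 pods at a time by allowing unavailability: capacity can dip to 7.
print(rollout_bounds(replicas=10, max_unavailable=3, max_surge=0))  # (7, 10)

# Restarting 3 pods at a time by surging instead: capacity stays at 10.
print(rollout_bounds(replicas=10, max_unavailable=0, max_surge=3))  # (10, 13)
```

The second call is the behavior requested here: the restart proceeds several pods at a time, at the cost of temporarily running extra pods.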


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

sdlevi27 • Dec 07 '23 19:12

You can define a canary strategy that might replicate the desired behavior provided you have a Virtual Service that routes traffic to different Services based on match criteria.

spec:
  replicas: 10
  strategy:
    canary:
      canaryService: example-service-preview
      maxSurge: 1
      stableService: example-service
      steps:
      - setWeight: 1

zimmertr • Jan 04 '24 23:01

I agree that this would be good to have, since we need something closer to rolling-update behavior: ramp up, e.g., 50% new pods, let them become ready, then terminate 50% of the old pods, and do the same for the second half, never going below our required pod count.
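As a rough sketch of that sequence, the following simulation surges a batch of new pods, waits for them to be ready, and only then terminates the same number of old pods. The function `surge_restart` and its parameters are hypothetical illustrations, not an existing Argo Rollouts API.

```python
def surge_restart(replicas: int, batch: int) -> list[tuple[int, int]]:
    """Simulate a surge-first restart.

    Returns the (old_ready, new_ready) pod counts after every scale-up
    and scale-down step. Illustration only; it ignores readiness delays.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    old, new = replicas, 0
    history = []
    while old > 0:
        step = min(batch, old)
        new += step                 # surge: new pods start and become Ready
        history.append((old, new))
        old -= step                 # only then terminate as many old pods
        history.append((old, new))
    return history


steps = surge_restart(replicas=10, batch=5)
print(steps)                             # [(10, 5), (5, 5), (5, 10), (0, 10)]
print(min(o + n for o, n in steps))      # 10: never below the required count
print(max(o + n for o, n in steps))      # 15: peak is replicas plus one batch
```

With a 50% batch the peak is 15 pods rather than the 20 that a full parallel copy would need, while ready capacity never falls below 10.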

vghero • Jan 09 '24 17:01

The BlueGreen strategy already creates additional pods to run the newer version of the application. Only once all the new pods are ready does the service switch traffic to them. Until then, all the old pods are needed to serve traffic with the older version of the application.

If traffic is mixed between the newer version and the older version, then it effectively becomes a canary deployment.

skbar50 • Mar 26 '24 15:03