Canary for worker-style deployments
Describe the feature
Is there support for worker deployments, i.e. deployments that don’t receive any traffic but instead pull work from somewhere on their own?
Proposed solution
One way to support it would be to allow the user to control how quickly the canary is spun up. For example, instead of shifting traffic X% at a time, scale up the canary by N pods at a time depending on how long the analysis period should be, and run metric checks throughout.
You could use an HPA that scales the deployment based on the queue size or something similar, then in Flagger you would use Blue/Green (since there is no traffic shift needed) and have it check the metrics.
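As a rough sketch of the Blue/Green side (the names, namespace and analysis settings below are placeholders, not a tested config), the Canary would target the worker Deployment and rely on iterations plus metric checks instead of traffic weights:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: worker
  namespace: workers
spec:
  # no mesh/ingress provider: plain Kubernetes Blue/Green, no traffic shifting
  provider: kubernetes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: worker
  analysis:
    # run the checks 10 times, one minute apart, before promoting
    interval: 1m
    iterations: 10
    threshold: 2
    metrics:
      # hypothetical metric template measuring the worker's failure rate
      - name: job-error-rate
        templateRef:
          name: job-error-rate
        thresholdRange:
          max: 1
        interval: 1m
```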
That won’t let us control the amount of traffic that the canary receives, right? I was thinking the number of pods could be used in place of the traffic shift %.
> That won’t let us control the amount of traffic that the canary receives, right?
HPA will increase the number of pods based on some external metric.
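Something along these lines (assuming you have an adapter such as prometheus-adapter or KEDA exposing the queue depth as an external metric; the metric name, selector and target value are placeholders):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
  namespace: workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          # placeholder: queue depth exposed through a metrics adapter
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: worker-canary
        target:
          type: AverageValue
          averageValue: "30"
```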
But how do you gradually ramp up the number of pods as a function of how long the canary has been progressing? The thing I’m trying to emulate here is the gradual traffic shift that’s possible with the canary strategy today.
Flagger does not deal with scaling; that role belongs to HPA. Flagger kicks off HPA by scaling the canary from zero to one replica, and from there HPA takes the lead and scales the pods based on traffic and/or resource usage. In your case, you need some kind of metric that tells HPA to scale up while the analysis is running. I guess you could gradually push messages into a queue dedicated to the canary and HPA would scale based on the queue length or total count. You could use Flagger's webhooks: the pre-rollout webhook to start pushing messages to that queue and the post-rollout webhook to stop.
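For example (the queue-feeder service and its endpoints are hypothetical, just to illustrate where the hooks would go in the analysis spec):

```yaml
  analysis:
    webhooks:
      # hypothetical service that starts publishing test messages to the canary queue
      - name: start-queue-feed
        type: pre-rollout
        url: http://queue-feeder.test/start
        timeout: 30s
        metadata:
          queue: worker-canary
      # and stops publishing once the analysis has finished
      - name: stop-queue-feed
        type: post-rollout
        url: http://queue-feeder.test/stop
        timeout: 30s
```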
Got it, thanks, I can look into it. It does feel like quite a bit of extra work to get a sane rollout for the worker model (relative to the little work required for the server model).
While generating HTTP traffic is an easy job (the Flagger loadtester does this with hey), generating messages and pushing them to a queue is far more complicated; Flagger would need to integrate with Kafka, SQS, RabbitMQ, etc. Also, in terms of scaling, Flagger would overlap with HPA, and if you used both they would cancel each other out.
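For comparison, the HTTP case is just a webhook in the analysis spec (hostnames below are the usual loadtester defaults, adjust to your setup):

```yaml
  analysis:
    webhooks:
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          # hey generates HTTP traffic against the canary during the analysis window
          cmd: "hey -z 1m -q 10 -c 2 http://worker-canary.test/"
```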
Maybe we could implement the gradual scaling in Flagger v2, but integrating with MQ systems is not something that I would consider in scope for Flagger.