Flagger doesn't comply with `stepWeightPromotion` when restarting a Progressing canary analysis

Open nishaad78 opened this issue 5 years ago • 1 comments

When a Progressing canary analysis is restarted (because a new revision is detected), Flagger switches all traffic back to the primary at once. This can be a problem if the primary has been scaled down (via HPA) during the canary analysis phase: it then receives a burst of traffic that it is not able to serve.

Shouldn't Flagger use the `stepWeightPromotion` setting when shifting traffic back to the primary? This issue is related to #381.

This is observed from flagger logs:

{"level":"info","ts":"2020-06-23T10:55:11.624Z","caller":"controller/events.go:16","msg":"Advance podinfo.nishaad-test canary weight 70","canary":"podinfo.nishaad-test"}
{"level":"info","ts":"2020-06-23T10:56:11.590Z","caller":"controller/events.go:28","msg":"podinfo-primary.nishaad-test not ready: waiting for rollout to finish: 2 of 3 updated replicas are available","canary":"podinfo.nishaad-test"}
{"level":"info","ts":"2020-06-23T10:57:11.590Z","caller":"controller/events.go:16","msg":"New revision detected! Restarting analysis for podinfo.nishaad-test","canary":"podinfo.nishaad-test"}
{"level":"info","ts":"2020-06-23T10:58:11.580Z","caller":"controller/events.go:28","msg":"canary deployment podinfo.nishaad-test not ready: waiting for rollout to finish: 2 of 3 updated replicas are available","canary":"podinfo.nishaad-test"}
{"level":"info","ts":"2020-06-23T10:59:11.574Z","caller":"controller/events.go:16","msg":"Starting canary analysis for podinfo.nishaad-test","canary":"podinfo.nishaad-test"}
{"level":"info","ts":"2020-06-23T10:59:11.591Z","caller":"controller/events.go:16","msg":"Advance podinfo.nishaad-test canary weight 10","canary":"podinfo.nishaad-test"}

nishaad78, Jun 23 '20 11:06

I think this also happens when a new canary is created and a deployment already exists. Rather than progressively moving traffic from the old deployment (now the canary) to the new primary deployment, Flagger immediately moves all of it.

nickcaballero, Feb 15 '23 22:02