
Blue/Green deployment returning both Blue and Green during pod rollout

Open nothsa opened this issue 3 years ago • 0 comments

Describe the bug

While the pod rollout is happening on a Blue/Green deployment, responses flip-flop between the old and new versions. I believe this is because not all traffic is routed to the Canary/Green pods while the Primary pods are being rolled out, so we get a mix of responses from old and new Primary pods.

Config:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: test-app
  namespace: test-flagger-deployment
spec:
  analysis:
    interval: 60s
    iterations: 1
    threshold: 3
    match:
      - headers:
          Cookie:
            regex: \bX-ENV\s*=\s*canary\b
    metrics:
      - interval: 15s
        name: request-success-rate
        thresholdRange:
          min: 99
      - interval: 15s
        name: request-duration
        thresholdRange:
          max: 500
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: test-app
  ingressRef:
    apiVersion: extensions/v1beta1
    kind: Ingress
    name: test-app
  progressDeadlineSeconds: 60
  provider: nginx
  service:
    name: test-app
    port: 80
    targetPort: http
  skipAnalysis: false
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app

To Reproduce

Perform a Blue/Green deployment and continuously retrieve the results from the endpoint.

Bash script that I used:

#!/bin/bash
# Poll the version endpoint in a tight loop and print each response.
while true; do
  curl -s https://example-domain.com/version.txt
done

Results:

...
22.43.52
23.14.03
23.14.03
23.14.03
22.43.52
23.14.03
23.14.03
23.14.03
22.43.52
23.14.03
22.43.52
22.43.52
23.14.03
23.14.03
23.14.03
23.14.03
23.14.03
22.43.52
23.14.03
22.43.52
23.14.03
23.14.03
22.43.52
23.14.03
...
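
One way to quantify the flip-flopping is to count how often consecutive responses differ. The sketch below assumes the endpoint returns one version string per line, as in the output above; `count_flips` is a hypothetical helper name, and the sample size of 100 in the usage comment is arbitrary.

```shell
#!/bin/bash
# Count how often consecutive responses switch from one version to another.
# Reads one version string per line on stdin and prints the number of switches.
# Blank lines are ignored.
count_flips() {
  awk '
    NF == 0 { next }                      # skip empty lines
    seen && $0 != prev { flips++ }        # version changed since last sample
    { prev = $0; seen = 1 }
    END { print flips + 0 }               # print 0 when there were no switches
  '
}

# Hypothetical usage against the endpoint from the reproduction script:
# for i in $(seq 1 100); do
#   curl -s https://example-domain.com/version.txt
# done | count_flips
```

A clean cutover should produce at most one switch; a larger count means requests are being served by both versions at the same time.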

Expected behavior

According to the docs:

After the analysis finishes, the traffic is routed to the canary (green) before triggering the primary (blue) rolling update, this ensures a smooth transition to the new version avoiding dropping in-flight requests during the Kubernetes deployment rollout.

I would expect that once the rollout begins, responses would not flip-flop between the old and new versions. The endpoint would return the old version until the rollout begins, and only the new version from that point onwards.
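
This expectation can be expressed as a check on the response stream: once a version has been replaced, it should never appear again. The following is a minimal sketch under the same assumption of one version string per line; `check_cutover` is a hypothetical name, not part of Flagger.

```shell
#!/bin/bash
# Exit 0 if the stream switches versions cleanly (a replaced version never
# comes back); otherwise print the first offending sample and exit 1.
check_cutover() {
  awk '
    NF == 0 { next }                               # skip empty lines
    { n++ }                                        # 1-based sample index
    n > 1 && $0 != prev { retired[prev] = 1 }      # prev version was replaced
    ($0 in retired) {                              # a replaced version reappeared
      print "regression at sample " n ": " $0
      bad = 1
      exit
    }
    { prev = $0 }
    END { exit bad + 0 }
  '
}

# Hypothetical usage against the endpoint from the reproduction script:
# for i in $(seq 1 100); do
#   curl -s https://example-domain.com/version.txt
# done | check_cutover
```

Run against the output shown under Results, this check would fail as soon as the old version reappears after the new version has been returned.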

Additional context

  • Flagger version: 1.6.0
  • Kubernetes version: 1.21.9
  • Service Mesh provider: n/a
  • Ingress provider: Nginx

nothsa · May 12 '22 01:05