
Additional selectorLabels are not added to primary deployment

Open AliakseiVenski opened this issue 2 years ago • 9 comments

Describe the bug

I tried to use the selectorLabels option described in the documentation, but my additional selector labels are present only on the original deployment {deploy}, not on the {deploy}-primary deployment created by Flagger.

original deployment:

spec:
  replicas: 0
  selector:
    matchLabels:
      app.kubernetes.io/instance: serviceName
      app.kubernetes.io/name: serviceName

primary deployment:

spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: serviceName

flagger values.yaml

image:
  tag: 1.22.2
meshProvider: traefik
metricsServer: http://prometheus-operator-kube-p-prometheus.prometheus-operator:9090
resources:
  limits:
    cpu: 1000m
    memory: 512Mi
  requests:
    cpu: 20m
    memory: 64Mi
selectorLabels: "app.kubernetes.io/name,app.kubernetes.io/instance"

flagger container spec:

spec:
  containers:
    - name: flagger
      image: ghcr.io/fluxcd/flagger:1.22.2
      command:
        - ./flagger
        - '-log-level=info'
        - '-mesh-provider=traefik'
        - >-
          -metrics-server=http://prometheus-operator-kube-p-prometheus.prometheus-operator:9090
        - '-selector-labels=app.kubernetes.io/name,app.kubernetes.io/instance'
        - '-enable-config-tracking=true'
        - '-slack-user=flagger'

P.S.: I'm using helm chart for this deployment

To Reproduce

  • Configure Flagger with additional selector labels via selectorLabels in its Helm values.yaml
  • Add one of those labels to your application deployment (deployed via Helm)

Expected behavior

All labels specified in 'selectorLabels' are copied from the original deployment to the primary after a successful promotion.
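
For illustration, with both labels configured, the promoted primary would be expected to look roughly like this (label values are placeholders, not Flagger's verbatim output):

spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: serviceName-primary
      app.kubernetes.io/name: serviceName-primary
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: serviceName-primary
        app.kubernetes.io/name: serviceName-primary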

Additional context

  • Flagger version: 1.22.2
  • Kubernetes version: 1.24.3
  • Service Mesh provider: -
  • Ingress provider: Traefik

AliakseiVenski avatar Nov 09 '22 14:11 AliakseiVenski

I noticed one more thing: after an unsuccessful promotion the deployment is rolled back, but 5-10 seconds later I see 'New revision detected' again and Flagger tries to promote the failed deployment a second time.

AliakseiVenski avatar Nov 09 '22 14:11 AliakseiVenski

@aryan9600 could you help please? I looked at https://github.com/fluxcd/flagger/issues/1227 and still do not know how to resolve this.

AliakseiVenski avatar Nov 29 '22 15:11 AliakseiVenski

I left just 'app.kubernetes.io/instance' as the single selector label, and now all services have this label as a selector, so my configuration is correct. But if I specify more than one label, every label after the first is ignored by Flagger.

AliakseiVenski avatar Nov 29 '22 15:11 AliakseiVenski

Regarding deployment selector labels: they change only if you delete and then reinstall the Helm release, so the Flagger operator doesn't modify the primary deployment's selector labels on the fly (this does work for the service selector labels). That's already two bugs in one post :(

AliakseiVenski avatar Nov 29 '22 15:11 AliakseiVenski

I'm observing the same issue. We are running Flagger 1.27.

In the Helm chart I've set the following value: selectorLabels: "app.kubernetes.io/name,app.kubernetes.io/instance"

On the initial deployment we have the following labels:

spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: xxx
      app.kubernetes.io/name: xxx
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: xxx
        app.kubernetes.io/name: xxx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/instance: xxx
                app.kubernetes.io/name: xxx
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/instance: xxx
            app.kubernetes.io/name: xxx
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway

On the primary deployment, not all template labels are updated. This causes the affinity and topology spread constraint rules to stop working entirely, due to the mismatch in labels.

spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: xxx-primary
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: xxx  # <- missing the -primary suffix
        app.kubernetes.io/name: xxx-primary
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/instance: xxx-primary
                app.kubernetes.io/name: xxx-primary
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/instance: xxx-primary
            app.kubernetes.io/name: xxx-primary
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway

From my tests, only the first label listed in the selectorLabels parameter is updated in template.metadata.

jkotiuk avatar Mar 01 '23 15:03 jkotiuk

That's exactly what the code does: https://github.com/fluxcd/flagger/blob/main/pkg/canary/daemonset_controller.go#L314. This completely breaks the deployment for us :(
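
To make the expected behavior concrete, here is a minimal Go sketch (an illustration of the desired label rewriting, not Flagger's actual code; the function name and signature are made up) showing that every configured selector label whose value matches the workload name should receive the "-primary" suffix, not just the first one:

```go
package main

import (
	"fmt"
	"strings"
)

// makePrimaryLabels is a hypothetical helper: it returns a copy of the
// pod-template labels in which EVERY label listed in selectorLabels that
// matches the workload name is suffixed with "-primary".
func makePrimaryLabels(podLabels map[string]string, selectorLabels []string, name string) map[string]string {
	out := make(map[string]string, len(podLabels))
	for k, v := range podLabels {
		out[k] = v
	}
	// Iterate over ALL configured selector labels; stopping after the
	// first match is the bug behavior described in this thread.
	for _, key := range selectorLabels {
		if v, ok := out[key]; ok && v == name {
			out[key] = name + "-primary"
		}
	}
	return out
}

func main() {
	labels := map[string]string{
		"app.kubernetes.io/name":     "xxx",
		"app.kubernetes.io/instance": "xxx",
	}
	selectors := strings.Split("app.kubernetes.io/name,app.kubernetes.io/instance", ",")
	fmt.Println(makePrimaryLabels(labels, selectors, "xxx"))
}
```

With both labels configured, both keys should end up as "xxx-primary" in the primary's pod template, which would keep the affinity and topology spread selectors consistent.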

pinkavaj avatar Oct 09 '23 09:10 pinkavaj

Hey @aryan9600, any comments on this? You can see that users face this issue with impact ranging from small to high. Almost a year after the issue was created, there has been no feedback at all.

AliakseiVenski avatar Oct 09 '23 11:10 AliakseiVenski

P.S.: we recently updated Flagger to the latest version; the issue still persists.

AliakseiVenski avatar Oct 09 '23 11:10 AliakseiVenski

Bumping this thread. I just ran into this problem while trying to implement safe deployments for an event-driven application using Dapr. Without getting into too many details, Dapr creates an <app-name>-dapr network service for service discovery as part of the features it provides. When the canary gets promoted and the labels are not copied over correctly, the Dapr network service loses track of the <app-name>-primary pods and the entire system stops working.

This is a total blocker for me :(

KrylixZA avatar May 21 '24 16:05 KrylixZA