Additional selectorLabels are not added to primary deployment
Describe the bug
I tried to use the selectorLabels option described there, but my additional selector labels are present only on the original deployment {deploy} and not on the primary {deploy}-primary created by Flagger.
Original deployment:
spec:
  replicas: 0
  selector:
    matchLabels:
      app.kubernetes.io/instance: serviceName
      app.kubernetes.io/name: serviceName
Primary deployment:
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: serviceName
Flagger values.yaml:
image:
  tag: 1.22.2
meshProvider: traefik
metricsServer: http://prometheus-operator-kube-p-prometheus.prometheus-operator:9090
resources:
  limits:
    cpu: 1000m
    memory: 512Mi
  requests:
    cpu: 20m
    memory: 64Mi
selectorLabels: "app.kubernetes.io/name,app.kubernetes.io/instance"
Flagger container spec:
spec:
  containers:
    - name: flagger
      image: ghcr.io/fluxcd/flagger:1.22.2
      command:
        - ./flagger
        - '-log-level=info'
        - '-mesh-provider=traefik'
        - >-
          -metrics-server=http://prometheus-operator-kube-p-prometheus.prometheus-operator:9090
        - '-selector-labels=app.kubernetes.io/name,app.kubernetes.io/instance'
        - '-enable-config-tracking=true'
        - '-slack-user=flagger'
P.S.: I'm using the Helm chart for this deployment.
To Reproduce
- Configure Flagger to take additional selector labels into account via selectorLabels in values.yaml
- Add one of those labels to your deployment's selector and target it with a Canary (Helm); a minimal Canary sketch follows this list
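For reference, a minimal Canary sketch targeting such a deployment, assuming a Traefik-backed setup like the one above. The name, namespace, port, and analysis values are hypothetical; the point is that targetRef points at the deployment whose selector carries both labels:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: serviceName          # hypothetical, matches the deployment above
  namespace: default         # hypothetical
spec:
  provider: traefik
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: serviceName        # deployment whose selector carries both labels
  service:
    port: 80                 # hypothetical port
  analysis:                  # hypothetical analysis settings
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10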
Expected behavior
All labels specified in 'selectorLabels' are copied from the original deployment to the primary after a successful promotion.
Additional context
- Flagger version: 1.22.2
- Kubernetes version: 1.24.3
- Service Mesh provider: -
- Ingress provider: Traefik
I noticed one more thing: after an unsuccessful promotion the deployment is rolled back, then after 5-10 seconds I see 'New revision detected' again and Flagger tries to promote the failed deployment a second time.
@aryan9600 could you please help? I looked at https://github.com/fluxcd/flagger/issues/1227 and still don't know how to resolve this.
I left just 'app.kubernetes.io/instance' as the single selector label, and now all services have this label as a selector, so my configuration is correct. But if I specify more than one label, every label except the first one is ignored by Flagger.
Regarding deployment selector labels: they only change if you delete and then reinstall the Helm release, so the Flagger operator doesn't modify the primary deployment's selector labels on the fly (this does work for the service selector labels). That's already two bugs in one post :(
I'm observing the same issue. We are running Flagger 1.27.
In the Helm chart I've set the following values:
selectorLabels: "app.kubernetes.io/name,app.kubernetes.io/instance"
On the initial deployment we have the following labels:
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: xxx
      app.kubernetes.io/name: xxx
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: xxx
        app.kubernetes.io/name: xxx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/instance: xxx
                  app.kubernetes.io/name: xxx
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/instance: xxx
              app.kubernetes.io/name: xxx
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
On the primary deployment, not all template labels are updated. This causes the affinity and topology spread constraint rules to stop working entirely due to the label mismatch (a sketch of the expected labels follows the snippet below).
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: xxx-primary
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: xxx   # <- missing primary
        app.kubernetes.io/name: xxx-primary
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/instance: xxx-primary
                  app.kubernetes.io/name: xxx-primary
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/instance: xxx-primary
              app.kubernetes.io/name: xxx-primary
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
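This is what the primary pod template labels would need to look like for the -primary selectors in the affinity and topology spread rules above to match; a sketch of the expected output, not what Flagger currently generates:
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: xxx-primary   # expected; currently left as "xxx"
        app.kubernetes.io/name: xxx-primary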
From my tests, only the first label listed in the selectorLabels parameter is updated in template.metadata. That's exactly what the code does: https://github.com/fluxcd/flagger/blob/main/pkg/canary/daemonset_controller.go#L314. This completely breaks the deployment for us :(
Hey @aryan9600, any comments on this? You can see that users face this issue with impact ranging from small to high. Almost a year after the issue was created and there has been no feedback at all.
P.S.: we recently updated Flagger to the latest version and the issue still persists.
Bumping this thread. Just ran into this problem while trying to implement safe deployments for an event-driven application using Dapr. Without getting into too many details, Dapr creates a <app-name>-dapr
network service to allow for service discovery as part of the features it provides. When the canary gets promoted and the labels are not copied over correctly, the Dapr network service loses track of the <app-name>-primary
pods and the entire system stops working.
This is a total blocker for me :(