OutOfSync when deployment has topologySpreadConstraints and resource limits
Checklist:
- [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- [x] I've included steps to reproduce the bug.
- [x] I've pasted the output of `argocd version`.
Describe the bug

We noticed that Deployments that are part of the Application become OutOfSync if they have topologySpreadConstraints in their spec. The diff pane then shows an unrelated diff on the resource limits.

To Reproduce

1) Create an Argo CD Application that deploys the manifest below:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: devops
  name: petclinic-demo-app-1
  labels:
    app: petclinic-demo-app-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: petclinic-demo-app-1
  template:
    metadata:
      labels:
        app: petclinic-demo-app-1
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          minDomains: 1
          topologyKey: "kubernetes.io/hostname"
          whenUnsatisfiable: "DoNotSchedule"
          labelSelector:
            matchLabels:
              app: petclinic-demo-app-1
          matchLabelKeys:
            - pod-template-hash
          nodeAffinityPolicy: Honor
          nodeTaintsPolicy: Ignore
      containers:
        - name: app
          image: jbrisbin/spring-petclinic
          env:
            - name: "_JAVA_OPTIONS"
              value: "-Xmx1G -Xms1G -XX:MaxDirectMemorySize=64m"
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 1000m
              memory: 1Gi
```
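An educated guess at part of the mechanism (not confirmed in this thread): Kubernetes stores resource quantities in a canonical form, so the `cpu: 1000m` in the desired manifest comes back from the API server as `cpu: '1'` in the live object, and a naive byte-level comparison then reports a diff. A minimal Python sketch of that normalization:

```python
def canonical_cpu(quantity: str) -> float:
    """Parse a Kubernetes CPU quantity ('1000m', '1', '0.5') into cores.

    Rough illustration only, not the real apimachinery Quantity parser:
    it handles just the plain and milli ('m' suffix) forms seen in this issue.
    """
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000.0
    return float(quantity)

# '1000m' and '1' are the same quantity; the API server stores the
# canonical form ('1'), so a textual diff flags a change even though
# nothing differs semantically.
assert canonical_cpu("1000m") == canonical_cpu("1")
```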
Apply it and you should see the app OutOfSync with a diff on the limits.

2) Remove the topologySpreadConstraints from the Deployment:
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    minDomains: 1
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: "DoNotSchedule"
    labelSelector:
      matchLabels:
        app: petclinic-demo-app-1
    matchLabelKeys:
      - pod-template-hash
    nodeAffinityPolicy: Honor
    nodeTaintsPolicy: Ignore
```
You should see that the app is no longer OutOfSync.
Expected behavior

The app should sync properly with respect to resource limits, regardless of the presence of topologySpreadConstraints.
Screenshots
Version
v2.9.1+58b04e5
Logs
I have seen a similar issue in one environment where the resource limits show a diff; however, in another environment with an identical configuration there is no difference. The spec doesn't contain any extra fields like topologySpreadConstraints:

```yaml
resources:
  limits:
    cpu: '4'
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 2Gi
```
I am getting the same issue.
If I remove matchLabelKeys from topologySpreadConstraints, it reverts to the regular behaviour.

```yaml
matchLabelKeys:
  - pod-template-hash
```
@imeliran @ashutosh16 @rwong2888 what k8s version are you using? I wonder if this relates to https://github.com/argoproj/argo-cd/issues/15176
I am on 1.27, @tooptoop4. I suspect the same. I was discussing with @crenshaw-dev and it may get resolved in 2.11.
I was on Kubernetes 1.27; both matchLabelKeys and nodeTaintsPolicy are causing OutOfSync.
The diff is about empty or default fields like:

```yaml
nodeSelector: {}
tolerations: []
hostNetwork: false
```

etc.
Argo version 2.7.7
cc @rwong2888 @crenshaw-dev
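The default-field diffs above can be illustrated with a toy normalizer. This is not Argo CD's actual diff code, just a sketch of why pruning server-added defaults (empty maps, empty lists, false booleans) makes live and desired specs compare equal:

```python
def prune_defaults(obj):
    """Recursively drop fields whose values are empty maps, empty lists,
    or boolean false, roughly mimicking a diff normalizer that ignores
    server-added defaults such as nodeSelector: {} or hostNetwork: false."""
    if isinstance(obj, dict):
        pruned = {k: prune_defaults(v) for k, v in obj.items()}
        return {k: v for k, v in pruned.items() if v not in ({}, [], False)}
    if isinstance(obj, list):
        return [prune_defaults(v) for v in obj]
    return obj

# Live object carries defaults the API server filled in; the desired
# manifest never mentioned them.
live = {"spec": {"nodeSelector": {}, "tolerations": [], "hostNetwork": False,
                 "containers": [{"name": "app"}]}}
desired = {"spec": {"containers": [{"name": "app"}]}}
assert prune_defaults(live) == prune_defaults(desired)
```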
> I have seen a similar issue in one environment where the resource limits show a diff; however, in another environment with an identical configuration there is no difference.
We are also seeing this issue on one of our Kubernetes clusters. It is working fine on other environments with identical configuration, though.
My team is also hitting this bug when including topologySpreadConstraints in the spec and using whole CPU values for requests/limits.
Also having this issue on v2.9.5+f943664: the sync was happy when the desired manifest had resources.limits.cpu: 1000m and the live manifest had resources.limits.cpu: '1'. But something made it unhappy; not sure if it was the addition of nodeTaintsPolicy to our topology spread constraints.
Same issue on 2.7.9. The issue is with any field that's still in beta, so that includes nodeAffinityPolicy, nodeTaintsPolicy, and matchLabelKeys.
Hopefully they resolve this soon!
Looks like the same issue persists on Kubernetes 1.29 and Argo v2.9.6. The matchLabelKeys flag is still in beta according to the feature gates page.
Hitting a similar issue here with k8s 1.28 and argo v2.7.14.
Initially hit a problem with setting initialDelaySeconds: 0 on probes, and now I'm hitting similar resource diffs.
Can confirm I'm seeing this as well when adding topologySpreadConstraints to a Deployment: both the initialDelaySeconds: 0 issue and resource request/limit marshaling failures on values that worked previously.
ArgoCD v2.9.6 and K8s 1.28
Hello, I'm having the same issue: when using topologySpreadConstraints, all the apps go OutOfSync.
FYI, I saw this fixed when I upgraded to 2.10.x; I couldn't pinpoint the commit that fixed it. Maybe some library upgrade did.
Hi, we are facing the same issue in v2.11.4+e1284e1. We need to calculate the CPU limit based on the request, so we need an integer value. Argo CD then always shows OutOfSync.
Workaround
You can configure known types following https://argo-cd.readthedocs.io/en/stable/user-guide/diffing/#known-kubernetes-types-in-crds-resource-limits-volume-mounts-etc to deal with this. I'll close this, but feel free to re-open if this is not enough.
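For the resource-limits case in this thread, the linked docs configure known types in the argocd-cm ConfigMap. A sketch along those lines (the `field` path here is an assumption adapted from the docs, not something verified against this exact issue; adjust it to the field that is diffing for you):

```yaml
# argocd-cm ConfigMap, data section — illustrative sketch only.
# Tells the diff engine that the pod template is a core/v1 PodSpec,
# so quantities like "1000m" vs "1" are compared semantically.
resource.customizations.knownTypeFields.apps_Deployment: |
  - field: spec.template.spec
    type: core/v1/PodSpec
```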
Following up on this: it appears my issue is resolved, at least in 2.13.0, maybe even earlier.
I'm encountering this in 2.14.2 with server-side diff enabled. I managed to work around it by adding the following to the Argo CD config:

```yaml
resource.customizations.knownTypeFields.apps_Deployment: |
  - field: spec.template.spec.affinity
    type: core/v1/Affinity
```
Not sure if this issue should be closed since there seems to be a configuration that still has this failure.
I experienced the same issue in version 2.9.1, but version 2.14.9 does not seem to have this problem.
