
OutOfSync when deployment has topologySpreadConstraints and resource limits

Open imeliran opened this issue 2 years ago • 13 comments

Checklist:

  • [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • [x] I've included steps to reproduce the bug.
  • [x] I've pasted the output of argocd version.

Describe the bug We noticed that Deployments that are part of the Application become OutOfSync if they have topologySpreadConstraints in their spec. Looking at the diff pane, it shows an unrelated diff in the resource limits: image

To Reproduce 1) Create an Argo CD Application that deploys the manifest below:

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: devops
  name: petclinic-demo-app-1
  labels:
    app: petclinic-demo-app-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: petclinic-demo-app-1
  template:
    metadata:
      labels:
        app: petclinic-demo-app-1
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          minDomains: 1
          topologyKey: "kubernetes.io/hostname"
          whenUnsatisfiable: "DoNotSchedule"
          labelSelector:
            matchLabels:
              app: petclinic-demo-app-1
          matchLabelKeys:
            - pod-template-hash
          nodeAffinityPolicy: Honor
          nodeTaintsPolicy: Ignore
      containers:
        - name: app
          image: jbrisbin/spring-petclinic
          env:
          - name: "_JAVA_OPTIONS"
            value: "-Xmx1G -Xms1G -XX:MaxDirectMemorySize=64m"
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 1000m
              memory: 1Gi

You should see the app OutOfSync with a diff on the limits. 2) Remove the topologySpreadConstraints from the Deployment:

      topologySpreadConstraints:
        - maxSkew: 1
          minDomains: 1
          topologyKey: "kubernetes.io/hostname"
          whenUnsatisfiable: "DoNotSchedule"
          labelSelector:
            matchLabels:
              app: petclinic-demo-app-1
          matchLabelKeys:
            - pod-template-hash
          nodeAffinityPolicy: Honor
          nodeTaintsPolicy: Ignore

You should see that the app is no longer OutOfSync.

Expected behavior The app should be synced properly in terms of resource limits, regardless of the existence of topologySpreadConstraints.
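The spurious diff typically involves equivalent CPU quantities (e.g. `1000m` in the desired manifest vs `'1'` in the live one). A minimal sketch of why these denote the same value — this is an illustration of Kubernetes quantity semantics, not Argo CD's actual normalizer:

```python
def cpu_millicores(q: str) -> int:
    """Parse a Kubernetes CPU quantity ('1', '0.5', '1000m') into millicores."""
    q = q.strip()
    if q.endswith("m"):
        return int(q[:-1])
    return int(float(q) * 1000)

# "1000m" and "1" are the same quantity, so a diff between them
# should be treated as a no-op by the diffing logic.
assert cpu_millicores("1000m") == cpu_millicores("1") == 1000
assert cpu_millicores("500m") == cpu_millicores("0.5") == 500
```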

Screenshots

Version

v2.9.1+58b04e5

Logs

Paste any relevant application logs here.

imeliran avatar Nov 20 '23 15:11 imeliran

I have seen a similar issue in one environment where the resource limits show a diff; however, in another environment with an identical configuration there is no difference. The spec doesn't contain any extra fields like topologySpreadConstraints:

resources:
  limits:
    cpu: '4'
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 2Gi

Screenshot 2024-01-08 at 2 34 45 PM

ashutosh16 avatar Jan 08 '24 22:01 ashutosh16

I am getting the same issue.

If I remove matchLabelKeys from topologySpreadConstraints, it reverts back to the regular behaviour.

matchLabelKeys:
- pod-template-hash

rwong2888 avatar Jan 11 '24 16:01 rwong2888

@imeliran @ashutosh16 @rwong2888 what k8s version are you using? I wonder if this relates to https://github.com/argoproj/argo-cd/issues/15176

tooptoop4 avatar Jan 27 '24 08:01 tooptoop4

I am on 1.27 @tooptoop4. I suspect the same. Was discussing with @crenshaw-dev and maybe in 2.11 it will get resolved.

rwong2888 avatar Jan 29 '24 15:01 rwong2888

I was on Kubernetes 1.27; both matchLabelKeys and nodeTaintsPolicy are causing OutOfSync. The diff is about empty or default fields like nodeSelector: {}, tolerations: [], hostNetwork: false, etc. Argo CD version 2.7.7. cc @rwong2888 @crenshaw-dev

tianqi-z avatar Feb 12 '24 00:02 tianqi-z
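The empty/default-field diffs described above (nodeSelector: {}, tolerations: [], hostNetwork: false) could in principle be neutralized by pruning such values from both sides before comparing. A hedged Python sketch of that idea — not Argo CD's actual diff code:

```python
def prune_empty(obj):
    """Recursively drop empty/default values ({}, [], None, False) from dicts.

    Caution: this also drops 0, since 0 == False in Python; fine for a sketch.
    """
    if isinstance(obj, dict):
        pruned = {k: prune_empty(v) for k, v in obj.items()}
        return {k: v for k, v in pruned.items() if v not in ({}, [], None, False)}
    if isinstance(obj, list):
        return [prune_empty(v) for v in obj]
    return obj

# Server-added defaults no longer produce a spurious difference.
live = {"nodeSelector": {}, "tolerations": [], "hostNetwork": False,
        "containers": [{"name": "app"}]}
desired = {"containers": [{"name": "app"}]}
assert prune_empty(live) == prune_empty(desired)
```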


We are also seeing this issue on one of our Kubernetes clusters. It is working fine on other environments with identical configuration, though.

image

rob-whittle avatar Feb 21 '24 11:02 rob-whittle

My team is also hitting this bug when including topologySpreadConstraints in the spec and using whole CPU values for requests/limits.

adam-harries-hub avatar Feb 26 '24 11:02 adam-harries-hub

Also having this issue on v2.9.5+f943664. The sync was happy when the desired manifest had resources.limits.cpu: 1000m and the live manifest had resources.limits.cpu: '1', but something made it unhappy - not sure if it was the addition of nodeTaintsPolicy to our topology spread constraints.

henryzhao95 avatar Mar 14 '24 02:03 henryzhao95

Same issue on 2.7.9. The issue is with any field that's still in beta, so that includes nodeAffinityPolicy, nodeTaintsPolicy, and matchLabelKeys.

Hopefully they resolve this soon!

emmahsax avatar Mar 29 '24 22:03 emmahsax

Looks like the same issue persists on Kubernetes 1.29 and Argo CD v2.9.6. The matchLabelKeys flag is still in beta according to the feature gates page.

image

pankajkumar911 avatar Jun 26 '24 22:06 pankajkumar911

Hitting a similar issue here with k8s 1.28 and Argo CD v2.7.14.

Initially hit a problem with setting initialDelaySeconds: 0 on probes, and now hitting similar resource diffs.

image

easterbrennan avatar Jul 24 '24 14:07 easterbrennan

Can confirm I'm seeing this as well when adding topologySpreadConstraints to a deployment: both the initialDelaySeconds: 0 issue and resource request/limit marshaling failures on values that worked previously.

ArgoCD v2.9.6 and K8s 1.28

RLProteus avatar Aug 13 '24 03:08 RLProteus

Hello, I'm having the same issue: when using topologySpreadConstraints, all the apps go OutOfSync.

SavaMihai avatar Aug 22 '24 00:08 SavaMihai

FYI, I saw this fixed when I upgraded to 2.10.x, but I couldn't pinpoint the commit that fixed it. Maybe a library upgrade did.

ashinsabu3 avatar Sep 03 '24 13:09 ashinsabu3

Hi, we are facing the same issue in v2.11.4+e1284e1. We need to calculate the CPU limit based on the request, so we need an integer value. Argo CD then always shows OutOfSync: image

benniwiit avatar Oct 08 '24 05:10 benniwiit

workaround image

jinleileiking avatar Oct 16 '24 03:10 jinleileiking

You can configure known types following https://argo-cd.readthedocs.io/en/stable/user-guide/diffing/#known-kubernetes-types-in-crds-resource-limits-volume-mounts-etc to deal with this. I'll close this, but feel free to re-open if this is not enough.

andrii-korotkov-verkada avatar Nov 17 '24 19:11 andrii-korotkov-verkada
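For reference, the known-types customization linked above is set in the argocd-cm ConfigMap. A sketch of the full placement, reusing the field/type pair posted later in this thread; treat it as illustrative rather than a verified fix for every variant of this issue:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Tell Argo CD's diff logic that this field is a typed Kubernetes
  # object, so quantities and defaults are normalized before comparing.
  resource.customizations.knownTypeFields.apps_Deployment: |
    - field: spec.template.spec.affinity
      type: core/v1/Affinity
```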

Following up on this: my issue appears to be resolved, at least in 2.13.0, maybe even earlier.

rwong2888 avatar Nov 22 '24 16:11 rwong2888

I'm encountering this in 2.14.2 with server-side diff enabled. I managed to work around it by adding the following to the Argo CD config:

resource.customizations.knownTypeFields.apps_Deployment: |
  - field: spec.template.spec.affinity
    type: core/v1/Affinity

Not sure if this issue should be closed since there seems to be a configuration that still has this failure.

purkhusid avatar Feb 13 '25 19:02 purkhusid

I experienced the same issue in version 2.9.1, but version 2.14.9 does not seem to have this problem.

mliner avatar May 26 '25 06:05 mliner