
ignoreResourceUpdates does not work

Open freedbka opened this issue 1 year ago • 15 comments

Checklist:

  • [+] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • [+] I've included steps to reproduce the bug.
  • [+] I've pasted the output of argocd version.

Describe the bug
Hello! I have a problem with a reconciliation loop, because some resources are constantly changing ("Requesting app refresh caused by object update"). Following the instructions in https://argo-cd.readthedocs.io/en/stable/operator-manual/reconcile/#finding-resources-to-ignore, I found the resource that generates the constant updates: it is the ConfigMap kops-controller-leader in the kube-system namespace, whose metadata is constantly changing:

 control-plane.alpha.kubernetes.io/leader: >-
      {"holderIdentity":"ip-**-**-**-**_*********************","leaseDurationSeconds":15,"acquireTime":"2023-08-29T09:27:07Z","renewTime":"2023-09-20T12:56:56Z","leaderTransitions":0}

This leads to a refresh of all Argo CD applications roughly every second. I tried adding exceptions to argocd-cm as described in the documentation (https://argo-cd.readthedocs.io/en/stable/operator-manual/argocd-cm-yaml/), but it still generates millions of updates per day:

  resource.customizations.ignoreDifferences.all: |
    jqPathExpressions:
    - '.metadata.annotations."control-plane.alpha.kubernetes.io/leader"'
    - .metadata.resourceVersion
    managedFieldsManagers:
    - kube-controller-manager
    - external-secrets
    jsonPointers:
    - /spec/replicas
    - /metadata/resourceVersion
    - /metadata/annotations/control-plane.alpha.kubernetes.io~1leader
  resource.customizations.ignoreResourceUpdates._ConfigMap: |
    jqPathExpressions:
    - '.metadata.annotations."control-plane.alpha.kubernetes.io/leader"'
    - .metadata.resourceVersion
  resource.customizations.ignoreResourceUpdates.all: |
    jqPathExpressions:
    - '.metadata.annotations."control-plane.alpha.kubernetes.io/leader"'
    - .metadata.resourceVersion
    jsonPointers:
    - /status
    - /metadata/resourceVersion
    - /metadata/annotations/control-plane.alpha.kubernetes.io~1leader
  resource.ignoreResourceUpdatesEnabled: 'true'

Screenshots

Version

v2.8.3

freedbka avatar Sep 20 '23 14:09 freedbka

Same here, it slows down all Argo CD operations. The weird thing is that the ConfigMap is not tracked by Argo CD; it is created by a controller, so I don't understand why Argo CD watches it. Maybe because I activated orphaned resources in my projects...

duizabojul avatar Sep 21 '23 00:09 duizabojul

Had the same problem; completely removing the orphanedResources option from the main AppProject helped.

kollad avatar Sep 21 '23 05:09 kollad

@kollad It worked. Thank you!

freedbka avatar Sep 21 '23 09:09 freedbka

I don't think this should be closed, this behavior is still a bug.

duizabojul avatar Sep 21 '23 09:09 duizabojul

@duizabojul Ok I will reopen

freedbka avatar Sep 21 '23 09:09 freedbka

Had the same problem; completely removing the orphanedResources option from the main AppProject helped.

@kollad I'm not getting what needs to change exactly. Can you please elaborate?

Sathish-rafay avatar Nov 29 '23 11:11 Sathish-rafay

We have the same situation with an Elasticsearch operator ConfigMap used for leader election. EndpointSlices are also creating a lot of reconciliations.

We tried to ignore the updates, since on the ConfigMap both the leader annotation and the resourceVersion keep updating:

apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"elastic-operator-0_098489e3-3b66-4d7b-b17e-ac1555175d69","leaseDurationSeconds":15,"acquireTime":"2023-12-28T16:08:12Z","renewTime":"2024-01-05T11:53:17Z","leaderTransitions":179}'
  creationTimestamp: "2022-06-10T11:38:35Z"
  name: elastic-operator-leader
  namespace: elastic-operator
  resourceVersion: "815237253"
  uid: c876a2c1-efd0-4902-970c-02d4e2531b81

On the EndpointSlices, the renewTime annotation and the resourceVersion are constantly changing:

--- /tmp/first.yaml     2024-01-05 13:07:44.109295084 +0100
+++ /tmp/second.yaml    2024-01-05 13:07:54.485291050 +0100
@@ -19,7 +19,7 @@
     acquireTime: "2023-06-14T09:09:12.099141+00:00"
     leader: some-service-name-0
     optime: "4445962240"
-    renewTime: "2024-01-05T12:07:36.500017+00:00"
+    renewTime: "2024-01-05T12:07:46.499734+00:00"
     transitions: "3"
     ttl: "30"
   creationTimestamp: "2023-06-14T09:09:13Z"
@@ -42,7 +42,7 @@
     kind: Endpoints
     name: some-service-name
     uid: cc85b6fa-178d-4364-aa52-cb270b8ef44d
-  resourceVersion: "815254315"
+  resourceVersion: "815254479"
   uid: 56c48541-440c-4f49-9783-8e5c5338e72d
 ports:
 - name: postgresql

(Side note: the EndpointSlice probably should be ignored anyway, but see https://github.com/argoproj/gitops-engine/pull/469.)

We tried to ignore these updates with this config:

# SEE:
#  documentation: https://argo-cd.readthedocs.io/en/release-2.8/operator-manual/reconcile/
#  example config: https://argo-cd.readthedocs.io/en/stable/operator-manual/argocd-cm-yaml/
resource.ignoreResourceUpdatesEnabled: "true"
resource.customizations.ignoreResourceUpdates.all: |
  jsonPointers:
    - /metadata/resourceVersion
resource.customizations.ignoreResourceUpdates.ConfigMap: |
  jqPathExpressions:
    # ElasticOperator is updating this around 2 times per second
    - '.metadata.annotations."control-plane.alpha.kubernetes.io/leader"'
resource.customizations.ignoreResourceUpdates.discovery.k8s.io_EndpointSlice: |
  jsonPointers:
    # EndpointSlices should be ignored completely as Endpoints are already 
    # (see: https://github.com/argoproj/gitops-engine/pull/469) so until this is
    # done automatically the ignorance of `/metadata/resourceVersion` for all resources
    # plus ignoring this annotation should reduce the amount of updates significantly
    - /metadata/annotations/renewTime

So either our config is not correct, or the feature is not working on these resources. They are both "orphaned" resources, so maybe the feature actually doesn't work on non-managed resources?

@Sathish-rafay I think what @kollad meant is https://argo-cd.readthedocs.io/en/stable/user-guide/orphaned-resources/ : removing the setting altogether helped him, as most updates come from these non-managed resources, which update constantly.
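For anyone wondering what exactly to remove, here is a rough sketch (my own illustration, not kollad's actual manifest; the project name and namespace are placeholders) of the AppProject stanza in question:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: default        # placeholder project name
  namespace: argocd    # placeholder; wherever Argo CD is installed
spec:
  # Deleting this whole block disables orphaned-resource monitoring for the
  # project, which is what reportedly stopped the constant refreshes.
  orphanedResources:
    warn: false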

savar avatar Jan 05 '24 12:01 savar

So either our config is not correct, or the feature is not working on these resources. They are both "orphaned" resources, so maybe the feature actually doesn't work on non-managed resources?

Just checked: the ConfigMap is an "orphaned resource", but the EndpointSlice supposedly is not. I guess the latter is tracked via its OwnerReference to the Endpoints object, which in theory is also not managed by Argo CD, but I suspect (though I don't know) that Argo CD knows the managed Service will create an Endpoints object and automatically tracks it as managed.

But independently of whether Argo CD thinks the EndpointSlice is managed or not, the updates aren't ignored (at least that's what we saw in the debug logs of the application controller pod).

savar avatar Jan 05 '24 12:01 savar

Same for me: ignoreResourceUpdates does not work on orphaned resources.

  resource.customizations.ignoreResourceUpdates.autoscaling.k8s.io_VerticalPodAutoscalerCheckpoint: |
    jsonPointers:
    - /status
  resource.ignoreResourceUpdatesEnabled: 'true' 

I still see an app refresh being requested after the ConfigMap is updated:

{"api-version":"autoscaling.k8s.io/v1","application":"argocd/poc-idp","cluster-name":"anthos-test-nprd","fields.level":1,"kind":"VerticalPodAutoscalerCheckpoint","level":"debug","msg":"Requesting app refresh caused by object update","name":"poc-idp-wordpress","namespace":"poc-idp","server":"https://XXXXX.central-1.amazonaws.com","time":"2024-02-06T17:11:33Z"}

mick1627 avatar Feb 06 '24 17:02 mick1627

So, I'm curious about this one... I understand how to ignore individual field updates on certain types of objects, but we operate a very fast-moving Kubernetes cluster that launches between 200k and 400k pods daily. When we look at the log entries for "Requesting app refresh caused by object update", we can see that we are getting 25 new pod updates per second.


Is there some way to make ArgoCD ignore Pod/EndpointSlice changes for the purpose of manifest comparison?

diranged avatar Feb 09 '24 22:02 diranged

We are experiencing the same with resources managed by an operator

giepa avatar Feb 16 '24 09:02 giepa

Is there some way to make ArgoCD ignore Pod/EndpointSlice changes for the purpose of manifest comparison?

Just out of curiosity (for EndpointSlices): did you try these two things?

  1. disable orphanedResources on your AppProjects
  2. ignore EndpointSlices, for example:
resource.customizations.ignoreResourceUpdates.discovery.k8s.io_EndpointSlice: |
  jsonPointers:
    - /metadata/annotations/renewTime
    - /metadata/resourceVersion

I am not sure if this will help with ignoring things newly created by an HPA, but it would be interesting to see if it reduces the load somehow.

savar avatar Feb 17 '24 09:02 savar

We have the same issue. Our Argo Events application is creating the following ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"controller-manager-c8d4c76d-f6x4w_3c1ec9c1-9b77-43bb-a943-5b26247a33b6","leaseDurationSeconds":15,"acquireTime":"2024-03-03T19:44:57Z","renewTime":"2024-03-03T20:34:57Z","leaderTransitions":1}'
  creationTimestamp: "2024-03-03T19:44:38Z"
  name: argo-events-controller
  namespace: argo-events
  resourceVersion: "88954276"
  uid: 6a4dfeca-a51d-406a-a26b-486b3539313e

The renewTime inside the metadata.annotations."control-plane.alpha.kubernetes.io/leader" annotation and the resourceVersion change every few seconds.

As long as we have the following AppProject config, the Argo Events application is reconciled every few seconds:

  orphanedResources:
    warn: false

This leads to higher CPU usage of the Argocd-Application-Controller.

We also tried resource.ignoreResourceUpdates inside argocd-cm, without any success (https://github.com/argoproj/argo-cd/issues/15594#issuecomment-1878577773).

Below is a debug log showing that the reconciliation is triggered by the ConfigMap argo-events-controller in the argo-events namespace:

argocd-application-controller-0 argocd-application-controller time="2024-03-03T20:40:33Z" level=debug msg="Checking if cluster https://kubernetes.default.svc with clusterShard 0 should be processed by shard 0"
argocd-application-controller-0 argocd-application-controller time="2024-03-03T20:40:33Z" level=debug msg="Requesting app refresh caused by object update" api-version=v1 application=argocd/argo-events cluster-name= fields.level=1 kind=ConfigMap name=argo-events-controller namespace=argo-events server="https://kubernetes.default.svc"
argocd-application-controller-0 argocd-application-controller time="2024-03-03T20:40:33Z" level=info msg="Refreshing app status (controller refresh requested), level (1)" application=argocd/argo-events
argocd-application-controller-0 argocd-application-controller time="2024-03-03T20:40:33Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: argo-events)" application=argocd/argo-events
argocd-application-controller-0 argocd-application-controller time="2024-03-03T20:40:33Z" level=info msg="No status changes. Skipping patch" application=argocd/argo-events
argocd-application-controller-0 argocd-application-controller time="2024-03-03T20:40:33Z" level=info msg="Reconciliation completed" application=argocd/argo-events dedup_ms=0 dest-name= dest-namespace=argo-events dest-server="https://kubernetes.default.svc" diff_ms=1 fields.level=1 git_ms=20 health_ms=1 live_ms=1 patch_ms=0 setop_ms=0 settings_ms=0 sync_ms=0 time_ms=43

Removing orphanedResources from the AppProject is a workaround, but I am surprised that orphaned resources trigger a reconciliation at all. It looks like a bug to me.

husira avatar Mar 03 '24 20:03 husira

I ended up completely excluding VPA with resource.exclusions
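Something along these lines, a sketch of the argocd-cm entry rather than my exact config (note that excluded kinds are no longer watched at all, so they also disappear from the Application resource tree):

resource.exclusions: |
  - apiGroups:
    - autoscaling.k8s.io
    kinds:
    - VerticalPodAutoscaler
    - VerticalPodAutoscalerCheckpoint
    clusters:
    - "*"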

mtrin avatar Mar 25 '24 04:03 mtrin

Same problem for me with Argo CD and Istio, which maintains several ConfigMaps with control-plane.alpha.kubernetes.io/leader annotations. The ignoreResourceUpdates definitions from https://github.com/argoproj/argo-cd/issues/15594#issue-1905156183 worked for me only after removing all orphanedResources from my projects and restarting the application controller.

So why is orphanedResources interfering with these ignoreResourceUpdates definitions?

ptr1120 avatar May 20 '24 16:05 ptr1120

This is really interesting to me: this issue is really old and still happening. We just noticed that even though we have the /status field ignored on all of our resources, every few seconds a HorizontalPodAutoscaler object update still triggers a reconciliation:

  resource.customizations.ignoreResourceUpdates.all: |
    jsonPointers:
      - /status 
time="2024-06-14T15:42:35Z" level=debug msg="Requesting app refresh caused by object update" api-version=autoscaling/v2 application=argocd-system/... cluster-name= fields.level=0 kind=HorizontalPodAutoscaler name=.... namespace=otel server="https://kubernetes.default.svc"

Using kubectl-grep we watched the HPA object, and the diffs are all in the supposedly ignored fields:

  apiVersion: "autoscaling/v2"
  kind: "HorizontalPodAutoscaler"
  metadata:
    creationTimestamp: "2024-06-03T02:55:49Z"
    name: "otel-collector-metrics-processor-collector"
    namespace: "otel"
    ownerReferences:
      -
        apiVersion: "opentelemetry.io/v1beta1"
        blockOwnerDeletion: true
        controller: true
        kind: "OpenTelemetryCollector"
        name: "...."
        uid: "f131b749-c70a-4fc9-a4e2-21aea2023410"
-   resourceVersion: "221899671"
+   resourceVersion: "221900017"
    uid: "a5432460-837e-4a89-85dd-1177034cf993"
  spec:
...
  status:
    conditions:
      -
        lastTransitionTime: "2024-06-03T02:56:04Z"
        message: "recommended size matches current size"
        reason: "ReadyForNewScale"
        status: "True"
        type: "AbleToScale"
      -
        lastTransitionTime: "2024-06-13T02:17:45Z"
        message: "the desired replica count is less than the minimum replica count"
        reason: "TooFewReplicas"
        status: "True"
        type: "ScalingLimited"
      -
        lastTransitionTime: "2024-06-11T08:44:44Z"
-       message: "the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)"
+       message: "the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)"
        reason: "ValidMetricFound"
        status: "True"
        type: "ScalingActive"
    currentMetrics:
      -
        resource:
          current:
-           averageUtilization: 18
+           averageUtilization: 21
-           averageValue: "376m"
+           averageValue: "425m"
          name: "cpu"
        type: "Resource"
      -
        resource:
          current:
-           averageUtilization: 12
+           averageUtilization: 11
-           averageValue: "394474837333m"
+           averageValue: "367874048"
          name: "memory"
        type: "Resource"
    currentReplicas: 3
    desiredReplicas: 3
    lastScaleTime: "2024-06-09T23:33:28Z"

The above update should not be triggering a reconciliation because it's only updating the /status and /metadata/resourceVersion fields. Our configuration explicitly ignores /status and according to the docs the other field should be ignored too:

By default, the metadata fields generation, resourceVersion and managedFields are always ignored for all resources.

diranged avatar Jun 14 '24 15:06 diranged

Following up on this: we see the same update behavior for all DaemonSets. Any time a new pod is started, the /status field is updated; these updates should be ignored, but they aren't, and they trigger the app to be refreshed.

diranged avatar Jun 14 '24 15:06 diranged

Looking at the code and from some experiments, it seems that this configuration only works for objects that are directly managed by Argo CD (applied to the cluster from the manifests). It doesn't work for objects that are in the resource tree but not directly tracked by Argo CD.

ronaknnathani avatar Jun 20 '24 22:06 ronaknnathani

One alternative thought I've had while trying to debug why some of our Helm apps get stuck in this loop: we could add some sort of argocd.argoproj.io/skip-reconcile-time: '300' annotation.

In theory this would be a number set on each Application, and if less than that much time has passed since the last refresh, the refresh is simply skipped.

i.e. argocd.argoproj.io/skip-reconcile-time: '300' would result in at most one refresh every 5 minutes, no matter what.

I suppose the only exception we may want is that manual refreshes always run.

This would be better than simply marking an application to be skipped entirely, as it would at least keep some sort of status/progression while not hamstringing the application server.
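To make the idea concrete, a purely hypothetical sketch (this annotation does not exist in Argo CD today; the app name, repo URL and destination below are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app          # placeholder
  namespace: argocd
  annotations:
    # proposed, not implemented: skip watch-triggered refreshes if the last
    # refresh was less than 300 seconds ago; manual refreshes would still run
    argocd.argoproj.io/skip-reconcile-time: '300'
spec:
  project: default
  source:
    repoURL: https://example.com/some-repo.git   # placeholder
    path: .
  destination:
    server: https://kubernetes.default.svc
    namespace: example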

phyzical avatar Jul 01 '24 06:07 phyzical

@diranged this is happening to me also. I have a few fields ignored, but the resources still trigger refreshes.

For instance

  resource.customizations.ignoreResourceUpdates.keda.sh_ScaledObject: |
    jsonPointers:
    - /metadata/resourceVersion
    - /spec/triggers
    - /spec/cooldownPeriod
    - /spec/pollingInterval
    - /status/lastActiveTime

Not sure what to do next. I am using version 2.10.3.

santinoncs avatar Jul 31 '24 09:07 santinoncs