crossplane icon indicating copy to clipboard operation
crossplane copied to clipboard

XRC -> XR Label Propagation Confuses ArgoCD

Open idallaserra opened this issue 4 years ago • 15 comments

Edit from @negz:

In the case of an XR created by an XRC, ArgoCD sees a XR appear with the application ownership label (currently automatically propagated by Crossplane), but without an owner reference to any known resource

I believe, per @mcavoyk's comment above, that this is the key issue we're seeing here. ArgoCD thinks it (not Crossplane) created the XR, because when Crossplane creates the XR it propagates a label ArgoCD uses to track which resource it owns. This results in the XR either being shown as "out-of-sync" or being deleted depending on whether auto-prune is enabled.

Original issue follows.


Dear all, in the scenario we are testing we deploy on a k8s cluster crossplane and all the resources taked from platform-ref-gcp except the final resource of the example folder. These last resources for example triggering the Cluster creation was done using ArgoCD that formally connect to the k8s cluster and execute the yaml. Flagging the auto-pruning feature of ArgoCD leads to a sort of resource contentions between Crossplane and Argo:

Normal ConfigureCompositeResource 19s (x3 over 20s) offered/compositeresourcedefinition.apiextensions.crossplane.io Successfully applied composite resource Warning ConfigureCompositeResource 19s (x2 over 20s) offered/compositeresourcedefinition.apiextensions.crossplane.io cannot patch object: Operation cannot be fulfilled on compositenetworks.gcp.platformref.crossplane.io "test-network-d6xjh": the object has been modified; please apply your changes to the latest version and try again Normal BindCompositeResource 18s (x3 over 20s) offered/compositeresourcedefinition.apiextensions.crossplane.io Composite resource is not yet ready Warning BindCompositeResource 18s offered/compositeresourcedefinition.apiextensions.crossplane.io compositenetworks.gcp.platformref.crossplane.io "test-network-d6xjh" not found

and the Cluster was not created. Disabling auto-pruning indeed lead to the resource creation, but Argo remain out-of-sync. In my opinion the correct ways is to having all the crd and definiton of gcp-reference done by Argo, not only the last one. So I am not sure if is actually a bug or not, or simple an usage misunderstanding. Thanks, Ivan

idallaserra avatar Feb 05 '21 17:02 idallaserra

Thanks for raising this issue @idallaserra! I'm not personally very familiar with ArgoCD. Can you help me understand exactly what the auto-pruning feature is trying to do? Is it trying to undo changes that Crossplane controllers are making to the kind: Cluster (for example is it trying to remove the spec.compositeRef.name and spec.resourceRef.name fields that Crossplane sets automatically?

negz avatar Feb 09 '21 01:02 negz

From what I can tell, ArgoCD deletes the composed resources that are created by Crossplane in order to keep git repo fully authoritative and in-sync all the time. Though I wonder how it'd handle Pods created by Deployments, that's essentially the same mechanism with Crossplane creating managed resources when it sees the composite resource.

muvaf avatar Feb 09 '21 13:02 muvaf

This is very similar to what I've been seeing with Argo - if you have prune enabled on an application, it will remove any resources it doesn't manage directly itself.

It can be avoided by applying argocd.argoproj.io/sync-options: Prune=false as an annotation to each resource that ArgoCD should not manage, but right now it isn't possible to apply that annotation to the Composite resource created from a Claim against an XRD (outlined in #2000). Argo will delete the Composite resource created from a Claim, causing Crossplane to part-create and then delete the underlying resources over and over.

Right now you have to disable prune on the Argo application itself, but with #2000 implemented you could set the annotation on the XRD, Composite and underlying resources so these are all managed by Crossplane, leaving the Claim resource managed by Argo.

benagricola avatar Feb 09 '21 14:02 benagricola

If you keep Prune=false then ArgoCD will be always in "out of sync" mode, that is not something that you would like to see in a gitops approach

braghettos avatar Feb 09 '21 17:02 braghettos

Thanks folks. It sounds like we should address this by finding a solution to https://github.com/crossplane/crossplane/issues/2000, which should let Argo know that it should manage the claim but not the resulting XR, and thus leave the XR alone. To @muvaf's point I think it's unrealistic to expect ArgoCD to directly manage every resource in the API server. Having ArgoCD model and manage a claim's XR and composed resources would be like having argo model and manage a Deployment's ReplicaSets and Pods.

negz avatar Feb 09 '21 19:02 negz

Though I wonder how it'd handle Pods created by Deployments, that's essentially the same mechanism with Crossplane creating managed resources when it sees the composite resource.

My understanding from using ArgoCD and a quick scan through some of logic is that generated resources are not pruned if they have a known parent resource, e.g. the Pod is not pruned because in its ownership hierarchy is a Deployment that is part of the git repo. There are some exceptions made for Endpoints and for PersistentVolumeClaim's created by StatefulSets.

Creating a XR with ArgoCD doesn't experience this pruning issue, as all the generated resources may end up with the ArgoCD application ownership label (if metadata.labels is patched on the Compositions), but the generated provider resources all have ownership references set to the XR.

In the case of an XR created by an XRC, ArgoCD sees a XR appear with the application ownership label (currently automatically propagated by Crossplane), but without an owner reference to any known resource. Attempting to disable ArgoCD prune via their available annotation is not really effective in this case, as the primary means to apply it to the XR is through adding to the XRC, which means the XRC is unable to be deleted by ArgoCD.

mcavoyk avatar Feb 16 '21 05:02 mcavoyk

@mcavoyk thank you for detailed explanation!

There are some exceptions made for Endpoints and for PersistentVolumeClaim's created by StatefulSets.

We want to come up with a good solution for this labeling stuff but I see that some exceptions are added for coreos operator group in ArgoCD. Do you think we can add an exception to cover claim <> composite relationship? The reason composite doesn't have an ownerRef is that claim is namespaced and composite is cluster-scoped and apiserver doesn't allow a namespaced resource to own a cluster-scoped one. The exception would look at spec.claimRef to figure out whether it's created by a known claim.

muvaf avatar Feb 17 '21 08:02 muvaf

Hi @negz @muvaf ,

Recently i also encounter similar error message mentioned above in my composite resource (pardon me about the unconventional format of the k8s event logs as it is generated by the Alert tool):

Exactly 2 Warning Events generated in the namespace where the Claim Resource residing:

Description: Error Occurred in MyBucket: mynamespace/test-s3 in dev cluster
cannot update composite resource: Operation cannot be fulfilled on abc.com "test-s3-2j97g": the object has been modified; please apply your changes to the latest version and try again

abc.com/v1alpha1/mybuckets error
Description: Error Occurred in MyBucket: mynamespace/test-s3 in dev cluster
cannot patch object: Operation cannot be fulfilled on abc.com "test-s3-2j97g": the object has been modified; please apply your changes to the latest version and try again

I am currently using ArgoCD for Claim Resource deployment and also to overcome the auto-pruning issue, I have also included ArgoCD settings for resource.exclusions [Ref] https://argoproj.github.io/argo-cd/operator-manual/declarative-setup/#resource-exclusioninclusion] and applied it to the Composite and Managed Resource Kind. With this setup in place, ArgoCD wont do the auto-syncing (nor it will delete the object that is not managed by them). This method might help solving the ArgoCD behavior for @benagricola @idallaserra

My suspicion is if there are other Reconciler process that has updated the Composite Resource which causing its resourceversion to be different and became fail while the Claim Reconciler try to update it. Also fyi, I don't recall having this error message before i upgraded to crossplane v1.0.0 and updated the xrd apiVersion to apiextensions.crossplane.io/v1.

kelvinwijaya avatar Mar 02 '21 17:03 kelvinwijaya

@kelvinwijaya these errors should be safe to skip as long as they don't happen continuously since the claim controller will make the same update in the next reconcile with the updated composite object.

muvaf avatar Mar 03 '21 08:03 muvaf

Hi @muvaf

Just to understand a bit on the problem, does it mean there are 2 controllers (composite & claim) that is trying to update the composite object as the same time which caused race condition?
Also are you trying to explain that eventually the claim controller will make sure the composite object to be updated during the binding process?

Thanks

kelvinwijaya avatar Mar 04 '21 04:03 kelvinwijaya

does it mean there are 2 controllers (composite & claim) that is trying to update the composite object as the same time which caused race condition?

Yes, this is possible and intended. This error is thrown when the resource that controller A will update changes between the start of its reconciliation and the end. So, all setups where two controllers make any operation on the same resource is subject to this. Though, reconciliation makes sure the resource is consistent eventually.

muvaf avatar Mar 04 '21 19:03 muvaf

The problem is actually:

  1. Argocd adds an annotation to every resource it creates directly.
  2. If a resource isn't in the manifest source that argo is syncing from, but does have an annotation, argo assumes that it was installed by argo, but it no longer is desired, so it wants to prune it.
  3. Crossplane is propogating annotations to resources it creates, including argocd annotations.
  4. Argo wants to prune everything that crossplane is generating.

empath-nirvana avatar Jan 21 '22 20:01 empath-nirvana

I've compiled my findings into #2928

I believe it covers the reason for every issue described here.

natereid72 avatar Feb 25 '22 22:02 natereid72

Adding the annotation argocd.argoproj.io/compare-options: IgnoreExtraneous to the XRC works. The XR and MRs will show up in the UI, but won't be considered when evaluating the sync status

James-Quigley avatar Mar 02 '22 19:03 James-Quigley

@James-Quigley I'm not sure how that would work. That tells ArgoCD to ignore the resource. So it will ignore the XRC too if I read docs correctly. When I set it, I see this. I think after re-reading the dpcs this is a setting that tells ArgoCD from the repo resource level, not to consider resources linked via owner ref chain in sync status.

image

After some number of syncs, it seems to be ok with the XRC, but the XR and beyond show out of sync and desired manifest blank.

If I delete the resource from ArgoCD, and then sync again, it doesn't come into UI. So seems similar to setting ignore at ArgoC project level, inly a lot of additional churn to get there.

Can you add some detail on how you've configured? I still see changing the way Crossplane is propagating labels as a better long-term solution as it doesn't require any additional configuration.

natereid72 avatar Mar 02 '22 20:03 natereid72

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 13 '22 00:08 stale[bot]

Revive

natereid72 avatar Aug 13 '22 22:08 natereid72

As of ArgoCD 2.4.8, the new annotation based tracking works with xr and xrc with auto pruning enabled. Application shows as synced and health is reflected correctly. XR does not get auto-pruned.

One caveat is that XRs will show as root resources instead of them being "children" of claims.

image

nabuskey avatar Aug 18 '22 23:08 nabuskey

@negz I propose closing this issue based on @nabuskey comment.

natereid72 avatar Aug 22 '22 21:08 natereid72

Hi, @nabuskey did you use any additional configuration/tuning, I have a recent version of ArgoCD 2.4.11 and still struggling with the pruning of XRs.

thradec avatar Aug 25 '22 10:08 thradec

Sorry, I missed the application.resourceTrackingMethod: annotation, with it, it's working great, thx.

thradec avatar Aug 25 '22 12:08 thradec

@negz I propose closing this issue based on @nabuskey comment.

I think we could close this when #3041 is merged - that provides guidance on how to configure Argo, application.resourceTrackingMethod: annotation, etc. to work well with Crossplane 🤓

jbw976 avatar Aug 25 '22 15:08 jbw976

I had to enable clusterResourceWhitelist on the specific group I'm using on my AppProjet so that I can see the Composite Resource in ArgoCD. But if I understand correctly, this would also allow my users to explicitly declare Composite Resources or even Managed Resources without using Claims. Is it true? Is there any workaround?

mcanevet avatar Jan 03 '23 10:01 mcanevet

This doesn't seem like a reasonable fix, shouldn't RD Claims that generate XRD's propagate ownership references, along with claim references. Wouldn't this correct the hierarchy issue in Argo and bring XRD's into scope of the project? At the moment, it seems that an misconfiguration in Argo could wipe out infrastructure using crossplane.

FunctionRevsions for example correctly specify the owner reference and link to the Function

image

apiVersion: pkg.crossplane.io/v1beta1
kind: FunctionRevision
metadata:
  creationTimestamp: '2024-01-12T10:50:59Z'
  finalizers:
    - revision.pkg.crossplane.io
  generation: 1
  labels:
    pkg.crossplane.io/package: terraform-example
  name: terraform-example-714659823149
  ownerReferences:
    - apiVersion: pkg.crossplane.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Function
      name: terraform-example
      uid: c50856f8-9a89-454a-9cfc-103057d92005

lukegAtPaxos avatar Jan 12 '24 11:01 lukegAtPaxos