argo-cd icon indicating copy to clipboard operation
argo-cd copied to clipboard

Resources with custom `OwnerReference` not deleted when removed from Helm Chart

Open AmitBenAmi opened this issue 5 years ago • 13 comments

Describe the bug

I'm using ArgoCD for deploying my applications on Kubernetes. I am using my own chart with CRDs I created: Host and Check CRDs. The chart templates both the Host and Check, in a way that a single values.yaml holds one Host resource and multiple Check resources. The logical hierarchy is that Host contains multiple Checks. In my custom controller for the CRDs, I set the OwnerReference of each Check to be its Host - after the resources were deployed to Kubernetes.

The problem is when I delete a specific Check from the Helm Chart's values.yaml, ArgoCD doesn't show that it is in a status of OutOfSync (requires pruning), but rather removes the resources status from the application. Also, the only actions available at all of this resource (which have custom OwnerReference is only Delete, but it doesn't have the Sync action (although it is from the Helm Chart).

To Reproduce

Since I'm using my own CRDs, the only way to reproduce is by using general Kubernetes resources. That can be done by creating a chart with 2 pods as resources:

  1. The first pod will be the father
  2. The second pod (the child pod) will have the father as OwnerReference

Flow:

  1. Add the father pod to the helm chart, and afterwards deploy from ArgoCD's application
  2. Add the child pod to the helm chart, and add the first pod deployed (the father) as its OwnerReference:
ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: pod-father
    uid: <UID-of-father-pod>
  1. After adding the child pod to the chart, deploy it from ArgoCD's application. After deployment, the child pod should be "under" the father pod like in this picture: Screen Shot 2020-11-05 at 19 01 04

The child pod have Health and Status Screen Shot 2020-11-05 at 19 05 40

The child pod doesn't have the Sync action, but only the Delete action Screen Shot 2020-11-05 at 19 11 00

  1. Now, remove the child pod from the Chart, to see ArgoCD showing it as OutOfSync (requires pruning)
  2. After the removal, the child pod won't show the OutOfSync (requires pruning) status, but rather won't have any status at all (only Health), and the actions available is only Delete Screen Shot 2020-11-05 at 19 03 08 Screen Shot 2020-11-05 at 19 03 52 Screen Shot 2020-11-05 at 19 04 30

Expected behavior

Since the resources with the custom OwnerReference are also part of the Helm Chart, they should be handled as if they didn't have any OwnerReference, and whenever they get deleted from the Chart, they should be marked as OutOfSync (requires pruning) as well as the other resources. Furthermore, the also need to have the Sync action and not only Delete.

Version

ArgoCD version v1.7.6+b04c25e (Also can reproduce on version v1.6.1+159674e )

Kubernetes version 1.16.13

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

AmitBenAmi avatar Nov 05 '20 17:11 AmitBenAmi

The reason it will not be considered for pruning/OutOfSync, is because it is not a top-level resource (it has a parent). By design, we only consider TLRs for pruning and sync status.

jessesuen avatar Nov 05 '20 21:11 jessesuen

Is there a way I can set both an OwnerReference for that resource and also that it would be a TLR? I added this ability in order to show the resources better with hierarchy, but currently, as I stated, it has its drawbacks.

AmitBenAmi avatar Nov 08 '20 07:11 AmitBenAmi

@jessesuen

AmitBenAmi avatar Nov 17 '20 14:11 AmitBenAmi

@jessesuen is there something I can do to make it work or isn't it possible at all?

AmitBenAmi avatar Dec 07 '20 14:12 AmitBenAmi

I don't think having it both ways works.

OwnerReference is saying "this object created me" ... but, that's not what you're doing. If you want the Checks to belong to a Host, they should be defined in the HostSpec resource, and the Host controller should create the Check resources for you (in which case Argo would show them the way you want).

If you want Checks to be created independently (the way a service or serviceMonitor resource is?) they have to be TLRs.

pbecotte avatar Dec 17 '20 17:12 pbecotte

I am also being bitten by this.

I have an idea where we maybe can have it both ways - probably others have had the same idea and I'm missing some nuance.

Owner references have a field called controller that is supposed to be either true or false depending on whether the owner reference points to the controller that created the resource.

Rather than the current behaviour, where Argo refuses to delete any resource with an owner reference, I think that Argo should be able to delete resources as long as they don't have an owner reference where controller=true.

That would allow us to still build these hierarchies, along with the cascading deletion and blocking of owner deletion, without stopping Argo from deleting these resources. Argo would only refuse to delete a resource if it explicitly has another controller rather than just a tree relationship with another resource.

mkjpryor avatar Jan 27 '23 16:01 mkjpryor

The alternative would be to mimic Helm's behaviour for Helm releases in Argo, where Argo assumes it can do what it wants with resources that are defined in the manifest produced by Helm. That is probably how most people coming from Helm would expect it to work TBH (I did!).

mkjpryor avatar Jan 27 '23 16:01 mkjpryor

FWIW, the thing I am struggling with using Argo with is Cluster API, which I think will become an increasingly common use case.

Cluster API adds owner references to objects that are part of a cluster, but they are not controller references unless a Cluster API controller actually created the object.

IMHO this would be the proper behaviour for Argo - refuse to delete resources with a controller owner reference, but delete resources with only non-controller owner references.

mkjpryor avatar Jan 27 '23 17:01 mkjpryor

@spjmurray

Here is my potential solution to this, which I actually think would be the correct behaviour, for information.

mkjpryor avatar Jan 27 '23 17:01 mkjpryor

From what I can see, this is where we would need to change things to say "number of controller references is 0" rather than "number of references".

https://github.com/argoproj/gitops-engine/blob/1ce2acc845d230a279442eb7576203525bced9cc/pkg/cache/predicates.go#L4

mkjpryor avatar Jan 27 '23 18:01 mkjpryor

It seems to me that this exact use case is precisely why there are controller and non-controller owner references right?

In particular, the use case for blocking deletion of another object while still being managed by Argo seems compelling, e.g. in Cluster API where I want Argo to manage my MachineDeployments but I also want them to block deletion of the Cluster.

mkjpryor avatar Jan 27 '23 18:01 mkjpryor

any updates on this ?

dntosas avatar Feb 17 '24 10:02 dntosas

+1

ErezWeiss avatar May 15 '24 10:05 ErezWeiss

+1 CAPI with MachineDeployments deletion use case just bit me as well

groundnuty avatar Nov 18 '24 13:11 groundnuty