Resources with custom `OwnerReference` not deleted when removed from Helm Chart
Describe the bug
I'm using ArgoCD for deploying my applications on Kubernetes.
I am using my own chart with CRDs I created: Host and Check CRDs.
The chart templates both the Host and Check, in a way that a single values.yaml holds one Host resource and multiple Check resources.
The logical hierarchy is that Host contains multiple Checks. In my custom controller for the CRDs, I set the OwnerReference of each Check to be its Host - after the resources were deployed to Kubernetes.
The problem is when I delete a specific Check from the Helm Chart's values.yaml, ArgoCD doesn't show that it is in a status of OutOfSync (requires pruning), but rather removes the resources status from the application. Also, the only actions available at all of this resource (which have custom OwnerReference is only Delete, but it doesn't have the Sync action (although it is from the Helm Chart).
To Reproduce
Since I'm using my own CRDs, the only way to reproduce is by using general Kubernetes resources. That can be done by creating a chart with 2 pods as resources:
- The first pod will be the father
- The second pod (the child pod) will have the father as
OwnerReference
Flow:
- Add the father pod to the helm chart, and afterwards deploy from ArgoCD's application
- Add the child pod to the helm chart, and add the first pod deployed (the father) as its
OwnerReference:
ownerReferences:
- apiVersion: v1
kind: Pod
name: pod-father
uid: <UID-of-father-pod>
- After adding the child pod to the chart, deploy it from ArgoCD's application. After deployment, the child pod should be "under" the father pod like in this picture:

The child pod have Health and Status

The child pod doesn't have the Sync action, but only the Delete action

- Now, remove the child pod from the Chart, to see ArgoCD showing it as
OutOfSync (requires pruning) - After the removal, the child pod won't show the
OutOfSync (requires pruning)status, but rather won't have any status at all (only Health), and the actions available is onlyDelete

Expected behavior
Since the resources with the custom OwnerReference are also part of the Helm Chart, they should be handled as if they didn't have any OwnerReference, and whenever they get deleted from the Chart, they should be marked as OutOfSync (requires pruning) as well as the other resources.
Furthermore, the also need to have the Sync action and not only Delete.
Version
ArgoCD version v1.7.6+b04c25e (Also can reproduce on version v1.6.1+159674e )
Kubernetes version 1.16.13
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
The reason it will not be considered for pruning/OutOfSync, is because it is not a top-level resource (it has a parent). By design, we only consider TLRs for pruning and sync status.
Is there a way I can set both an OwnerReference for that resource and also that it would be a TLR?
I added this ability in order to show the resources better with hierarchy, but currently, as I stated, it has its drawbacks.
@jessesuen
@jessesuen is there something I can do to make it work or isn't it possible at all?
I don't think having it both ways works.
OwnerReference is saying "this object created me" ... but, that's not what you're doing. If you want the Checks to belong to a Host, they should be defined in the HostSpec resource, and the Host controller should create the Check resources for you (in which case Argo would show them the way you want).
If you want Checks to be created independently (the way a service or serviceMonitor resource is?) they have to be TLRs.
I am also being bitten by this.
I have an idea where we maybe can have it both ways - probably others have had the same idea and I'm missing some nuance.
Owner references have a field called controller that is supposed to be either true or false depending on whether the owner reference points to the controller that created the resource.
Rather than the current behaviour, where Argo refuses to delete any resource with an owner reference, I think that Argo should be able to delete resources as long as they don't have an owner reference where controller=true.
That would allow us to still build these hierarchies, along with the cascading deletion and blocking of owner deletion, without stopping Argo from deleting these resources. Argo would only refuse to delete a resource if it explicitly has another controller rather than just a tree relationship with another resource.
The alternative would be to mimic Helm's behaviour for Helm releases in Argo, where Argo assumes it can do what it wants with resources that are defined in the manifest produced by Helm. That is probably how most people coming from Helm would expect it to work TBH (I did!).
FWIW, the thing I am struggling with using Argo with is Cluster API, which I think will become an increasingly common use case.
Cluster API adds owner references to objects that are part of a cluster, but they are not controller references unless a Cluster API controller actually created the object.
IMHO this would be the proper behaviour for Argo - refuse to delete resources with a controller owner reference, but delete resources with only non-controller owner references.
@spjmurray
Here is my potential solution to this, which I actually think would be the correct behaviour, for information.
From what I can see, this is where we would need to change things to say "number of controller references is 0" rather than "number of references".
https://github.com/argoproj/gitops-engine/blob/1ce2acc845d230a279442eb7576203525bced9cc/pkg/cache/predicates.go#L4
It seems to me that this exact use case is precisely why there are controller and non-controller owner references right?
In particular, the use case for blocking deletion of another object while still being managed by Argo seems compelling, e.g. in Cluster API where I want Argo to manage my MachineDeployments but I also want them to block deletion of the Cluster.
any updates on this ?
+1
+1 CAPI with MachineDeployments deletion use case just bit me as well