java-operator-sdk

Trigger primary Custom Resource delete from managed Dependent Resource

Open · rguillens opened this issue 2 years ago · 6 comments

Feature request

I have built a Kubernetes Operator based on Java Operator SDK v4.3.1 that handles a primary custom resource and several managed dependent resources. One of the dependent resources is a Pod running a "critical" process. If this dependent resource is deleted externally for any reason (user, application, crash), the primary custom resource should be marked for deletion.

What did you do?

This is an implementation example to describe the scenario:

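```java
import io.javaoperatorsdk.operator.api.reconciler.Cleaner;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.ControllerConfiguration;
import io.javaoperatorsdk.operator.api.reconciler.DeleteControl;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;
import io.javaoperatorsdk.operator.api.reconciler.dependent.Dependent;

@ControllerConfiguration(
    name = "myresourcereconciler",
    dependents = {
        @Dependent(
            name = "configmapdependentresource",
            type = ConfigMapDependentResource.class,
            reconcilePrecondition = ConfigMapReconcileCondition.class
        ),
        @Dependent(
            name = "criticalpoddependentresource",
            type = CriticalPodDependentResource.class
        ),
        @Dependent(
            name = "servicedependentresource",
            type = ServiceDependentResource.class
        )
    }
)
public class MyResourceReconciler implements Reconciler<MyResource>, Cleaner<MyResource> {

    @Override
    public UpdateControl<MyResource> reconcile(MyResource resource, Context<MyResource> context) throws Exception {
        // Reconcile implementation; sets updatedResource only when the status changed
        MyResource updatedResource = null;
        return updatedResource != null ? UpdateControl.patchStatus(updatedResource) : UpdateControl.noUpdate();
    }

    @Override
    public DeleteControl cleanup(MyResource resource, Context<MyResource> context) {
        // Cleanup implementation
        return DeleteControl.defaultDelete();
    }
}
```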

...

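```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.processing.dependent.kubernetes.CRUDKubernetesDependentResource;
import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent;

@KubernetesDependent(labelSelector = MyResource.LABEL_SELECTOR)
public class CriticalPodDependentResource extends CRUDKubernetesDependentResource<Pod, MyResource> {

    public CriticalPodDependentResource() {
        super(Pod.class);
    }

    @Override
    protected Pod desired(MyResource primary, Context<MyResource> context) {
        // Desired Pod creation (metadata and spec omitted)
        return new PodBuilder().build();
    }

    @Override
    public void delete(MyResource primary, Context<MyResource> context) {
        // Expected this operation to be called on a dependent resource external delete event
        context.getClient().resource(primary).delete();
    }
}
```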

What did you expect to see?

CriticalPodDependentResource.delete() to be called when the dependent resource is deleted externally.

What did you see instead? Under which circumstances?

After the critical dependent resource is deleted externally, reconciliation of the primary custom resource is triggered instead, which was somewhat expected...

rguillens · May 10 '23

Hi @rguillens,

There are multiple things here:

  1. `delete` is called only if a reconcile precondition does not hold on a DR, or when the whole workflow is being cleaned up (that is, the custom resource is being deleted). If a resource is deleted by someone else, `delete` is not called; the resource is simply reconciled and re-created.
  2. There is another aspect: how do you know whether the resource has already been created before? For example, reconciliation starts and creates the ConfigMap and Service DRs, but then the process/pod suddenly terminates, leaving those two resources but no Pod. When the operator starts again, how do you know whether the Pod was there and got deleted, or simply was never created? This means you need to store some state somewhere recording that the Pod was already created. See how state is supported in DRs: https://javaoperatorsdk.io/docs/dependent-resources#external-state-tracking-dependent-resources although this is not necessarily your case.
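To make point 2 concrete, here is a minimal plain-Java sketch (not JOSDK API) of why checking existence alone is ambiguous. `PodStateClassifier`, `PodState` and the `wasCreated` flag are hypothetical names; the flag stands for whatever state you persist once the Pod has been created.

```java
// Sketch: "wasCreated" stands for a persisted marker recording that the Pod was created before.
final class PodStateClassifier {

    enum PodState { NOT_YET_CREATED, PRESENT, DELETED_EXTERNALLY }

    // Looking only at podExists cannot tell NOT_YET_CREATED from DELETED_EXTERNALLY;
    // the persisted marker is what tells them apart.
    static PodState classify(boolean podExists, boolean wasCreated) {
        if (podExists) {
            return PodState.PRESENT;
        }
        return wasCreated ? PodState.DELETED_EXTERNALLY : PodState.NOT_YET_CREATED;
    }

    public static void main(String[] args) {
        // Operator stopped after creating the ConfigMap and Service, before the Pod.
        System.out.println(classify(false, false)); // NOT_YET_CREATED
        // Pod was created earlier and later deleted by someone else.
        System.out.println(classify(false, true)); // DELETED_EXTERNALLY
    }
}
```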

This is how I would solve it:

  • Store the state after the Pod is created.
  • Before the workflow is reconciled, check whether the Pod exists. If it does not, but the stored state shows it was created before, simply call delete on the primary custom resource using the client and exit the reconciliation. Currently you can do this only with standalone workflows.
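The two steps could look roughly like the sketch below. It rests on loud assumptions: `FakeCluster` is a hypothetical in-memory stand-in for the Kubernetes client, the `-critical` Pod name suffix is made up, and `podCreatedState` lives in memory only, whereas in a real operator that state has to survive restarts (for example in a ConfigMap). With JOSDK, the check would run in the reconciler before the standalone workflow is reconciled.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: FakeCluster and the in-memory state set are hypothetical stand-ins.
final class PreWorkflowCheck {

    // Hypothetical stand-in for what the operator would read/write via the client.
    static final class FakeCluster {
        final Set<String> pods = new HashSet<>();
        final Set<String> primaries = new HashSet<>();
    }

    // Hypothetical stand-in for persisted "Pod was created" state, keyed by primary name.
    private final Set<String> podCreatedState = new HashSet<>();

    // Returns true if the workflow was reconciled, false if the primary was deleted instead.
    boolean reconcile(FakeCluster cluster, String primaryName) {
        String podName = primaryName + "-critical"; // hypothetical naming scheme
        if (!cluster.pods.contains(podName) && podCreatedState.contains(primaryName)) {
            // The Pod existed before and is gone now: delete the primary and exit.
            cluster.primaries.remove(primaryName);
            podCreatedState.remove(primaryName); // drop the state together with the primary
            return false;
        }
        // Stand-in for reconciling the workflow, which (re)creates the Pod.
        cluster.pods.add(podName);
        // Store the state once the Pod is known to be created.
        podCreatedState.add(primaryName);
        return true;
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        cluster.primaries.add("my-resource");
        PreWorkflowCheck check = new PreWorkflowCheck();

        System.out.println(check.reconcile(cluster, "my-resource"));   // true: Pod created, state stored
        cluster.pods.remove("my-resource-critical");                   // external deletion
        System.out.println(check.reconcile(cluster, "my-resource"));   // false: primary deleted
        System.out.println(cluster.primaries.contains("my-resource")); // false
    }
}
```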

Note that some teams store the state in the status (like a flag that the resource was created). However, this has a caveat: in some rare cases a value written to the status might not be present in the next reconciliation (the cache is out of sync). Therefore they also maintain an in-memory cache of the status that always holds the latest version. Please study how this is implemented in the external state DR; you can easily manage this state correctly with a ConfigMap.
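A minimal sketch of such an in-memory cache, assuming the "created" flag only ever flips from false to true, so taking the logical OR of the cached value and the possibly stale status value is safe. `CreatedFlagCache` and its methods are hypothetical names, keyed here by the primary's UID.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch, assuming the "created" flag is monotonic (false -> true, never back).
final class CreatedFlagCache {

    // Latest value per primary UID, written by the operator itself, so it never
    // lags behind the status the operator last patched.
    private final Map<String, Boolean> latest = new ConcurrentHashMap<>();

    // Call right after successfully patching the status with the "created" flag set.
    void recordCreated(String primaryUid) {
        latest.put(primaryUid, Boolean.TRUE);
    }

    // statusFlag is read from the cached primary resource and may be stale.
    boolean wasCreated(String primaryUid, boolean statusFlag) {
        return statusFlag || latest.getOrDefault(primaryUid, Boolean.FALSE);
    }

    // Call from cleanup so entries do not outlive the primary.
    void forget(String primaryUid) {
        latest.remove(primaryUid);
    }

    public static void main(String[] args) {
        CreatedFlagCache cache = new CreatedFlagCache();
        cache.recordCreated("uid-1");
        // The next reconciliation sees an outdated primary whose status flag is still false.
        System.out.println(cache.wasCreated("uid-1", false)); // true
        System.out.println(cache.wasCreated("uid-2", false)); // false
    }
}
```

Because the map lives in memory, it is empty after an operator restart; at that point the status read from the freshly synced informer cache is the only source, which is one reason the ConfigMap approach is more robust.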

csviri · May 11 '23

Created an issue that will allow covering this with managed workflows: https://github.com/java-operator-sdk/java-operator-sdk/issues/1898

csviri · May 11 '23

Thanks @csviri for your recommendations.

Actually, I do store some state related to some of the DRs, and I know exactly when to delete the CR if something happens to a "critical" DR. This state is also reflected in the status at some point, but the state management doesn't rely on the CR status. Using something like a ConfigMap is a much better option for storing this state in my scenario, as described in: https://javaoperatorsdk.io/docs/dependent-resources#external-state-tracking-dependent-resources
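For illustration, a sketch of how such state might be encoded as ConfigMap data. `CriticalPodState` and both key names are hypothetical; the only fixed constraint is that ConfigMap `data` is a `String` to `String` map, so the flag has to be stored as text.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: class and key names are hypothetical. ConfigMap data is String -> String,
// so the boolean flag is stored as "true"/"false".
final class CriticalPodState {
    static final String POD_CREATED_KEY = "criticalPodCreated";
    static final String POD_NAME_KEY = "criticalPodName";

    private final boolean podCreated;
    private final String podName; // null if the Pod was never created

    CriticalPodState(boolean podCreated, String podName) {
        this.podCreated = podCreated;
        this.podName = podName;
    }

    boolean podCreated() { return podCreated; }

    String podName() { return podName; }

    Map<String, String> toConfigMapData() {
        Map<String, String> data = new HashMap<>();
        data.put(POD_CREATED_KEY, Boolean.toString(podCreated));
        if (podName != null) {
            data.put(POD_NAME_KEY, podName);
        }
        return data;
    }

    // A missing ConfigMap (null data) or a missing key means nothing was created yet.
    static CriticalPodState fromConfigMapData(Map<String, String> data) {
        if (data == null) {
            return new CriticalPodState(false, null);
        }
        return new CriticalPodState(
            Boolean.parseBoolean(data.get(POD_CREATED_KEY)), data.get(POD_NAME_KEY));
    }
}
```

With the fabric8 client, the map would be passed to `ConfigMap.setData(...)` when creating or updating the ConfigMap and read back with `getData()`.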

I was also looking into the KubernetesDependent annotation implementation and its usages; I think this might be a good place to customize the DR reconcile lifecycle.

rguillens · May 11 '23

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] · Jul 11 '23

As far as I can see, explicit invocation will cover this. Feel free to reopen if not.

csviri · Mar 12 '24