provider-kubernetes icon indicating copy to clipboard operation
provider-kubernetes copied to clipboard

Fix delete object when credentials are already gone

Open bobh66 opened this issue 3 years ago • 9 comments

Signed-off-by: Bob Haddleton [email protected]

Description of your changes

Handle the corner case where an Object's ProviderConfig Secret has been deleted before the object itself could be deleted.

This can happen when a Cluster, ProviderConfig and one or more Objects are created in the same Composition, and the Cluster credentials Secret gets deleted before all of the Objects can be deleted.

Fixes #38

I have:

  • [X] Read and followed Crossplane's [contribution process].
  • [X] Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

Tested this is a local cluster that had multiple resources in this condition, and verified that they were all cleaned up.

bobh66 avatar May 06 '22 01:05 bobh66

Thanks @bobh66 , the change generally looks good to me, could you please fix the pipeline issues? cc @turkenh in case of any comment.

morningspace avatar May 09 '22 13:05 morningspace

Hi @morningspace - I can't reproduce either one of those errors in my environment. The lint error looks like a network problem with loading dependencies. I'm not sure that the check-diff error is but I can run check-diff locally and it passes.

Can you re-trigger the checks and see if it was a temporary failure?

Thanks

bobh66 avatar May 09 '22 15:05 bobh66

@bobh66 I just rerun the failed jobs and issue still there.

For the check-diff error, I cloned your repo and can reproduce it locally, the issue is because you did some change in below two files but was different from what was generated by make:

  • apis/object/v1alpha1/zz_generated.deepcopy.go
  • apis/v1alpha1/zz_generated.deepcopy.go
diff --git a/apis/object/v1alpha1/zz_generated.deepcopy.go b/apis/object/v1alpha1/zz_generated.deepcopy.go
index 37926a1..0fb6f2f 100644
--- a/apis/object/v1alpha1/zz_generated.deepcopy.go
+++ b/apis/object/v1alpha1/zz_generated.deepcopy.go
@@ -1,4 +1,3 @@
-//go:build !ignore_autogenerated
 // +build !ignore_autogenerated

 /*
diff --git a/apis/v1alpha1/zz_generated.deepcopy.go b/apis/v1alpha1/zz_generated.deepcopy.go
index f8bcba2..4442f14 100644
--- a/apis/v1alpha1/zz_generated.deepcopy.go
+++ b/apis/v1alpha1/zz_generated.deepcopy.go
@@ -1,4 +1,3 @@
-//go:build !ignore_autogenerated
 // +build !ignore_autogenerated

 /*

Can you please fix it?

morningspace avatar May 10 '22 01:05 morningspace

For the lint issue, it seems it's blocked at prerequisites, probably need to update https://github.com/crossplane-contrib/provider-kubernetes/blob/main/.github/workflows/ci.yml.

morningspace avatar May 10 '22 01:05 morningspace

Quick update on the lint failure, the real problem is the below error:

could not load export data: cannot import \"internal/goarch\" (unknown iexport format version 2), export data is newer version - update tool

There are similar issues reported some where else, https://github.com/golangci/golangci-lint-action/issues/434, which seems to include ways to fix it. @bobh66

morningspace avatar May 10 '22 05:05 morningspace

Quick update on the lint failure, the real problem is the below error:

could not load export data: cannot import \"internal/goarch\" (unknown iexport format version 2), export data is newer version - update tool

There are similar issues reported some where else, golangci/golangci-lint-action#434, which seems to include ways to fix it. @bobh66

Yes, that is the problem. Please upgrade it to v3 as here.

turkenh avatar May 10 '22 08:05 turkenh

@bobh66 thanks for the PR.

While I can understand that this fixes the case you mentioned, I am a bit hesitant to remove the finalizer from the object since we cannot know whether it is really deleted or not. I believe the current behavior of provider-kubernetes is correct and this looks like something that needs to be solved in an upper layer.

We had similar problems with the ProviderConfigs being deleted while there were resources depending on them and this was solved by introducing ProviderConfigUsage. Given we already have usage tracked there, maybe we can just prevent the deletion of credential secrets until the ProviderConfig object is gone by adding a finalizer on the credentials secret. This would be a crossplane-runtime change though.

turkenh avatar May 10 '22 08:05 turkenh

Hi @turkenh - I agree that a finalizer on the Secret would help but I don't think it would resolve this issue. Since we are creating the Cluster.eks in the same Composition, it is possible/likely that even if we had the Secret, the kubeconfig inside it is pointing at a Cluster that no longer exists. We will still get an error in the Connect() function when we try to create the kubeclient, or later in the Observe() function when we try to contact the Cluster. That means we would need a finalizer on the Cluster that keeps it in existence until the Secret has been deleted. Without that in place we would just have a Secret orphaned along with the Objects.

I think the providers need to handle the case where the managed resource has been deleted and there is no possible way that they can verify that the associated external resource has also been deleted. If the remote cluster is gone then the external managed resource is also gone. If the remote cluster is unreachable because the credentials have been deleted, then for all intents and purposes the remote cluster is gone. If the Object is not marked as Deleted then I agree it should be left alone for possible recovery. But if the Object is deleted, I don't see any reason to keep it when there is no way to reconcile it. If the number of orphaned Objects gets to be large it will start affecting the performance of the provider and response times will drop.

This issue seems to be closely related to https://github.com/crossplane/crossplane/issues/2439 and https://github.com/crossplane/crossplane/issues/2072 but it appears that a comprehensive solution is still TBD.

provider-helm has this same problem, so whatever solution we end up with will need to be applied there as well.

Thanks

bobh66 avatar May 10 '22 12:05 bobh66

Hi @turkenh - any thoughts on this? I'm happy to work on a more complete or higher-level solution, but I'm not sure how we would tie these dependencies back to the Cluster object? Thanks

bobh66 avatar May 15 '22 21:05 bobh66

@bobh66 is this PR still relevant or should we close it?

turkenh avatar Aug 18 '23 07:08 turkenh

Closing in favor of the Usage work in https://github.com/crossplane/crossplane/pull/4444

bobh66 avatar Aug 19 '23 19:08 bobh66