provider-kubernetes
provider-kubernetes copied to clipboard
Fix delete object when credentials are already gone
Signed-off-by: Bob Haddleton [email protected]
Description of your changes
Handle the corner case where an Object's ProviderConfig Secret has been deleted before the object itself could be deleted.
This can happen when a Cluster, ProviderConfig and one or more Objects are created in the same Composition, and the Cluster credentials Secret gets deleted before all of the Objects can be deleted.
Fixes #38
I have:
- [X] Read and followed Crossplane's [contribution process].
- [X] Run
make reviewable testto ensure this PR is ready for review.
How has this code been tested
Tested this is a local cluster that had multiple resources in this condition, and verified that they were all cleaned up.
Thanks @bobh66 , the change generally looks good to me, could you please fix the pipeline issues? cc @turkenh in case of any comment.
Hi @morningspace - I can't reproduce either one of those errors in my environment. The lint error looks like a network problem with loading dependencies. I'm not sure that the check-diff error is but I can run check-diff locally and it passes.
Can you re-trigger the checks and see if it was a temporary failure?
Thanks
@bobh66 I just rerun the failed jobs and issue still there.
For the check-diff error, I cloned your repo and can reproduce it locally, the issue is because you did some change in below two files but was different from what was generated by make:
- apis/object/v1alpha1/zz_generated.deepcopy.go
- apis/v1alpha1/zz_generated.deepcopy.go
diff --git a/apis/object/v1alpha1/zz_generated.deepcopy.go b/apis/object/v1alpha1/zz_generated.deepcopy.go
index 37926a1..0fb6f2f 100644
--- a/apis/object/v1alpha1/zz_generated.deepcopy.go
+++ b/apis/object/v1alpha1/zz_generated.deepcopy.go
@@ -1,4 +1,3 @@
-//go:build !ignore_autogenerated
// +build !ignore_autogenerated
/*
diff --git a/apis/v1alpha1/zz_generated.deepcopy.go b/apis/v1alpha1/zz_generated.deepcopy.go
index f8bcba2..4442f14 100644
--- a/apis/v1alpha1/zz_generated.deepcopy.go
+++ b/apis/v1alpha1/zz_generated.deepcopy.go
@@ -1,4 +1,3 @@
-//go:build !ignore_autogenerated
// +build !ignore_autogenerated
/*
Can you please fix it?
For the lint issue, it seems it's blocked at prerequisites, probably need to update https://github.com/crossplane-contrib/provider-kubernetes/blob/main/.github/workflows/ci.yml.
Quick update on the lint failure, the real problem is the below error:
could not load export data: cannot import \"internal/goarch\" (unknown iexport format version 2), export data is newer version - update tool
There are similar issues reported some where else, https://github.com/golangci/golangci-lint-action/issues/434, which seems to include ways to fix it. @bobh66
Quick update on the lint failure, the real problem is the below error:
could not load export data: cannot import \"internal/goarch\" (unknown iexport format version 2), export data is newer version - update toolThere are similar issues reported some where else, golangci/golangci-lint-action#434, which seems to include ways to fix it. @bobh66
Yes, that is the problem.
Please upgrade it to v3 as here.
@bobh66 thanks for the PR.
While I can understand that this fixes the case you mentioned, I am a bit hesitant to remove the finalizer from the object since we cannot know whether it is really deleted or not. I believe the current behavior of provider-kubernetes is correct and this looks like something that needs to be solved in an upper layer.
We had similar problems with the ProviderConfigs being deleted while there were resources depending on them and this was solved by introducing ProviderConfigUsage. Given we already have usage tracked there, maybe we can just prevent the deletion of credential secrets until the ProviderConfig object is gone by adding a finalizer on the credentials secret. This would be a crossplane-runtime change though.
Hi @turkenh - I agree that a finalizer on the Secret would help but I don't think it would resolve this issue. Since we are creating the Cluster.eks in the same Composition, it is possible/likely that even if we had the Secret, the kubeconfig inside it is pointing at a Cluster that no longer exists. We will still get an error in the Connect() function when we try to create the kubeclient, or later in the Observe() function when we try to contact the Cluster. That means we would need a finalizer on the Cluster that keeps it in existence until the Secret has been deleted. Without that in place we would just have a Secret orphaned along with the Objects.
I think the providers need to handle the case where the managed resource has been deleted and there is no possible way that they can verify that the associated external resource has also been deleted. If the remote cluster is gone then the external managed resource is also gone. If the remote cluster is unreachable because the credentials have been deleted, then for all intents and purposes the remote cluster is gone. If the Object is not marked as Deleted then I agree it should be left alone for possible recovery. But if the Object is deleted, I don't see any reason to keep it when there is no way to reconcile it. If the number of orphaned Objects gets to be large it will start affecting the performance of the provider and response times will drop.
This issue seems to be closely related to https://github.com/crossplane/crossplane/issues/2439 and https://github.com/crossplane/crossplane/issues/2072 but it appears that a comprehensive solution is still TBD.
provider-helm has this same problem, so whatever solution we end up with will need to be applied there as well.
Thanks
Hi @turkenh - any thoughts on this? I'm happy to work on a more complete or higher-level solution, but I'm not sure how we would tie these dependencies back to the Cluster object? Thanks
@bobh66 is this PR still relevant or should we close it?
Closing in favor of the Usage work in https://github.com/crossplane/crossplane/pull/4444