cluster-api-provider-openstack
Cleanup loadbalancers created by ccm
/kind feature
Currently, we only delete the kube-apiserver load balancer created by Cluster API. Usually, there are additional load balancers created by the CCM. I would also like to delete those on cluster deletion. They cannot be deleted by the CCM, because the CCM is not running anymore. The cluster deletion gets blocked because the network etc. cannot be deleted while orphaned CCM load balancers are left behind.
Notes:
- we should also add an e2e test for it. I think we should implement it similarly to CAPA: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/master/test/e2e/suites/unmanaged/unmanaged_test.go#L178-L210
- we have to ensure we're not deleting unrelated LBs in the same OpenStack project
- I'm not sure if there is a way to identify whether the load balancers are related to the cluster we are deleting
- worst case, we need a flag on the CAPO controller to opt out of this behaviour (see the sketch after this list)
- I assume https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/839 will not significantly change the problem, so let's ignore it for now.
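A rough sketch of what such an opt-out flag could look like, assuming the usual pflag-based flag wiring; the flag name, default, and plumbing are made up for illustration and are not an existing CAPO option:

```go
package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

// deleteOrphanedLBs is a hypothetical setting; the real flag name, default,
// and plumbing into the cluster controller would be decided during implementation.
var deleteOrphanedLBs bool

func main() {
	pflag.BoolVar(&deleteOrphanedLBs, "delete-orphaned-load-balancers", true,
		"If true, delete load balancers created by the OpenStack cloud provider (CCM) when deleting a cluster.")
	pflag.Parse()
	fmt.Println("delete orphaned LBs:", deleteOrphanedLBs)
}
```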
/assign @seanschneeweiss (we already have an internal implementation of this feature)
/cc @iamemilio I assume you can provide some input on this :)
@sbueringer: GitHub didn't allow me to assign the following users: seanschneeweiss.
Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
/assign @seanschneeweiss (we already have an internal implementation of this feature)
/cc @iamemilio I assume you can provide some input on this :)
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign (assigning myself in place of @seanschneeweiss as he cannot be assigned)
This is an interesting feature. I would want a flag to turn the deletion on or off, because OpenStack cloud provider users don't only use load balancers, but also the Cinder CSI driver, the ingress controller, etc. They (and I) may be confused that only load balancers are deleted, even knowing that load balancers belong to the CCM and the other resources belong to other controllers.
If you check https://github.com/kubernetes/cloud-provider-openstack, there are many resources that might be related: KMS, CSI (Cinder, Manila), etc. I think deleting something not created by CAPO is dangerous, as we might lack the information to decide whether to delete and what to delete.
I think if there's a reasonably safe way to identify which load balancers have been created by the CCM of the cluster we just deleted, it's fine to delete them. I wouldn't touch disks etc., as there is a chance we accidentally delete valuable data that way. Load balancers can be re-created if necessary.
The AWS e2e test verifies that the LBs are deleted and the disks survive the cluster deletion.
The alternative is that you always have to delete the load balancers via the CCM first and then trigger the cluster deletion. Or the cluster deletion gets stuck in the middle and someone has to manually clean up the load balancers in OpenStack.
If we think the regular case is that users want to manually clean up the load balancers, we can make this opt-in instead of opt-out.
I've opened PR #844 with the currently used implementation. This is a draft to be extended with the e2e tests and further filters to prevent unwanted deletion of other objects.
/unassign /assign @seanschneeweiss (It wasn't possible to assign Sean before because he wasn't a member of the org)
Should there be an additional selector for orphaned load balancers other than the project ID? How can we identify the load balancers of the cluster being deleted?
Does this just delete all the load balancers in the OpenStack project? What happens if there is more than one cluster in a project?
Is the issue here that the network/router that we created won't delete because there are load balancers attached to it? In that case, it makes sense to me that the condition should be:
if we are deleting the network then delete the load balancers attached to it first
That means that:
- Load balancers for other clusters with separate networks will not be deleted
- If we did not create the network, and hence are leaving it behind afterwards, we leave the load balancers behind
That seems like sensible behaviour to me?
I am still looking for how to identify all the load balancers CAPO deployed.
The naming convention is fmt.Sprintf("kube_service_%s_%s_%s", clusterName, service.Namespace, service.Name), but I am not sure what clusterName is.
@jichenjc any comments?
@hidekazuna see this: https://github.com/kubernetes/cloud-provider/blob/master/app/core.go#L87
The clusterName comes from ctx.ComponentConfig.KubeCloudShared.ClusterName, which I believe comes from Kubernetes itself.
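To make the convention concrete: a minimal sketch of how such a name is built and how a prefix match against it could look, assuming clusterName is whatever was passed to the kube-controller-manager/CCM via --cluster-name; both helpers are only illustrative, not OCCM or CAPO code:

```go
package main

import (
	"fmt"
	"strings"
)

// ccmLBName mirrors the naming convention quoted above:
// kube_service_<clusterName>_<namespace>_<serviceName>.
func ccmLBName(clusterName, namespace, serviceName string) string {
	return fmt.Sprintf("kube_service_%s_%s_%s", clusterName, namespace, serviceName)
}

// belongsToCluster is an illustrative prefix check for identifying load
// balancers created by the CCM of the given cluster.
func belongsToCluster(lbName, clusterName string) bool {
	return strings.HasPrefix(lbName, fmt.Sprintf("kube_service_%s_", clusterName))
}

func main() {
	name := ccmLBName("my-cluster", "default", "nginx")
	fmt.Println(name)                                 // kube_service_my-cluster_default_nginx
	fmt.Println(belongsToCluster(name, "my-cluster")) // true
}
```

One caveat with a pure prefix match: if one cluster name happens to be a prefix of another (underscores included), the check could match load balancers of the wrong cluster, so a real implementation would need to be careful here.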
A trick we have used in our deployments is to apply a tag to resources created by CAPO. That might simplify your problem.
apply a tag to resources created by CAPO
Agreed, creating a tag is much simpler and should be considered the common approach.
But this looks more like an OCCM question: this issue is about LBs created by the OCCM that need to be cleaned up as well, if I understand correctly.
@jichenjc Thanks for the information.
@iamemilio @jichenjc Yes, the LBs in this issue are created by the CCM, so we cannot use tags.
@seanschneeweiss I think we can identify LBs created by the CCM deployed by CAPO by checking whether their names start with kube_service_<clusterName>.
This might raise a bigger question: the cloud provider supports PV, PVC and other resource creation through Cinder, Manila, etc., and I am not sure whether the CAPI cluster deletion here should include those as well.
I haven't tried it, e.g. whether this cascades or prevents cluster deletion, so real users running workloads on CAPI clusters might have more to say on this.
@jichenjc
CAPI cluster deletion does not delete service load balancers (it does delete the API server load balancer) or Cinder volumes provisioned for PVCs.
One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.
Yes, I understand. So, as you indicated in PR #990, maybe the reasonable way for now is to use the naming convention to remove the LBs first,
and ask the CAPI community how they want to handle resources created outside of the CAPI lifecycle. I guess it might be the operator's job to consider, or maybe a better way is to run a quick test (e.g. create a CAPI cluster, create PVs/PVCs and LBs on it, then delete it and see what happens).
I want to deploy multiple k8s clusters in the same network and the same project for testing purposes. I do not want to delete LBs based on the network or the project.
Deleting by name kube_service_<clustername>_<namespace>_<servicename> sounds the most reasonable to me, as it includes the cluster name and we should have no unwanted deletions of other resources.
Maybe a check could be added that (a rough sketch follows below):
- counts the number of load balancers before deleting the network (when deleting the cluster), and
- returns an error or prints a log message to notify about the orphaned load balancers.
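A minimal sketch of such a check, assuming a gophercloud Octavia (load balancer v2) service client; countOrphanedCCMLBs is an illustrative helper, not existing CAPO code:

```go
package lbcleanup

import (
	"fmt"
	"strings"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// countOrphanedCCMLBs lists all load balancers visible to the client and
// counts those whose names match the CCM naming convention for clusterName.
// The caller could then log a warning or return an error before attempting
// to delete the cluster network.
func countOrphanedCCMLBs(client *gophercloud.ServiceClient, clusterName string) (int, error) {
	allPages, err := loadbalancers.List(client, loadbalancers.ListOpts{}).AllPages()
	if err != nil {
		return 0, err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return 0, err
	}
	prefix := fmt.Sprintf("kube_service_%s_", clusterName)
	count := 0
	for _, lb := range lbs {
		if strings.HasPrefix(lb.Name, prefix) {
			count++
		}
	}
	return count, nil
}
```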
The clustername originates from the cluster-name setting passed to the kube-controller-manager (xref). The default value is kubernetes, but with CAPI/CAPO this value should be set correctly, I hope.
I guess that deleting PVs is out of scope. PVs might contain important data and should probably never be deleted automatically or as part of a garbage collection.
Load balancers can be recreated at any time. In addition, orphaned load balancers created by the OCCM are somewhat expected, and they block network deletion.
So I like the suggestion to delete load balancers based on the naming convention kube_service_<clustername>_<namespace>_<servicename>.
One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.
For CAPO it would be good to rely on CAPI to clean up the LoadBalancer Services. I don't know if there is any provider where the load balancer is independent of the cluster network itself... I'll have to ask the CAPI developers.
Does this just delete all the load balancers in the OpenStack project? What happens if there is more than one cluster in a project?
Is the issue here that the network/router that we created won't delete because there are load balancers attached to it? In that case, it makes sense to me that the condition should be:
if we are deleting the network then delete the load balancers attached to it first
That means that:
- Load balancers for other clusters with separate networks will not be deleted
- If we did not create the network, and hence are leaving it behind afterwards, we leave the load balancers behind
That seems like sensible behaviour to me?
I still think that this seems like reasonable behaviour right?
If CAPO created the network for a cluster and wants to remove it when the cluster is deleted, then it should be assumed that it is OK to remove anything blocking that deletion, e.g. load balancers.
If CAPO did not create the network, then it doesn't need to delete the network as part of cluster deletion and we don't need to worry about it.
Just worried we are over-complicating things, and making ourselves brittle, by assuming naming conventions from the CCM...
If CAPO created the network for a cluster and wants to remove it when the cluster is deleted, then it should be assumed that it is OK to remove anything blocking that deletion, e.g. load balancers.
@mkjpryor Sounds reasonable, indeed. How can we identify whether CAPO created the network, any suggestion?
@seanschneeweiss
We already have a gate that decides whether we should delete the network or not. If we are deleting the network, we should delete the load balancers attached to it first.
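If the gate is "we created the network, so we are going to delete it", the cleanup could look roughly like the sketch below, assuming gophercloud's Octavia v2 bindings. deleteLBsOnNetwork is an illustrative helper, not existing CAPO code, and filtering on VipNetworkID is my assumption of how "attached to the network" would be decided; Octavia's cascade delete removes listeners, pools, and members together with the load balancer.

```go
package lbcleanup

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// deleteLBsOnNetwork deletes every load balancer whose VIP lives on the given
// network, e.g. the CAPO-managed cluster network that is about to be deleted.
func deleteLBsOnNetwork(client *gophercloud.ServiceClient, networkID string) error {
	allPages, err := loadbalancers.List(client, loadbalancers.ListOpts{}).AllPages()
	if err != nil {
		return err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return err
	}
	for _, lb := range lbs {
		if lb.VipNetworkID != networkID {
			continue
		}
		// Cascade delete also removes the listeners, pools and members.
		if err := loadbalancers.Delete(client, lb.ID, loadbalancers.DeleteOpts{Cascade: true}).ExtractErr(); err != nil {
			return err
		}
	}
	return nil
}
```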
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Time is a bit scarce at the moment, but I hope to work on this very soon. If someone would like to take over, please do not hesitate to do so.
I would like something like this, but I'm not sure how much extra logic and code such a feature would add. A different approach that doesn't require support from CAPO is for users to use the BeforeClusterDeleteHook
from the Lifecycle Hooks. It's still an experimental feature and only works with ClusterClass AFAIK, but it does allow users to inject arbitrary logic without CAPO needing to support anything specific.
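Whether it runs in such a hook or is simply done manually before triggering cluster deletion, the cleanup itself boils down to deleting the LoadBalancer Services in the workload cluster so that the still-running CCM removes the Octavia load balancers. A client-go sketch of that step (the helper name is illustrative, not part of CAPO or CAPI):

```go
package lbcleanup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteLoadBalancerServices deletes every Service of type LoadBalancer in the
// workload cluster, so that the (still running) CCM cleans up the Octavia load
// balancers before the cluster itself is deleted.
func deleteLoadBalancerServices(ctx context.Context, cs kubernetes.Interface) error {
	svcs, err := cs.CoreV1().Services(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, svc := range svcs.Items {
		if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
			continue
		}
		if err := cs.CoreV1().Services(svc.Namespace).Delete(ctx, svc.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```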
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale