cluster-api-provider-openstack

Cleanup loadbalancers created by ccm

Open sbueringer opened this issue 3 years ago • 29 comments

/kind feature

Currently, we only delete the kube-apiserver load balancer created by Cluster API. Usually, there are additional load balancers created by the CCM. I would also like to delete those on cluster deletion. They cannot be deleted by the CCM at that point, because the CCM is not running anymore. The cluster deletion gets blocked because the network etc. cannot be deleted while orphaned CCM load balancers are left behind.

Notes:

  • we should also add an e2e test for it. I think we should implement it similarly to CAPA: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/master/test/e2e/suites/unmanaged/unmanaged_test.go#L178-L210
  • we have to ensure we're not deleting unrelated LBs in the same OpenStack project
    • I'm not sure if there is a way to identify if the loadbalancers are related to the cluster we are deleting
    • worst case is that we need a flag on the CAPO controller to opt-out of this behaviour
  • I assume https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/839 will not significantly change the problem, so let's ignore it for now.

sbueringer avatar Apr 09 '21 06:04 sbueringer

/assign @seanschneeweiss (we already have an internal implementation of this feature)

/cc @iamemilio I assume you can provide some input on this :)

sbueringer avatar Apr 09 '21 06:04 sbueringer

@sbueringer: GitHub didn't allow me to assign the following users: seanschneeweiss.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/assign @seanschneeweiss (we already have an internal implementation of this feature)

/cc @iamemilio I assume you can provide some input on this :)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 09 '21 06:04 k8s-ci-robot

/assign (assigning myself in place of @seanschneeweiss as he cannot be assigned)

sbueringer avatar Apr 09 '21 06:04 sbueringer

This is an interesting feature. I would like a flag to choose whether to delete or not, because OpenStack cloud provider users don't only use load balancers but also the Cinder CSI, the ingress controller, etc. They (and I) may be confused that only load balancers are deleted, even knowing that load balancers are handled by the CCM and the others by other controllers.

hidekazuna avatar Apr 09 '21 07:04 hidekazuna

If you check https://github.com/kubernetes/cloud-provider-openstack, there are many resources that might be related: KMS, CSI (Cinder, Manila), etc. I think deleting something not created by CAPO is dangerous, as we might lack the information to decide whether to delete, and what to delete.

jichenjc avatar Apr 09 '21 07:04 jichenjc

I think if there's a reasonably safe way to identify which load balancers have been created by the CCM of the cluster we just deleted, it's fine to delete them. I wouldn't touch disks etc., as there is a chance we accidentally delete valuable data that way. Load balancers can be re-created if necessary.

The AWS e2e test verifies that the lbs are deleted and the disks survive the cluster deletion.

The alternative is that you always have to delete the load balancers via the CCM first and then trigger the cluster deletion. Or the cluster deletion gets stuck in the middle and someone has to manually clean up the load balancers in OpenStack.

If we think the regular case is that users want to clean up the load balancers manually, we can make this opt-in instead of opt-out.

sbueringer avatar Apr 09 '21 08:04 sbueringer
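A minimal sketch of what such an opt-in switch could look like on the CAPO controller; the flag name and wiring are hypothetical, not an existing CAPO option:

```go
package main

import "flag"

// cleanupCCMLoadBalancers is a hypothetical opt-in flag; CAPO does not expose
// it today. When false, cluster deletion leaves CCM-created load balancers
// untouched and the user cleans them up manually.
var cleanupCCMLoadBalancers = flag.Bool(
	"cleanup-ccm-loadbalancers",
	false,
	"If set, delete load balancers created by the OpenStack cloud controller manager during cluster deletion.",
)

func main() {
	flag.Parse()
	_ = *cleanupCCMLoadBalancers // would be consulted by the cluster deletion reconciler
}
```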

I've opened PR #844 with the currently used implementation. This is a draft, to be extended with the e2e tests and further filters to prevent unwanted deletion of other objects.

seanschneeweiss avatar Apr 09 '21 09:04 seanschneeweiss

/unassign /assign @seanschneeweiss (It wasn't possible to assign Sean before because he wasn't a member of the org)

sbueringer avatar Jun 30 '21 04:06 sbueringer

Should there be an additional selection criterion for orphaned load balancers other than the project ID? How can we identify the load balancers of the cluster that is being deleted?

seanschneeweiss avatar Aug 04 '21 12:08 seanschneeweiss

Does this just delete all the load balancers in the OpenStack project? What happens if there is more than one cluster in a project?

Is the issue here that the network/router that we created won't delete because there are load balancers attached to it? In that case, it makes sense to me that the condition should be:

if we are deleting the network then delete the load balancers attached to it first

That means that:

  1. Load balancers for other clusters with separate networks will not be deleted
  2. If we did not create the network, and hence are leaving it behind afterwards, we leave the load balancers behind

That seems like sensible behaviour to me?

mkjpryor avatar Aug 18 '21 16:08 mkjpryor

I am still looking into how to identify all the load balancers of CAPO-deployed clusters. The naming convention is fmt.Sprintf("kube_service_%s_%s_%s", clusterName, service.Namespace, service.Name), but I am not sure what clusterName is.

@jichenjc any comments?

hidekazuna avatar Sep 14 '21 00:09 hidekazuna
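For reference, a small Go sketch of that naming convention and of the per-cluster prefix a cleanup routine could match on; the helper names are mine, the format string is the one quoted above:

```go
package main

import (
	"fmt"
	"strings"
)

// ccmLoadBalancerName mirrors the OCCM naming convention quoted above:
// kube_service_<clusterName>_<namespace>_<serviceName>.
func ccmLoadBalancerName(clusterName, namespace, serviceName string) string {
	return fmt.Sprintf("kube_service_%s_%s_%s", clusterName, namespace, serviceName)
}

// belongsToCluster reports whether a load balancer name matches the
// per-cluster prefix derived from that convention.
func belongsToCluster(lbName, clusterName string) bool {
	return strings.HasPrefix(lbName, fmt.Sprintf("kube_service_%s_", clusterName))
}

func main() {
	name := ccmLoadBalancerName("my-cluster", "default", "my-service")
	fmt.Println(name, belongsToCluster(name, "my-cluster")) // kube_service_my-cluster_default_my-service true
}
```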

@hidekazuna see this https://github.com/kubernetes/cloud-provider/blob/master/app/core.go#L87

The clusterName comes from ctx.ComponentConfig.KubeCloudShared.ClusterName, which I believe is set by k8s itself.

jichenjc avatar Sep 14 '21 02:09 jichenjc

A trick we have used in our deployments is to apply a tag to resources created by CAPO. That might simplify your problem.

iamemilio avatar Sep 14 '21 17:09 iamemilio

apply a tag to resources created by CAPO

Agreed, creating a tag is much simpler and should be considered the common approach.

But this looks more like an OCCM question: this issue is about LBs created by the OCCM that need to be cleaned up as well, if I understand correctly.

jichenjc avatar Sep 15 '21 01:09 jichenjc

@jichenjc Thanks for the information. @iamemilio @jichenjc Yes, the LBs in this issue are created by the CCM, so we cannot use tags. @seanschneeweiss I think we can identify LBs created by the CCM of a CAPO-deployed cluster by their names, which start with kube_service_<clusterName>.

hidekazuna avatar Sep 15 '21 03:09 hidekazuna

This might become a bigger question: the cloud provider supports PV, PVC and other resource creation through Cinder, Manila, etc. I am not sure whether the CAPI cluster deletion here will include those as well?

I haven't tried it, e.g. whether this will cascade or prevent cluster deletion, so real users running workloads on CAPI clusters might have more to say on this.

jichenjc avatar Sep 15 '21 03:09 jichenjc

@jichenjc

CAPI cluster deletion does not delete service load balancers (it does delete the API server load balancer) or Cinder volumes provisioned for PVCs.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

mkjpryor avatar Sep 15 '21 05:09 mkjpryor

CAPI cluster deletion does not delete service load balancers (it does delete the API server load balancer) or Cinder volumes provisioned for PVCs.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

Yes, I understand. So, as you indicated in PR #990, maybe the reasonable way for now is to use the naming convention to remove the LBs first,

and to ask the CAPI community how they want to handle resources created outside of the CAPI lifecycle. I guess it might be the operator's job to consider, or maybe a better way is to do a quick test (e.g. create a CAPI cluster, create PVs/PVCs and LBs on it, then delete it and see what happens).

jichenjc avatar Sep 16 '21 00:09 jichenjc

I want to deploy k8s clusters in the same network and the same project for testing purposes. I do not want to delete LBs based on the network or the project.

hidekazuna avatar Sep 16 '21 02:09 hidekazuna

Deleting by name kube_service_<clustername>_<namespace>_<servicename> sounds the most reasonable to me, as it includes the cluster name, so we should have no unwanted deletions of other resources. Maybe a check could be added that

  • counts the number of load balancers before deleting the network (when deleting the cluster),
  • and returns an error or prints a log message to notify about the orphaned load balancers.

seanschneeweiss avatar Sep 17 '21 16:09 seanschneeweiss
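A rough sketch of that check against the Octavia API using gophercloud v1; the function name is hypothetical and error handling is trimmed to the essentials:

```go
package cleanup

import (
	"fmt"
	"log"
	"strings"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// deleteCCMLoadBalancers deletes all load balancers whose name starts with
// kube_service_<clusterName>_ and logs how many were found, so orphaned load
// balancers are at least visible before the network deletion is attempted.
func deleteCCMLoadBalancers(lbClient *gophercloud.ServiceClient, clusterName string) error {
	prefix := fmt.Sprintf("kube_service_%s_", clusterName)

	allPages, err := loadbalancers.List(lbClient, loadbalancers.ListOpts{}).AllPages()
	if err != nil {
		return err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return err
	}

	matched := 0
	for _, lb := range lbs {
		if !strings.HasPrefix(lb.Name, prefix) {
			continue
		}
		matched++
		// Cascade removes listeners, pools and members together with the LB.
		if err := loadbalancers.Delete(lbClient, lb.ID, loadbalancers.DeleteOpts{Cascade: true}).ExtractErr(); err != nil {
			return fmt.Errorf("deleting load balancer %s: %w", lb.Name, err)
		}
	}
	log.Printf("deleted %d load balancer(s) matching prefix %q", matched, prefix)
	return nil
}
```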

The clusterName originates from the cluster-name setting passed to the kube-controller-manager (xref). The default value is kubernetes, but with CAPI/CAPO this value should be set correctly, I hope.

seanschneeweiss avatar Oct 22 '21 11:10 seanschneeweiss

I guess that deleting PVs is out of scope. PVs might contain important data and they should probably never get deleted automatically or as part of a garbage collection.

Load balancers can be recreated at any time. In addition, orphaned load balancers created by the OCCM are somewhat expected, and they block network deletion. So I like the suggestion to delete load balancers based on the naming convention kube_service_<clustername>_<namespace>_<servicename>.

One option for CAPI clusters would be to have CAPI clean up LoadBalancer services and PVCs before deleting a cluster.

For CAPO it would be good to rely on CAPI to clean up the LoadBalancer services. I don't know if there is any provider where the load balancers are independent of the cluster network itself... I'll have to ask the CAPI developers.

seanschneeweiss avatar Jan 18 '22 13:01 seanschneeweiss
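If the cleanup were done on the Kubernetes side instead, i.e. deleting the LoadBalancer Services while the OCCM is still running, a sketch with client-go could look roughly like this (the function name is mine):

```go
package cleanup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteLoadBalancerServices deletes every Service of type LoadBalancer in the
// workload cluster so the OCCM removes the backing Octavia load balancers
// before the cluster infrastructure is torn down.
func deleteLoadBalancerServices(ctx context.Context, cs kubernetes.Interface) error {
	svcs, err := cs.CoreV1().Services(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, svc := range svcs.Items {
		if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
			continue
		}
		if err := cs.CoreV1().Services(svc.Namespace).Delete(ctx, svc.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```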

Does this just delete all the load balancers in the OpenStack project? What happens if there is more than one cluster in a project?

Is the issue here that the network/router that we created won't delete because there are load balancers attached to it? In that case, it makes sense to me that the condition should be:

if we are deleting the network then delete the load balancers attached to it first

That means that:

  1. Load balancers for other clusters with separate networks will not be deleted
  2. If we did not create the network, and hence are leaving it behind afterwards, we leave the load balancers behind

That seems like sensible behaviour to me?

I still think that this seems like reasonable behaviour, right?

If CAPO created the network for a cluster and wants to remove it when the cluster is deleted, then it should be assumed that it is OK to remove anything blocking that deletion, e.g. load balancers.

If CAPO did not create the network, then it doesn't need to delete the network as part of cluster deletion and we don't need to worry about it.

mkjpryor avatar Feb 02 '22 11:02 mkjpryor

Just worried we are over-complicating things, and making ourselves brittle, by assuming naming conventions from the CCM...

mkjpryor avatar Feb 02 '22 11:02 mkjpryor

If CAPO created the network for a cluster and wants to remove it when the cluster is deleted, then it should be assumed that it is OK to remove anything blocking that deletion, e.g. load balancers.

@mkjpryor Sounds reasonable, indeed. How can we identify whether CAPO created the network, any suggestion?

seanschneeweiss avatar Apr 12 '22 15:04 seanschneeweiss

@seanschneeweiss

We already have a gate that decides whether we should delete the network or not. If we are deleting the network, we should delete the load balancers attached to it first.

mkjpryor avatar Apr 12 '22 15:04 mkjpryor
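A sketch of that network-scoped variant with gophercloud v1, filtering on the load balancer's VIP network instead of its name; the function name is hypothetical:

```go
package cleanup

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// deleteLoadBalancersOnNetwork deletes every load balancer whose VIP lives on
// the given network. It is only meant to run when CAPO is about to delete a
// network it created itself, per the gate discussed above.
func deleteLoadBalancersOnNetwork(lbClient *gophercloud.ServiceClient, networkID string) error {
	allPages, err := loadbalancers.List(lbClient, loadbalancers.ListOpts{}).AllPages()
	if err != nil {
		return err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return err
	}
	for _, lb := range lbs {
		if lb.VipNetworkID != networkID {
			continue
		}
		// Cascade removes listeners, pools and members together with the LB.
		if err := loadbalancers.Delete(lbClient, lb.ID, loadbalancers.DeleteOpts{Cascade: true}).ExtractErr(); err != nil {
			return fmt.Errorf("deleting load balancer %s: %w", lb.ID, err)
		}
	}
	return nil
}
```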

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 11 '22 16:07 k8s-triage-robot

/remove-lifecycle stale

Time is a bit scarce at the moment, but I hope to work on this very soon. If someone would like to take over, please do not hesitate to do so.

seanschneeweiss avatar Jul 18 '22 21:07 seanschneeweiss

I would like something like this, but I'm not sure how much extra logic and code such a feature would add. A different approach that doesn't require support from CAPO is for users to use the BeforeClusterDelete hook from the Lifecycle Hooks. It's still an experimental feature and only works with ClusterClass AFAIK, but it does allow users to inject arbitrary logic without CAPO needing to support anything specific.

nikParasyr avatar Aug 06 '22 12:08 nikParasyr
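A rough sketch of a handler for that hook, based on the experimental v1alpha1 runtime hooks API (types and fields may change; the cleanup itself is a placeholder, and registering the handler with a runtime extension server is omitted):

```go
package hooks

import (
	"context"

	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
)

// doBeforeClusterDelete blocks cluster deletion until user-defined cleanup
// (e.g. deleting LoadBalancer Services) has finished in the workload cluster
// identified by req.Cluster.
func doBeforeClusterDelete(ctx context.Context, req *runtimehooksv1.BeforeClusterDeleteRequest, resp *runtimehooksv1.BeforeClusterDeleteResponse) {
	cleanupDone := runCleanup(ctx, req) // hypothetical cleanup routine

	resp.Status = runtimehooksv1.ResponseStatusSuccess
	if !cleanupDone {
		// A non-zero RetryAfterSeconds tells CAPI to hold the deletion and
		// call the hook again later.
		resp.RetryAfterSeconds = 30
	}
}

// runCleanup is a placeholder for the actual cleanup logic.
func runCleanup(ctx context.Context, req *runtimehooksv1.BeforeClusterDeleteRequest) bool {
	return true
}
```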

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 04 '22 13:11 k8s-triage-robot