
[BUG] Private Link deletion gets stuck on existing Private Endpoint connections

Open sebader opened this issue 3 years ago • 8 comments

What happened:

When using the new managed Private Link Service (https://github.com/kubernetes-sigs/cloud-provider-azure/issues/872), a cluster (or, presumably, just the Private Link Service itself) cannot be deleted until all Private Endpoint Connections to that PLS have been deleted beforehand. If you attempt to delete the AKS cluster, all the resources in the managed resource group get deleted except for the PLS and the internal LB. After that, the deletion process simply gets stuck until the connections are deleted manually.

What you expected to happen:

AKS should automatically remove any Private Endpoint Connections when the managed Private Link Service is about to be deleted. Since all other resources are deleted anyway, incoming requests will fail regardless, so there is no reason to block deletion because of these connections.

How to reproduce it (as minimally and precisely as possible):

  1. Expose a Kubernetes service with the PLS annotation (see the sketch after this list)
  2. Create a Private Endpoint against that PLS
  3. Try to delete the AKS cluster
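
As an illustration of step 1: the PLS integration is driven by annotations on an internal LoadBalancer Service. Below is a minimal sketch (not part of the original report) that builds such a Service with the upstream Go API types and prints it as YAML; the service name, namespace, selector, and port are placeholder values.

```go
// Sketch: a LoadBalancer Service that asks cloud-provider-azure to create a
// managed Private Link Service on the internal load balancer.
// Names, namespace, selector, and port below are placeholders.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	svc := corev1.Service{
		TypeMeta: metav1.TypeMeta{APIVersion: "v1", Kind: "Service"},
		ObjectMeta: metav1.ObjectMeta{
			Name:      "my-app", // placeholder
			Namespace: "default",
			Annotations: map[string]string{
				// The managed PLS is attached to the internal load balancer.
				"service.beta.kubernetes.io/azure-load-balancer-internal": "true",
				// Ask the cloud provider to create a managed PLS for this Service.
				"service.beta.kubernetes.io/azure-pls-create": "true",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "my-app"}, // placeholder
			Ports:    []corev1.ServicePort{{Port: 80}},
		},
	}

	out, err := yaml.Marshal(svc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // pipe into `kubectl apply -f -`
}
```

Applying the printed manifest makes the cloud provider create the managed PLS next to the internal LB in the MC_ resource group, which is the resource the deletion later gets stuck on.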

Environment:

  • Kubernetes version (use kubectl version): 1.23.5

@feiskyer

sebader avatar Aug 03 '22 13:08 sebader

This is actually the common behavior for most Azure services. To avoid unexpected deletion of Azure resources, deletion is blocked if they are still referenced by other resources. In this case, because the Private Endpoint is managed by the customer, the customer should delete the Private Endpoint before deleting the AKS cluster.

feiskyer avatar Aug 04 '22 02:08 feiskyer
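
A reader's note on the manual step described above: the unblocking action is deleting the customer-managed Private Endpoint before deleting the cluster. A minimal sketch using the Azure SDK for Go (armnetwork) follows; the subscription ID, resource group, and Private Endpoint name are placeholders, and the module version in the import path may differ from your setup.

```go
// Sketch: delete the customer-side Private Endpoint so that the AKS cluster
// (and its managed PLS) can be deleted. Resource names below are placeholders.
package main

import (
	"context"
	"log"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork"
)

func main() {
	ctx := context.Background()

	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		log.Fatal(err)
	}

	// Placeholders: use your own subscription, resource group, and PE name.
	client, err := armnetwork.NewPrivateEndpointsClient("<subscription-id>", cred, nil)
	if err != nil {
		log.Fatal(err)
	}

	poller, err := client.BeginDelete(ctx, "<pe-resource-group>", "<private-endpoint-name>", nil)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := poller.PollUntilDone(ctx, nil); err != nil {
		log.Fatal(err)
	}
	log.Println("private endpoint deleted; cluster deletion should no longer be blocked")
}
```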

Hmm, but since all the other resources of the cluster also get deleted, what is the point of keeping the PLS? At that point it cannot be recovered anymore.

sebader avatar Aug 04 '22 08:08 sebader

> Hmm, but since all the other resources of the cluster also get deleted, what is the point of keeping the PLS? At that point it cannot be recovered anymore.

I ran into the same issue yesterday. When the PLS is created (and managed) by AKS via the AKS PLS integration, it is by default created in the AKS-owned/managed MC_ resource group next to the LB. Wouldn't it be natural/logical for it to share the same lifecycle as the AKS cluster, including getting deleted together with the AKS cluster?

heoelri avatar Aug 04 '22 08:08 heoelri

If the PE is created in the MC_ resource group, then they should share the same lifecycle. But if the PE is outside of it, then AKS does not have the permission to operate on the PE.

feiskyer avatar Aug 04 '22 12:08 feiskyer

I might be wrong, but I would assume that as the owner of the PLS (which AKS is), you have all the permissions to revoke/delete any Private Endpoint Connections. I do not mean the actual Private Endpoint resource; that will just end up in a Disconnected state, which is expected behavior.

[screenshot]

sebader avatar Aug 04 '22 12:08 sebader
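
To illustrate the point made above, the connections can also be removed from the PLS (provider) side without touching the consumer's Private Endpoint resource. A minimal sketch with the Azure SDK for Go (armnetwork) follows; the subscription ID, node resource group, and PLS name are placeholders, and the method names reflect my reading of the private link service management API rather than anything confirmed in this thread.

```go
// Sketch: enumerate and remove Private Endpoint connections on a Private Link
// Service from the provider (PLS owner) side. The consumer's Private Endpoint
// itself is left in place and simply ends up Disconnected.
// All resource names below are placeholders.
package main

import (
	"context"
	"log"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork"
)

func main() {
	ctx := context.Background()

	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		log.Fatal(err)
	}

	client, err := armnetwork.NewPrivateLinkServicesClient("<subscription-id>", cred, nil)
	if err != nil {
		log.Fatal(err)
	}

	rg, pls := "<mc-node-resource-group>", "<pls-name>" // placeholders

	// Walk every Private Endpoint connection attached to the PLS and delete it.
	pager := client.NewListPrivateEndpointConnectionsPager(rg, pls, nil)
	for pager.More() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		for _, conn := range page.Value {
			poller, err := client.BeginDeletePrivateEndpointConnection(ctx, rg, pls, *conn.Name, nil)
			if err != nil {
				log.Fatal(err)
			}
			if _, err := poller.PollUntilDone(ctx, nil); err != nil {
				log.Fatal(err)
			}
			log.Printf("removed PE connection %s", *conn.Name)
		}
	}
}
```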

Any update on this, @feiskyer? I still strongly believe AKS deletion should not get stuck on this.

sebader avatar Sep 06 '22 12:09 sebader

No, it is actually the same for other resources under the node resource group. If they are referenced by other resources outside of the node resource group, customers need to unlink them before deleting the cluster.

feiskyer avatar Sep 07 '22 07:09 feiskyer

Hm, ok, I see. But what is the reasoning behind this? Since the cluster deletion itself is not being stopped, any other resources become stale and are not recoverable anyway.

sebader avatar Sep 07 '22 07:09 sebader

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 06 '22 08:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 05 '23 09:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Feb 04 '23 09:02 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 04 '23 09:02 k8s-ci-robot