cloud-provider-azure

Kubernetes Service with invalid settings generates an impossible-to-break loop trying to either create or delete the PIP in the LB

Open djsly opened this issue 1 year ago • 3 comments

What happened:

We created a Service in Kubernetes with a bad annotation (the ip-prefix-id was valid, but the Kubernetes cluster didn't have access to it). This generated an error that recurred every 5 minutes, leaving the Public IP creation in a pending state.

E0810 02:03:40.625296       1 azure_loadbalancer.go:1594] reconcileLoadBalancer: failed to get load balancer for service "71bb6ffc-8447-4daf-b41a-aefcae6ff5b8/emissary", error: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 403, RawError: {"error":{"code":"AuthorizationFailed","message":"The client '<redacted>' with object id '<redacted>' does not have authorization to perform action 'Microsoft.Network/publicIPAddresses/read' over scope '/subscriptions/<redacted>/resourceGroups/corp-cd1-network/providers/Microsoft.Network' or the scope is invalid. If access was recently granted, please refresh your credentials."}}

After we deleted the service in Kubernetes, we started getting this error instead.

I0810 04:54:05.172430       1 event.go:294] "Event occurred" object="71bb6ffc-8447-4daf-b41a-aefcae6ff5b8/emissary" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="DeleteLoadBalancerFailed" message="Error deleting load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 403, RawError: {\"error\":{\"code\":\"AuthorizationFailed\",\"message\":\"The client '<redacted>' with object id '<redacted>' does not have authorization to perform action 'Microsoft.Network/publicIPAddresses/read' over scope '/subscriptions/<redacted>/resourceGroups/corp-cd1-network/providers/Microsoft.Network' or the scope is invalid. If access was recently granted, please refresh your credentials.\"}}"

From this point on, we cannot restore the state of the service unless we kill/restart the cloud-controller-manager.

What you expected to happen:

I would expect the cloud-controller-manager not to try to delete an IP for a service that was never provisioned. That would allow us to reuse the same service name when fixing the original use case.

How to reproduce it (as minimally and precisely as possible):

Set the following annotation on a Service, using a valid resource ID that the cloud-controller-manager doesn't have access to read:

service.beta.kubernetes.io/azure-pip-prefix-id: /subscriptions/<redacted>
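
For context, a minimal sketch of a Service manifest carrying this annotation might look like the following. The name `emissary` is taken from the logs above. The selector and ports are assumptions added only so the manifest is complete, and the prefix ID is left redacted as in the original report.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: emissary  # service name as it appears in the logs above
  annotations:
    # Valid public IP prefix resource ID that the cluster identity cannot read
    # (value redacted, as in the report)
    service.beta.kubernetes.io/azure-pip-prefix-id: "/subscriptions/<redacted>"
spec:
  type: LoadBalancer  # triggers load balancer and public IP reconciliation
  selector:
    app: emissary     # hypothetical label, not from the report
  ports:
    - port: 80          # hypothetical port
      targetPort: 8080  # hypothetical target port
```

Applying a manifest like this and then deleting the Service should correspond to the two errors shown above, assuming the cluster identity lacks `Microsoft.Network/publicIPAddresses/read` on the prefix's resource group.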

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"f18584a06fc476806da2e340e2eed960659871e8", GitTreeState:"clean", BuildDate:"2023-06-12T18:45:20Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AKS
  • OS (e.g: cat /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: AKS
  • Network plugin and version (if this is a network-related bug):
  • Others:

djsly, Aug 11 '23 03:08