aws-load-balancer-controller

unable to resolve at least one subnet

Open pranavnateri opened this issue 2 years ago • 17 comments

Describe the bug: I installed the AWS Load Balancer Controller using the Helm chart. However, I am getting the error below. {"level":"error","ts":"2023-05-26T08:35:07Z","msg":"Reconciler error","controller":"service","object":{"name":"grpc-ingressgateway","namespace":"istio-system"},"namespace":"istio-system","name":"grpc-ingressgateway","reconcileID":"3e7af4a6-6605-4596-84dc-8c0ce70032c2","error":"unable to resolve at least one subnet"}

Please note that I have the tags on all of my subnets (both private and public).

Also, ALB controller points to the correct cluster name and VPC.

containers:
      - args:
        - --cluster-name=intr-dev-eks-eu-west-1
        - --ingress-class=alb
        - --aws-region=eu-west-1
        - --aws-vpc-id=vpc-0b3f9a142f4370b89
        image: public.ecr.aws/eks/aws-load-balancer-controller:v2.5.2

But it is still not discovering the subnets. I am not sure what the issue is. Please help.

Steps to reproduce Install AWS LOAD BALANCER CONTROLLER

Expected outcome A concise description of what you expected to happen.

Environment

  • AWS Load Balancer controller version v2.5.2
  • Kubernetes version 1.26
  • Using EKS (yes/no), if so version? yes, 1.26

Additional Context:

pranavnateri avatar May 26 '23 08:05 pranavnateri

@pranavnateri can you please double-check that these subnets are in the VPC vpc-0b3f9a142f4370b89 specified in your args? Also, each subnet must have at least 8 available IP addresses. Ref: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/release-2.5/pkg/networking/subnet_resolver.go#LL442C3-L442C3
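
The two checks suggested here can be tried offline against the output of `aws ec2 describe-subnets`. The following is a minimal sketch, not the controller's own logic: the function name `usable_subnets` and the sample data are hypothetical, the `SubnetId`, `VpcId`, and `AvailableIpAddressCount` keys follow the EC2 DescribeSubnets response shape, and treating exactly 8 free addresses as sufficient is an assumption.

```python
MIN_AVAILABLE_IPS = 8  # threshold mentioned in this comment (assumed inclusive)


def usable_subnets(subnets, vpc_id, min_ips=MIN_AVAILABLE_IPS):
    """Return IDs of subnets that belong to vpc_id and have enough free IPs."""
    return [
        s["SubnetId"]
        for s in subnets
        if s["VpcId"] == vpc_id and s["AvailableIpAddressCount"] >= min_ips
    ]


# Hypothetical sample data shaped like the "Subnets" list in DescribeSubnets output.
sample_subnets = [
    {"SubnetId": "subnet-aaa", "VpcId": "vpc-0b3f9a142f4370b89", "AvailableIpAddressCount": 250},
    {"SubnetId": "subnet-bbb", "VpcId": "vpc-0b3f9a142f4370b89", "AvailableIpAddressCount": 3},
    {"SubnetId": "subnet-ccc", "VpcId": "vpc-other", "AvailableIpAddressCount": 250},
]

print(usable_subnets(sample_subnets, "vpc-0b3f9a142f4370b89"))
```

Subnets that drop out of this list are the ones worth inspecting first.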

oliviassss avatar May 26 '23 23:05 oliviassss

I ran into the same problem. Does anyone have any ideas?

KaimingWan avatar May 29 '23 09:05 KaimingWan

Same issue.

We are using the NLB mode.

{"level":"error","ts":"2023-06-02T15:15:17Z","msg":"Reconciler error","controller":"service","object":{"name":"ingress-nginx-c8d74760-controller","namespace":"ingress-nginx"},"namespace":"ingress-nginx","name":"ingress-nginx-c8d74760-controller","reconcileID":"1fe38c11-f1bc-429b-8746-0e1dadcb5570","error":"unable to resolve at least one subnet"}

For reference, it was working as expected with Chart 1.4.8 and app 2.4.7

I updated to Chart 1.5.3 and app 2.5.2 and also:

  1. deleted the nginx ingress service (it referenced the aws-nlb load balancer)
  2. recreated the nginx ingress service with one additional annotation:
service.beta.kubernetes.io/aws-load-balancer-eip-allocations=eipalloc-066f2201dcfbd6c5d,eipalloc-0dc581f069762b3c3

Update 1

  • Rollback to app version 2.4.7
  • Still the same issue.
  • FYI: each subnet has more than 1000 free IPs

Update 2

  • Removed the annotation service.beta.kubernetes.io/aws-load-balancer-eip-allocations,
  • recreated service
  • Everything working again.

fyi: The NLB is across 2 zones (we used 2 EIPs), and I would like to use my EIPs eipalloc-066f2201dcfbd6c5d, eipalloc-0dc581f069762b3c3

Update 3

  • Specified the annotation (note the space after the comma)
  • Recreated the service
  • Failed
service.beta.kubernetes.io/aws-load-balancer-eip-allocations=eipalloc-066f2201dcfbd6c5d, eipalloc-0dc581f069762b3c3

Yields the following error message:

Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound: 
The allocation ID ' eipalloc-0dc581f069762b3c3' does not exist (Service: AmazonEC2; 
Status Code: 400; Error Code: InvalidAllocationID.NotFound; Request ID: 9e4a1461-3f1d-4038-a5a0-9b788d1c7d7a; 
Proxy: null)\n\tstatus code: 400, request id: cbac36a3-addd-4cd0-a085-7c5b71459c5c"

this seems to be related to https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2412#issuecomment-1006348720
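
The error in Update 3 shows the second allocation ID with its leading space intact, which is consistent with the value being split on commas without trimming. A small check run before applying the Service can catch this. This is only a sketch, and the helper name `find_padded_ids` is hypothetical:

```python
def find_padded_ids(annotation_value):
    """Return the comma-separated items that carry leading or trailing whitespace."""
    return [item for item in annotation_value.split(",") if item != item.strip()]


# The two annotation values tried in this comment.
good = "eipalloc-066f2201dcfbd6c5d,eipalloc-0dc581f069762b3c3"
bad = "eipalloc-066f2201dcfbd6c5d, eipalloc-0dc581f069762b3c3"

for value in (good, bad):
    # repr() makes the stray space visible in the output.
    print([repr(item) for item in find_padded_ids(value)])
```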

Update 4

  • Deleted the ingress service that pointed to the NLB,
  • installed alb 2.5.3 from chart 1.5.3
  • deployed nginx ingress again with Annotation service.beta.kubernetes.io/aws-load-balancer-eip-allocations=eipalloc-066f2201dcfbd6c5d,eipalloc-0dc581f069762b3c3
  • Works

I have no explanation for why it works now; it was the same process as initially. However, my EIPs are now used as expected: eipalloc-066f2201dcfbd6c5d,eipalloc-0dc581f069762b3c3

Vad1mo avatar Jun 02 '23 15:06 Vad1mo

@Vad1mo Thanks for the detailed info. The error log indicates the controller failed to resolve subnets, so I think it might have been some transitional issue while you deleted and recreated the NLB. The best practice, as suggested in our live docs, is to assign the same number of subnets via the annotation service.beta.kubernetes.io/aws-load-balancer-subnets as the number of EIPs assigned. For more reference, see: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/guide/service/annotations/#eip-allocations
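
The "same number of subnets as EIPs" recommendation can be checked mechanically before a Service is applied. The sketch below assumes the annotations are available as a plain dict; the helper names are hypothetical, and only the two annotation keys come from this thread.

```python
EIP_KEY = "service.beta.kubernetes.io/aws-load-balancer-eip-allocations"
SUBNETS_KEY = "service.beta.kubernetes.io/aws-load-balancer-subnets"


def count_items(value):
    """Count the non-empty comma-separated items in an annotation value."""
    return len([item for item in value.split(",") if item.strip()])


def eip_subnet_counts_match(annotations):
    """True only when both annotations are set and list the same number of items."""
    if EIP_KEY not in annotations or SUBNETS_KEY not in annotations:
        return False
    return count_items(annotations[EIP_KEY]) == count_items(annotations[SUBNETS_KEY])


# Placeholder IDs for illustration only.
with_subnets = {
    EIP_KEY: "eipalloc-aaa,eipalloc-bbb",
    SUBNETS_KEY: "subnet-aaa,subnet-bbb",
}
without_subnets = {EIP_KEY: "eipalloc-aaa,eipalloc-bbb"}

print(eip_subnet_counts_match(with_subnets), eip_subnet_counts_match(without_subnets))
```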

oliviassss avatar Jun 05 '23 18:06 oliviassss

I had the same problem. Brand new EKS cluster with chart 1.5.3 and getting "unable to resolve at least one subnet".

I did not have the required tags for auto discovery as documented here https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/deploy/subnet_discovery/

After setting the required tags I deleted the pods to force a reload and the NLBs were created without issue.
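
To see which subnets carry the role tag for a given scheme, the tags can be inspected offline. This is a simplified sketch based on the subnet discovery page linked above (`kubernetes.io/role/elb` for public subnets, `kubernetes.io/role/internal-elb` for private ones, with a value of `1` or empty). It ignores cluster tags and every other rule the controller applies, and the function name and sample data are hypothetical.

```python
ROLE_TAG_BY_SCHEME = {
    "internet-facing": "kubernetes.io/role/elb",
    "internal": "kubernetes.io/role/internal-elb",
}


def subnets_with_role_tag(subnets, scheme):
    """Return IDs of subnets tagged with the role tag for the given scheme."""
    tag_key = ROLE_TAG_BY_SCHEME[scheme]
    matches = []
    for subnet in subnets:
        tags = {t["Key"]: t["Value"] for t in subnet.get("Tags", [])}
        # A missing tag yields None, which is neither "" nor "1".
        if tags.get(tag_key) in ("", "1"):
            matches.append(subnet["SubnetId"])
    return matches


# Hypothetical subnets shaped like EC2 DescribeSubnets output.
tagged_subnets = [
    {"SubnetId": "subnet-public", "Tags": [{"Key": "kubernetes.io/role/elb", "Value": "1"}]},
    {"SubnetId": "subnet-private", "Tags": [{"Key": "kubernetes.io/role/internal-elb", "Value": "1"}]},
    {"SubnetId": "subnet-untagged", "Tags": []},
]

print(subnets_with_role_tag(tagged_subnets, "internet-facing"))
print(subnets_with_role_tag(tagged_subnets, "internal"))
```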

taylorshaulis avatar Jun 06 '23 16:06 taylorshaulis

@oliviassss I am wondering why auto-discovery isn't working in that case. We have 2 public and 2 private subnets, and all nodes are in the private subnets. IMHO it should work, so why do the subnets need to be added as well?

Vad1mo avatar Jun 06 '23 16:06 Vad1mo

Here is my conclusion after trying out all the different constellations:

  1. The functionality is broken or
  2. The documentation is missing a critical part.

In my previous comment, https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/3212#issuecomment-1573920208, I iterated through a few options. It is all very flaky and unreliable.

Constellation #1

As advised, I added to the ingress service these annotations:

kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde
    service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-003e7a14308b14be5,subnet-035046f571c2cf29e
    service.beta.kubernetes.io/aws-load-balancer-type: nlb

Error Message:

EIP allocations can only be set for internet facing load balancers

Both subnets subnet-003e7a14308b14be5,subnet-035046f571c2cf29e are public.

No idea why AWS is trying to create an internal NLB, given public EIPs and public subnets.

Constellation #2

kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde
    # service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-003e7a14308b14be5,subnet-035046f571c2cf29e
    service.beta.kubernetes.io/aws-load-balancer-type: nlb

Error Message:

unable to resolve at least one subnet

Constellation #3

Added the tag to the public subnets, as suggested here and in the subnet discovery documentation:

kubernetes.io/role/elb


kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde
    # service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-003e7a14308b14be5,subnet-035046f571c2cf29e
    service.beta.kubernetes.io/aws-load-balancer-type: nlb

Error Message:

unable to resolve at least one subnet

Vad1mo avatar Jun 08 '23 10:06 Vad1mo

We found out what caused the issue: it was a combination of a configuration change and the breaking change in application v2.5.

It now seems to be mandatory to set: service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

Here is the final configuration that works reliably

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-003e7a14308b14be5,subnet-035046f571c2cf29e
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
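
The "EIP allocations can only be set for internet facing load balancers" error from Constellation #1 and the working configuration above suggest a simple check that could run before the Service is applied. The sketch below assumes the annotations are available as a dict; the function name and message text are hypothetical, and only the annotation keys and values come from this thread.

```python
EIP_KEY = "service.beta.kubernetes.io/aws-load-balancer-eip-allocations"
SCHEME_KEY = "service.beta.kubernetes.io/aws-load-balancer-scheme"


def eip_scheme_problem(annotations):
    """Return a warning when EIPs are requested without the internet-facing scheme, else None."""
    if EIP_KEY in annotations and annotations.get(SCHEME_KEY) != "internet-facing":
        return "eip-allocations is set but aws-load-balancer-scheme is not internet-facing"
    return None


# Annotations relevant to the check, taken from Constellation #1 and the final configuration.
constellation_1 = {
    EIP_KEY: "eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde",
}
final_config = {
    EIP_KEY: "eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde",
    SCHEME_KEY: "internet-facing",
}

print(eip_scheme_problem(constellation_1))
print(eip_scheme_problem(final_config))
```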

Vad1mo avatar Jun 08 '23 12:06 Vad1mo

Hello, any news on this?

I deployed version 2.5.3 and have the same issue. The subnets are tagged, but I always get the same error:

{"level":"error","ts":"2023-07-10T15:13:47Z","msg":"Reconciler error","controller":"ingress","object":{"name":"echoserver","namespace":"echoserver"},"namespace":"echoserver","name":"echoserver","reconcileID":"b24189f3-3168-4d12-b8a6-5a9d58ecfdaf","error":"couldn't auto-discover subnets: unable to resolve at least one subnet"}

{"level":"debug","ts":"2023-07-10T15:13:47Z","logger":"events","msg":"Failed build model due to couldn't auto-discover subnets: unable to resolve at least one subnet","type":"Warning","object":{"kind":"Ingress","namespace":"echoserver","name":"echoserver","uid":"f5830425-3d09-4c56-89d2-f9490bc657f7","apiVersion":"networking.k8s.io/v1","resourceVersion":"13075252"},"reason":"FailedBuildModel"}

Can we get more detailed logs?

EDIT: Fixed. I needed to remove a trailing space in the tag name.
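
A trailing space in a tag key is hard to spot in the console. A sketch like the following, run over the `Tags` list of each subnet from `aws ec2 describe-subnets`, makes it visible; the helper name and sample tags are hypothetical.

```python
def padded_tag_keys(tags):
    """Return tag keys that have leading or trailing whitespace."""
    return [t["Key"] for t in tags if t["Key"] != t["Key"].strip()]


# Hypothetical tags; note the trailing space in the first key.
subnet_tags = [
    {"Key": "kubernetes.io/role/elb ", "Value": "1"},
    {"Key": "Name", "Value": "public-a"},
]

for key in padded_tag_keys(subnet_tags):
    # repr() shows the stray whitespace explicitly.
    print(repr(key))
```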

ktibi avatar Jul 10 '23 15:07 ktibi

I encountered the following error: "error":"unable to resolve at least one subnet". This issue was present for some of my services even though I configured subnet discovery as per the official guide: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/deploy/subnet_discovery/. By the way, only one of my services had its load balancer created successfully.

I have managed to work around the issue by explicitly defining the subnets using service.beta.kubernetes.io/aws-load-balancer-subnets. However, I find it odd that auto-discovery didn't work. Here is my current working configuration:

service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-name: "xxx"
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-a, subnet-b,subnet-c

Thanks @Vad1mo for the hint.

If anyone could shed light on why the auto-discovery did not work, that would be very helpful.

  • helm chart version: v1.5.5
  • app version: v2.5.4

tommyasai avatar Jul 22 '23 06:07 tommyasai

For those encountering this error message, I would suggest looking in CloudTrail for the DescribeSubnets call that LBC is issuing.

johngmyers avatar Jul 23 '23 09:07 johngmyers

Hello.

Thanks @johngmyers, I found a CloudTrail log. I don't know why, but I need to add both tags: kubernetes.io/role/elb and kubernetes.io/role/internal-elb.

Maybe kubernetes.io/role/elb is now used for everything.

ktibi avatar Jul 24 '23 16:07 ktibi

@ktibi https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/deploy/subnet_discovery/

johngmyers avatar Jul 25 '23 01:07 johngmyers

EDIT (rubber duck): When using internet-facing, the ELB tag needs to be added on the public subnets! Everything works!

@johngmyers Yes, I read the doc, but the internal tag is not working.

When I check CloudTrail, I can see the request:

"requestParameters": {
        "subnetSet": {},
        "filterSet": {
            "items": [
                {
                    "name": "vpc-id",
                    "valueSet": {
                        "items": [
                            {
                                "value": "vpc-XXXXXXXXXXXXXXX"
                            }
                        ]
                    }
                },
                {
                    "name": "tag:kubernetes.io/role/elb",
                    "valueSet": {
                        "items": [
                            {},
                            {
                                "value": "1"
                            }
                        ]
                    }
                }
            ]
        }
    },
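
To read such a CloudTrail record more easily, the filters can be flattened into a name-to-values mapping. This is a sketch only: the function name is hypothetical, the input mirrors the fragment above, and mapping the empty `{}` item to an empty string is an assumption about what that entry represents.

```python
import json


def describe_subnets_filters(request_parameters):
    """Map each DescribeSubnets filter name to the list of values it was called with."""
    filters = {}
    for item in request_parameters.get("filterSet", {}).get("items", []):
        values = [v.get("value", "") for v in item.get("valueSet", {}).get("items", [])]
        filters[item["name"]] = values
    return filters


# Same structure as the requestParameters fragment shown above.
request_parameters = json.loads("""
{
  "subnetSet": {},
  "filterSet": {"items": [
    {"name": "vpc-id", "valueSet": {"items": [{"value": "vpc-XXXXXXXXXXXXXXX"}]}},
    {"name": "tag:kubernetes.io/role/elb", "valueSet": {"items": [{}, {"value": "1"}]}}
  ]}
}
""")

for name, values in describe_subnets_filters(request_parameters).items():
    print(name, values)
```

The tag filter name shows which role tag the request was looking for, which can then be compared with the tags actually present on the subnets.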

EDIT (thanks for being my rubber duck): When using internet-facing, the ELB tag needs to be added on the public subnets!

But the ingress definition is :

  annotations:
    alb.ingress.kubernetes.io/certificate-arn: >-
      arn:aws:acm:eu-west-1:1XXXXXXXXX:certificate/XXXXXXXXXXXXXXXXXXXXXXXXXX
    alb.ingress.kubernetes.io/healthcheck-path: /service/rest/v1/status
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    external-dns.alpha.kubernetes.io/hostname: XXXXX.XXXXXX.com
    kubernetes.io/ingress.class: alb
    meta.helm.sh/release-name: XXXX
    meta.helm.sh/release-namespace: default

ktibi avatar Jul 25 '23 08:07 ktibi

@ktibi that ingress definition clearly shows an alb.ingress.kubernetes.io/scheme: internet-facing annotation, so it is internet-facing.

johngmyers avatar Jul 25 '23 23:07 johngmyers

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 26 '24 15:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 25 '24 16:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 26 '24 17:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot avatar Mar 26 '24 17:03 k8s-ci-robot