aws-load-balancer-controller
unable to resolve at least one subnet
Describe the bug
I have installed the AWS Load Balancer Controller using the Helm chart. However, I am getting the error below.
{"level":"error","ts":"2023-05-26T08:35:07Z","msg":"Reconciler error","controller":"service","object":{"name":"grpc-ingressgateway","namespace":"istio-system"},"namespace":"istio-system","name":"grpc-ingressgateway","reconcileID":"3e7af4a6-6605-4596-84dc-8c0ce70032c2","error":"unable to resolve at least one subnet"}
Please note that I have the required tags on all of my subnets (both private and public).
Also, the controller points to the correct cluster name and VPC:
containers:
  - args:
      - --cluster-name=intr-dev-eks-eu-west-1
      - --ingress-class=alb
      - --aws-region=eu-west-1
      - --aws-vpc-id=vpc-0b3f9a142f4370b89
    image: public.ecr.aws/eks/aws-load-balancer-controller:v2.5.2
But it is still not discovering the subnets. I am not sure what the issue is. Please help.
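For reference, here is roughly how the discovery filter can be reproduced from the CLI (just a sketch; it reuses the VPC ID and region from the args above, assumes the AWS CLI points at the same account, and checks the internet-facing tag, so swap in kubernetes.io/role/internal-elb for internal load balancers):

# Approximate the controller's subnet auto-discovery query by hand.
aws ec2 describe-subnets --region eu-west-1 \
  --filters Name=vpc-id,Values=vpc-0b3f9a142f4370b89 \
            Name=tag:kubernetes.io/role/elb,Values=1 \
  --query 'Subnets[].{Id:SubnetId,AZ:AvailabilityZone,FreeIPs:AvailableIpAddressCount}'

If this returns an empty list, the controller's auto-discovery will not find anything either.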
Steps to reproduce: install the AWS Load Balancer Controller.
Expected outcome: the controller resolves the tagged subnets and provisions the load balancer.
Environment
- AWS Load Balancer controller version v2.5.2
- Kubernetes version 1.26
- Using EKS (yes/no), if so version? yes, 1.26
Additional Context:
@pranavnateri, can you please double-check that these subnets are under the VPC vpc-0b3f9a142f4370b89 specified in your args? Also, each of your subnets should have more than 8 available IP addresses.
Ref: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/release-2.5/pkg/networking/subnet_resolver.go#LL442C3-L442C3
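For instance, a quick CLI sketch of that check (reusing the VPC ID and region from the report above) lists every subnet in the VPC together with its free-IP count:

# Sketch: verify the subnets really belong to the VPC and have more than 8 available IPs.
aws ec2 describe-subnets --region eu-west-1 \
  --filters Name=vpc-id,Values=vpc-0b3f9a142f4370b89 \
  --query 'Subnets[].[SubnetId,AvailabilityZone,AvailableIpAddressCount]' \
  --output table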
I am hitting the same problem. Does anyone have any ideas?
Same issue.
We are using the NLB mode.
{"level":"error","ts":"2023-06-02T15:15:17Z","msg":"Reconciler error","controller":"service","object":{"name":"ingress-nginx-c8d74760-controller","namespace":"ingress-nginx"},"namespace":"ingress-nginx","name":"ingress-nginx-c8d74760-controller","reconcileID":"1fe38c11-f1bc-429b-8746-0e1dadcb5570","error":"unable to resolve at least one subnet"}
For reference, it was working as expected with Chart 1.4.8 and app 2.4.7
I updated to Chart 1.5.3 and app 2.5.2 and also:
- Deleted the nginx ingress Service (it referenced the AWS NLB)
- Recreated the nginx ingress Service with one additional annotation:
service.beta.kubernetes.io/aws-load-balancer-eip-allocations=eipalloc-066f2201dcfbd6c5d,eipalloc-0dc581f069762b3c3
Update 1
- Rolled back to app version 2.4.7
- Still the same issue.
- FYI: each subnet has more than 1000 free IPs
Update 2
- Removed the annotation service.beta.kubernetes.io/aws-load-balancer-eip-allocations
- Recreated the service
- Everything is working again.
FYI: the NLB spans 2 zones (we use 2 EIPs), and I would like to use my EIPs eipalloc-066f2201dcfbd6c5d and eipalloc-0dc581f069762b3c3.
Update 3
- Specified the annotation below (note the space after the comma)
- Recreated the service
- Fail
service.beta.kubernetes.io/aws-load-balancer-eip-allocations=eipalloc-066f2201dcfbd6c5d, eipalloc-0dc581f069762b3c3
Yields the following error message:
Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound:
The allocation ID ' eipalloc-0dc581f069762b3c3' does not exist (Service: AmazonEC2;
Status Code: 400; Error Code: InvalidAllocationID.NotFound; Request ID: 9e4a1461-3f1d-4038-a5a0-9b788d1c7d7a;
Proxy: null)\n\tstatus code: 400, request id: cbac36a3-addd-4cd0-a085-7c5b71459c5c"
This seems to be related to https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2412#issuecomment-1006348720, presumably because the annotation value is split on commas without trimming whitespace, so the second ID is sent as ' eipalloc-0dc581f069762b3c3' with a leading space.
Update 4
- Deleted the ingress Service that pointed to the NLB
- Installed the controller 2.5.3 from chart 1.5.3
- Deployed the nginx ingress again with the annotation service.beta.kubernetes.io/aws-load-balancer-eip-allocations=eipalloc-066f2201dcfbd6c5d,eipalloc-0dc581f069762b3c3
- Works
I have no explanation for why it just works; it was the same process as initially.
However, my EIPs eipalloc-066f2201dcfbd6c5d and eipalloc-0dc581f069762b3c3 are now used as expected.
@Vad1mo Thanks for the detailed info. The error log indicates the controller failed to resolve subnets, so I think it might be a transient issue when you deleted and recreated the NLB initially. The best practice, as suggested in our live docs, would be to assign the same number of subnets via the annotation service.beta.kubernetes.io/aws-load-balancer-subnets as the number of EIPs assigned. For more reference, see: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/guide/service/annotations/#eip-allocations
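For illustration, a minimal kubectl sketch of that pairing (the Service name is taken from the error log above; the subnet and allocation IDs are placeholders, and the number of subnets must match the number of EIPs):

# Pair an explicit subnet list with the EIP allocations, one subnet per EIP.
kubectl -n ingress-nginx annotate service ingress-nginx-c8d74760-controller --overwrite \
  service.beta.kubernetes.io/aws-load-balancer-subnets=subnet-aaaa1111,subnet-bbbb2222 \
  service.beta.kubernetes.io/aws-load-balancer-eip-allocations=eipalloc-aaaa1111,eipalloc-bbbb2222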
I had the same problem. Brand new EKS cluster with chart 1.5.3 and getting "unable to resolve at least one subnet".
I did not have the required tags for auto-discovery, as documented here: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/deploy/subnet_discovery/
After setting the required tags, I deleted the pods to force a reload, and the NLBs were created without issue.
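In case it helps anyone else, a rough sketch of that sequence (the subnet IDs are placeholders, and the deployment name assumes the Helm chart's default in kube-system):

# Tag public subnets for internet-facing load balancers.
aws ec2 create-tags --resources subnet-aaaa1111 subnet-bbbb2222 \
  --tags Key=kubernetes.io/role/elb,Value=1
# Tag private subnets for internal load balancers.
aws ec2 create-tags --resources subnet-cccc3333 subnet-dddd4444 \
  --tags Key=kubernetes.io/role/internal-elb,Value=1
# Restart the controller so it re-runs subnet discovery.
kubectl -n kube-system rollout restart deployment aws-load-balancer-controller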
@oliviassss I am wondering why auto-discovery isn't working in that case; we have 2 public and 2 private subnets, and all nodes are in the private subnets. IMHO it should work, so why the need to add the subnets as well?
Here is my conclusion after trying out all the different constellations:
- The functionality is broken, or
- The documentation is missing a critical part.
In my previous comment, https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/3212#issuecomment-1573920208, I iterated through a few options. It is all very flaky and unreliable.
Constellation #1
As advised, I added these annotations to the ingress Service:
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde
    service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-003e7a14308b14be5,subnet-035046f571c2cf29e
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
Error Message:
EIP allocations can only be set for internet facing load balancers
Both subnets, subnet-003e7a14308b14be5 and subnet-035046f571c2cf29e, are public.
I have no idea why AWS is trying to create an internal NLB, given public EIPs and public subnets.
Constellation #2
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde
    # service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-003e7a14308b14be5,subnet-035046f571c2cf29e
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
Error Message:
unable to resolve at least one subnet
Constellation #3
Added the kubernetes.io/role/elb tag to the public subnets, as suggested here and in the subnet discovery docs.
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde
    # service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-003e7a14308b14be5,subnet-035046f571c2cf29e
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
Error Message:
unable to resolve at least one subnet
We found out what caused the issue: it was a combination of our configuration change and the breaking change in application v2.5.
It now seems to be mandatory to set:
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
Here is the final configuration that works reliably
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: eipalloc-0d8f2c0f17aeb24da,eipalloc-086f78e84e2130bde
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-003e7a14308b14be5,subnet-035046f571c2cf29e
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
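To double-check the result, something like the following should show an internet-facing scheme and the EIP allocations attached to each availability zone (a sketch; the load balancer name is a placeholder):

aws elbv2 describe-load-balancers --names my-nlb \
  --query 'LoadBalancers[].{Scheme:Scheme,Zones:AvailabilityZones}'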
Hello, any news about this?
I deployed version 2.5.3 and have the same issue. The subnets are tagged, but I always get the same error:
{"level":"error","ts":"2023-07-10T15:13:47Z","msg":"Reconciler error","controller":"ingress","object":{"name":"echoserver","namespace":"echoserver"},"namespace":"echoserver","name":"echoserver","reconcileID":"b24189f3-3168-4d12-b8a6-5a9d58ecfdaf","error":"couldn't auto-discover subnets: unable to resolve at least one subnet"}
{"level":"debug","ts":"2023-07-10T15:13:47Z","logger":"events","msg":"Failed build model due to couldn't auto-discover subnets: unable to resolve at least one subnet","type":"Warning","object":{"kind":"Ingress","namespace":"echoserver","name":"echoserver","uid":"f5830425-3d09-4c56-89d2-f9490bc657f7","apiVersion":"networking.k8s.io/v1","resourceVersion":"13075252"},"reason":"FailedBuildModel"}
Can we get more logs?
EDIT: fixed, I needed to remove a trailing space from the tag name.
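If anyone wants to check for that kind of typo, a rough sketch (the VPC ID is a placeholder): dump the tag keys on the subnets and grep for a space right before the closing quote.

aws ec2 describe-subnets --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'Subnets[].Tags[].Key' --output json | grep -E ' ",?$'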
I encountered the following error: "error":"unable to resolve at least one subnet". This issue was present for some of my services even though I had properly configured subnet discovery as per the official guide: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/deploy/subnet_discovery/. Only one of my services and its load balancer was created successfully, by the way.
I have managed to work around the issue by explicitly defining the subnets using service.beta.kubernetes.io/aws-load-balancer-subnets. However, I find it odd that auto-discovery didn't work. Here is my current working configuration:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-name: "xxx"
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-a, subnet-b,subnet-c
Thanks @Vad1mo for the hint.
If anyone could shed light on why the auto-discovery did not work, that would be very helpful.
- helm chart version: v1.5.5
- app version: v2.5.4
For those encountering this error message, I would suggest looking in CloudTrail for the DescribeSubnets call that LBC is issuing.
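For example, something along these lines should surface the latest call and its request parameters (assumes the AWS CLI and jq, run in the same region as the controller):

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=DescribeSubnets \
  --max-results 5 \
  --query 'Events[0].CloudTrailEvent' --output text | jq '{requestParameters, errorCode, errorMessage}'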
Hello.
Thanks @johngmyers, I found the CloudTrail log. I don't know why, but I needed to add both tags, kubernetes.io/role/elb and kubernetes.io/role/internal-elb.
Maybe kubernetes.io/role/elb is now used for all of them.
@ktibi https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/deploy/subnet_discovery/
EDIT (rubber duck moment): when using internet-facing, you need to add the ELB tag to the public subnets! Everything works!
@johngmyers Yes, I read the docs, but the internal tag was not working.
When I check CloudTrail, I can see the request:
"requestParameters": {
"subnetSet": {},
"filterSet": {
"items": [
{
"name": "vpc-id",
"valueSet": {
"items": [
{
"value": "vpc-XXXXXXXXXXXXXXX"
}
]
}
},
{
"name": "tag:kubernetes.io/role/elb",
"valueSet": {
"items": [
{},
{
"value": "1"
}
]
}
}
]
}
},
EDIT (thanks to the rubber duck): when we use internet-facing, we need to add the ELB tag to the public subnets!
But the ingress definition is:
annotations:
  alb.ingress.kubernetes.io/certificate-arn: >-
    arn:aws:acm:eu-west-1:1XXXXXXXXX:certificate/XXXXXXXXXXXXXXXXXXXXXXXXXX
  alb.ingress.kubernetes.io/healthcheck-path: /service/rest/v1/status
  alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
  alb.ingress.kubernetes.io/scheme: internet-facing
  alb.ingress.kubernetes.io/ssl-redirect: '443'
  external-dns.alpha.kubernetes.io/hostname: XXXXX.XXXXXX.com
  kubernetes.io/ingress.class: alb
  meta.helm.sh/release-name: XXXX
  meta.helm.sh/release-namespace: default
@ktibi that ingress definition clearly shows an alb.ingress.kubernetes.io/scheme: internet-facing annotation, so it is internet-facing.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.