Failed to list hosted zones after updating to v0.15.0
I recently updated external-dns from v0.14.2 to v0.15.0 and am now seeing the "failed to list hosted zones" error from the title.
This is my manifest:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: external-dns
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["extensions", "networking.k8s.io", "getambassador.io"]
    resources: ["ingresses", "hosts"]
    verbs: ["get", "watch", "list"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
  - kind: ServiceAccount
    name: external-dns
    namespace: external-dns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: external-dns
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
        - name: external-dns
          image: "registry.k8s.io/external-dns/external-dns:v0.15.0"
          args:
            - --source=service
            - --source=ingress
            - --domain-filter=my.hostedzone.net
            - --provider=aws
            - --aws-zone-type=private # only look at private hosted zones (valid values are public, private or no value for both)
            - --registry=txt
            - --txt-owner-id=my-eks-cluster
      securityContext:
        fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files
Could you please advise?
I also tried a Helm deployment through Terraform but hit the same issue.
resource "helm_release" "external_dns" {
name = "external-dns"
repository = "https://kubernetes-sigs.github.io/external-dns"
chart = "external-dns"
create_namespace = true
namespace = "external-dns"
version = "1.15.0"
set {
name = "serviceAccount.name"
value = "external-dns"
}
set {
name = "domainFilters"
value = [var.route53_zone]
}
set {
name = "txtOwnerId"
value = data.aws_eks_cluster.cluster.name
}
}
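For triage, the exact error string from the pod logs helps. Assuming the namespace and deployment name from the manifests above, something along these lines should capture it:

kubectl -n external-dns logs deployment/external-dns --tail=100 | grep -i "hosted zones"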
It looks like your external-dns deployment is failing to reach https://route53.amazonaws.com to list hosted zones, so I would double-check your networking configuration. Perhaps a security group isn't allowing this traffic, or something similar?
Is this something new in v0.15.0? I have no issues with v0.14.2.
No, it should be unrelated; I would strongly suspect an AWS networking misconfiguration. Are you able to exec into the external-dns pod? Do you get results similar to this?
❯ curl -vkL https://route53.amazonaws.com/2013-04-01/hostedzone
* Host route53.amazonaws.com:443 was resolved.
* IPv6: (none)
* IPv4: 54.239.31.187
* Trying 54.239.31.187:443...
* Connected to route53.amazonaws.com (54.239.31.187) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES128-GCM-SHA256 / [blank] / UNDEF
* ALPN: server accepted http/1.1
* Server certificate:
* subject: CN=route53.amazonaws.com
* start date: Aug 31 00:00:00 2024 GMT
* expire date: Aug 13 23:59:59 2025 GMT
* issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M01
* SSL certificate verify ok.
* using HTTP/1.x
> GET /2013-04-01/hostedzone HTTP/1.1
> Host: route53.amazonaws.com
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 403 Forbidden
< x-amzn-RequestId: 586c0b6d-d166-4996-951c-a3cea16c0629
< Content-Type: text/xml
< Content-Length: 297
< Date: Fri, 06 Dec 2024 23:00:22 GMT
<
<?xml version="1.0"?>
* Connection #0 to host route53.amazonaws.com left intact
<ErrorResponse xmlns="https://route53.amazonaws.com/doc/2013-04-01/"><Error><Type>Sender</Type><Code>MissingAuthenticationToken</Code><Message>Request is missing Authentication Token</Message></Error><RequestId>586c0b6d-d166-4996-951c-a3cea16c0629</RequestId></ErrorResponse>
Unfortunately, the container doesn't seem to have a shell or curl installed.
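Since the image ships no shell, one option (a sketch; replace the pod name, and ephemeral containers must be supported on your cluster) is to attach a debug container that does have curl:

kubectl -n external-dns debug -it <external-dns-pod-name> \
  --image=curlimages/curl --target=external-dns -- \
  curl -vkL https://route53.amazonaws.com/2013-04-01/hostedzone

Any container in the pod shares the pod's network namespace, so this exercises the same network path external-dns uses; --target additionally shares the process namespace with the running container.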
I have an update: this only seems to happen in the us-east-1 region. I have similar deployments in ap-southeast-1, eu-central-1, and ca-central-1, and they all came up normally.
/help /area provider/aws
Does that mean Helm chart version 0.14.2 works with no issues in us-east-1? What about AWS IAM permissions? Any restrictions? It is most likely safe to share the IAM policy JSON.
It might be worth writing a simple service, deploying it to us-east-1, and using it to debug the networking.
What about environment variables? I'm not sure what your setup is, but for STS, what do the AWS_* env vars look like? For example (a way to check what actually lands on the pod is sketched after this snippet):
- name: AWS_STS_REGIONAL_ENDPOINTS
  value: regional
- name: AWS_DEFAULT_REGION
  value: eu-west-1
- name: AWS_REGION
  value: eu-west-1
- name: AWS_ROLE_ARN
  value: arn:aws:iam::xsdfasdfasdfasdf
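One way to check which AWS_* variables actually end up on the running pod (a sketch; the label selector assumes the Helm chart defaults and would be app=external-dns for the raw manifest above):

kubectl -n external-dns get pod -l app.kubernetes.io/name=external-dns \
  -o jsonpath='{.items[0].spec.containers[0].env}'

This prints the container's env block after the IRSA or Pod Identity webhook (if any) has injected its variables.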
@ivankatliarchuk: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
AWS_DEFAULT_REGION is the only environment variable set when I deployed both 0.14.2 and 0.15.0 through the Helm chart. Please see the attached role policy. I used EKS Pod Identity to associate the role with the service account assigned to the pod.
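Since Pod Identity is in play, it may also be worth confirming the association from the AWS side (a sketch; the cluster name is a placeholder):

aws eks list-pod-identity-associations \
  --cluster-name <your-cluster-name> \
  --namespace external-dns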
Another finding: if I incorrectly set AWS_DEFAULT_REGION to us-east-1 in an AWS account serving a different region (e.g. ap-southeast-2), external-dns runs without the "failed to list hosted zones" error.
We are still seeing the error with the v0.18.0 release. Since this is a private VPC, is a regional VPC endpoint required to allow access? Please advise.
AWS is one of the supported providers. Please either raise the question in the AWS SDK repository (https://github.com/aws/aws-sdk-go-v2/) or, if you’re an AWS paid customer, open a support case directly with AWS or hire an SRE to debug. They should be able to advise based on your infrastructure requirements.
From the ExternalDNS side, there is no visibility into your exact environment constraints (VPC layout, subnets, resolvers, security groups, ACLs, and so on), and the behavior ultimately depends on how the AWS SDK and your environment are configured.