external-dns
Manage multiple zones with single ExternalDNS deployment using CRDs
What would you like to be added:
I propose that ExternalDNS be extended with a distinct operational mode, where:

- Instead of managing a single per-process provider+zone configuration as global state, ExternalDNS would keep a set of top-level "DNS zone binding" objects, each with its own provider/zone config, and would run a separate sync-loop goroutine for each object.
- These zone-binding objects, instead of being configured on the command line, would be configured by managing/watching a `DNSZoneBinding` CRD (making ExternalDNS into, effectively, a k8s operator).
- All provider-level and zone-level configuration (of the type currently fed in as CLI arguments for most providers) could be placed into `DNSZoneBinding` resources.
- In this mode, only global concerns (e.g. `policy`, `interval`) would be passed as CLI switches to the controller itself; these could potentially still be overridden on a per-`DNSZoneBinding` basis.
- Ingress/Service/Endpoint resources could specify the `DNSZoneBinding` they intend to be interpreted against as an annotation (similar to cert-manager's `cert-manager.io/issuer` annotation); see the sketch after this list.
- `DNSEndpoint` CRD resources could have a direct parent-child relationship with a `DNSZoneBinding` (e.g. generating an `ownerRef`, etc.).
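To illustrate the annotation idea, here is a minimal sketch of how an Ingress might opt into a particular zone binding under this proposal. The annotation key and the `DNSZoneBinding` name are hypothetical; nothing like this exists in ExternalDNS today.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: foo
  namespace: foo-prod
  annotations:
    # Hypothetical annotation key: selects which DNSZoneBinding this
    # Ingress's hostnames should be reconciled against.
    external-dns.alpha.kubernetes.io/zone-binding: prod-example-com
spec:
  rules:
    - host: foo.prod.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: foo
                port:
                  number: 80
```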
A `DNSZoneBinding` resource could contain a `spec` with fields like (a rough sketch follows this list):

- `provider`
- `domain-filter`
- `registry`
- `txt-owner-id`, `txt-prefix`, etc.
- `captureEnvFrom` with a `secretRef` or `secretKeyRef`, to attach things like provider API keys (note that this wouldn't translate 1:1 with `envFrom` on the deployment, as the env-vars specified in distinct zone-bindings for the same provider would need to be kept distinct and attached to the correct in-memory zone-binding object)
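As a rough, purely hypothetical sketch of such a resource (the API group, version, kind, and field names below are assumptions for illustration, not an existing or agreed-upon API; field names mirror the current CLI flags but use camelCase, as is conventional for CRD specs):

```yaml
apiVersion: externaldns.k8s.io/v1alpha1   # hypothetical group/version
kind: DNSZoneBinding                      # hypothetical kind
metadata:
  name: prod-example-com
spec:
  provider: aws
  domainFilter: prod.example.com
  registry: txt
  txtOwnerId: external-dns-prod
  txtPrefix: "ed-"
  # Hypothetical: inject provider credentials into this binding's
  # in-memory provider instance only, rather than into the whole
  # deployment via envFrom.
  captureEnvFrom:
    - secretRef:
        name: route53-credentials
```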
Why is this needed:
Right now, a separate deployment of ExternalDNS is needed for each provider+zone configuration.
For example, if I have a deployed project `foo` with two namespaces, `foo-prod` and `foo-staging`, where `foo-prod` contains an ingress with hostname `foo.prod.example.com` and `foo-staging` has a similar ingress with `foo.staging.example.com`; and where `prod.example.com` and `staging.example.com` are separate zones (with distinct providers, or under distinct accounts in the same provider), then I need to deploy ExternalDNS twice, once for each namespace.
Obviously, as well, if there are multiple tenants in a k8s cluster, each of them must run their own ExternalDNS deployment(s). With a lot of tenants, the overhead of this can add up!
I would much prefer that ExternalDNS adopt (or offer as an option) a model similar to cert-manager, where there exists only a single cluster-wide controller deployment, which is then "virtualized" with controller-configuration resources (`Issuer` and `ClusterIssuer` resources, in cert-manager's case) that tell it which configuration to use when working with the resources that use/reference that controller-configuration resource.
Conveniently, as ExternalDNS already watches Service/Ingress/Endpoint resources for changes, it already has all the machinery in place required to watch these controller-configuration resources for changes.
+1, being able to reduce the number of external-dns deployments would be awesome. Additionally, being able to specify different keys per zone (like you mentioned in the CRD spec) would be a must-have, so that if a zone was overwhelming the DNS server, its key could be revoked.
This is a very interesting proposal @tsutsu!
@Raffo @seanmalloy do you have any specific thoughts on this?
This would be very helpful.
In our clusters we want to independently control the following variables:
- which domains are enabled
- which DNS providers (route53, gcp) are enabled
- which networking types are enabled (service, ingress, istio-virtualservice)
This is very helpful for the following cases:
- Multiple domains are extra useful when we want to make some large-scale change to our internal DNS scheme to accommodate a new dimension.
- DNS provider filtering allows us to filter out records we don't want to show publicly.
- Networking types allow us to disable a feature in a network where it shouldn't be enabled, or to slowly roll out new features.
Our method for this is a little wonky, as it requires duplication in most cases, but it allows users to opt in. We expose a series of annotations that a consumer must use to indicate how they want their records to be created. Imagine something like:

```yaml
annotations:
  traffic.company.com/dns.google.ingress: region1.company.com
```

which gets filtered by the `--annotation-filter` argument.
The biggest drawback of this method is that we often duplicate values, or consumers are unaware of these requirements.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: service1
  namespace: ns1
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
    external-dns.alpha.kubernetes.io/hostname: service1.region1.company.com
    traffic.internal.apexclearing.com/dns.google.service: region1.company.com
  labels:
    app: cloudsql-proxy
spec:
  type: LoadBalancer
  selector:
    app: cloudsql-proxy
  ports:
    - port: 5432
      targetPort: 5432
      protocol: TCP
```
Thanks for your input @wimo7083. Curious, how many instances of ExternalDNS are you running in parallel for your setup?
In most clusters we run between 2 and 6 instances. We used multiple clusters initially as a way to get around our lack of multi-tenancy, and as we add better controls we can consolidate clusters.
One of the problems we've found when we add a new dimension (e.g. `$sub-service-$service.$region.$company.com` -> `$sub-service.$service.$region.$company.com`) is that the dimensions that would reduce the blast radius the most also add the most maintenance overhead.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
Ran across this as I'm trying to figure out the best way to set up a dual configuration between CloudFlare and Route53.
The idea being that I want to have `name.production.company.net` on Route53 and `www.product.com` on Cloudflare, and have Cloudflare proxy to the other domain cleanly.
Hi @Kaelten! As the issue states, you'd currently have to create two instances of ExternalDNS for your situation. Each would have the respective provider configuration and domain filter set.
Although I am not sure what you mean by "proxy to the other domain". If this is just a CNAME, you might be fine with a static configuration in Cloudflare (e.g. via Terraform) and just an ExternalDNS for Route53. If you are talking about an actual reverse HTTP proxy, your setup question is beyond the scope of ExternalDNS (and this ticket).
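For illustration, a minimal sketch of what the current two-instance workaround looks like, showing only the relevant container args; the zones are taken from the example above, while the owner IDs are assumptions for this sketch, and each instance would additionally need its own provider credentials:

```yaml
# Instance 1: only reconciles records under the Route53-hosted zone
args:
  - --source=ingress
  - --provider=aws
  - --domain-filter=production.company.net
  - --txt-owner-id=external-dns-aws
---
# Instance 2: only reconciles records under the Cloudflare-hosted zone
args:
  - --source=ingress
  - --provider=cloudflare
  - --domain-filter=product.com
  - --txt-owner-id=external-dns-cloudflare
```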
/lifecycle stale
/remove-lifecycle stale
Running multiple instances is an acceptable workaround for some cases, but it would be nice if we could get away with a single instance supporting multiple configurations.
In addition to the above, adding support for namespace separation of different configurations would also be nice (e.g. making sure that only authorized namespaces can use a specific zone in a multi-tenant cluster).
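Extending the hypothetical `DNSZoneBinding` sketch from earlier in the thread, this could perhaps be expressed with something like an `allowedNamespaces` field; again, this is purely illustrative and no such field exists today:

```yaml
spec:
  # Hypothetical: only Ingress/Service resources in these namespaces
  # would be allowed to reference this zone binding.
  allowedNamespaces:
    - team-a-prod
    - team-a-staging
```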
/lifecycle stale
/remove-lifecycle stale
This kind of multi-tenancy pattern would be really useful and, IMHO, would align well with the use of the Gateway API.
/lifecycle stale
/remove-lifecycle stale
Would love this feature.
I have two hosted zones: public and private. Being able to apply changes to both hosted zones with a single deployment would be great.
Not a competition, but we have way more hosted zones and having a separate `external-dns` instance for each is a pain. Not a major pain, but still a pain.
For me, I can see the use when we operate a split-DNS type setup with an internal zone and an external zone; we have a few edge cases where both zones need the same data.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/remove-lifecycle stale
Another use case is when multiple managed zones are in different Azure subscriptions or AWS accounts, so you need to assume roles, etc. cert-manager does this pretty well.