
Manage multiple zones with single ExternalDNS deployment using CRDs

Open • tsutsu opened this issue on Feb 15, 2021 • 29 comments

What would you like to be added:

I propose that ExternalDNS be extended with a distinct operational mode, where:

  • Instead of managing a single per-process provider+zone configuration as global state, ExternalDNS would keep a set of top-level "DNS zone binding" objects, each with its own provider/zone config, and would run a separate sync-loop goroutine for each object;
  • Instead of being configured on the command line, these zone-binding objects would be configured by managing/watching a DNSZoneBinding CRD (effectively making ExternalDNS a k8s operator).
  • All provider-level and zone-level configuration (of the kind currently fed in as CLI arguments for most providers) could be placed into DNSZoneBinding resources.
  • In this mode, only global concerns (e.g. policy, interval) would be passed as CLI switches to the controller itself, and these could potentially still be overridden on a per-DNSZoneBinding basis.
  • Ingress/Service/Endpoint resources could specify the DNSZoneBinding they intend to be interpreted against via an annotation, similar to cert-manager's cert-manager.io/issuer annotation (see the Ingress sketch after this list).
  • DNSEndpoint CRD resources could have a direct parent-child relationship with a DNSZoneBinding (e.g. via an ownerRef).
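
To make the annotation idea concrete, here is a minimal sketch of an Ingress opting into a particular zone binding. The annotation key (external-dns.alpha.kubernetes.io/zone-binding) and the binding name are purely hypothetical; no such annotation exists in ExternalDNS today.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: foo
  namespace: foo-prod
  annotations:
    # Hypothetical annotation naming the DNSZoneBinding this Ingress should be
    # interpreted against, analogous to cert-manager.io/issuer.
    external-dns.alpha.kubernetes.io/zone-binding: prod-example-com
spec:
  rules:
    - host: foo.prod.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: foo
                port:
                  number: 80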

A DNSZoneBinding resource could contain a spec with fields like the following (a hypothetical manifest sketch appears after the list):

  • provider
  • domain-filter
  • registry
  • txt-owner-id, txt-prefix, etc.
  • captureEnvFrom with a secretRef or secretKeyRef, to attach things like provider API keys. (Note that this wouldn't translate 1:1 into envFrom on the Deployment: env vars specified in distinct zone bindings for the same provider would need to be kept distinct and attached to the correct in-memory zone-binding object.)
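
As a strawman, a DNSZoneBinding manifest built from those fields might look like this. The API group/version, the field names, and the Secret name are all hypothetical; the sketch simply restates the list above in manifest form.

apiVersion: externaldns.k8s.io/v1alpha1   # hypothetical group/version
kind: DNSZoneBinding
metadata:
  name: prod-example-com
spec:
  provider: aws
  domainFilter:
    - prod.example.com
  registry: txt
  txtOwnerId: foo-prod
  txtPrefix: extdns-
  # Credentials scoped to this binding only; kept in memory per binding rather
  # than injected as env vars on the Deployment itself.
  captureEnvFrom:
    - secretRef:
        name: route53-prod-credentials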

Why is this needed:

Right now, a separate deployment of ExternalDNS is needed for each provider+zone configuration.

For example, suppose I have a deployed project foo with two namespaces, foo-prod and foo-staging: foo-prod contains an Ingress with hostname foo.prod.example.com, and foo-staging has a similar Ingress with foo.staging.example.com. If prod.example.com and staging.example.com are separate zones (with distinct providers, or under distinct accounts with the same provider), then I need to deploy ExternalDNS twice, once for each namespace.

Likewise, if there are multiple tenants in a k8s cluster, each of them must run their own ExternalDNS deployment(s). With a lot of tenants, this overhead can add up!

I would much prefer that ExternalDNS adopt (or offer as an option) a model similar to cert-manager's: a single cluster-wide controller deployment, "virtualized" by controller-configuration resources (Issuer and ClusterIssuer resources, in cert-manager's case) that tell it which configuration to use for the resources that reference them.

Conveniently, since ExternalDNS already watches Service/Ingress/Endpoint resources for changes, it already has all the machinery required to watch these controller-configuration resources for changes as well.

tsutsu avatar Feb 15 '21 18:02 tsutsu

+1, being able to reduce the number of external-dns deployments would be awesome. Additionally, being able to specify different keys per zone (like you mentioned in the CRD spec) would be a must-have, so that if a zone were overwhelming the DNS server, its key could be revoked.

rumstead avatar Mar 10 '21 15:03 rumstead

This is a very interesting proposal @tsutsu!

@Raffo @seanmalloy do you have any specific thoughts on this?

sgreene570 avatar Jun 03 '21 19:06 sgreene570

This would be very helpful.

In our clusters we want to independently control the following variables:

  • which domains are enabled
  • which DNS providers (route53, gcp) are enabled
  • which networking types are enabled (service, ingress, istio-virtualservice)

This is very helpful for the following cases:

  • Multiple domains are especially useful when we want to make a large-scale change to our internal DNS scheme to accommodate a new dimension.
  • DNS provider filtering lets us keep records we don't want to expose out of public view.
  • Networking-type filtering lets us disable a source in a network where it shouldn't be enabled, or roll out new features gradually.

Our method for this is a little wonky, as it requires duplication in most cases, but it allows users to opt in. We expose a series of annotations that a consumer must use to indicate how they want their records to be created. Imagine something like:

annotations:
  traffic.company.com/dns.google.ingress: region1.company.com

These annotations then get filtered by the --annotation-filter argument on the matching ExternalDNS instance (a sketch of that controller configuration follows the Service example below).

The biggest drawback of this method is that we often duplicate values, and consumers are often unaware of these requirements.

apiVersion: v1
kind: Service
metadata:
  name: service1
  namespace: ns1
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
    external-dns.alpha.kubernetes.io/hostname: service1.region1.company.com
    traffic.internal.apexclearing.com/dns.google.service: region1.company.com
  labels:
    app: cloudsql-proxy
spec:
  type: LoadBalancer
  selector:
    app: cloudsql-proxy
  ports:
    - port: 5432
      targetPort: 5432
      protocol: TCP
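
For context, here is a rough sketch of how one of our per-dimension instances might be wired up. The image tag and most flag values are illustrative; only the annotation key matches the Service example above.

# Illustrative fragment of one ExternalDNS Deployment per dimension; the
# annotation filter admits only resources that opted in via the annotation.
containers:
  - name: external-dns
    image: registry.k8s.io/external-dns/external-dns:v0.14.0
    args:
      - --source=service
      - --provider=google
      - --domain-filter=region1.company.com
      - --annotation-filter=traffic.internal.apexclearing.com/dns.google.service=region1.company.com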

apex-omontgomery avatar Jul 15 '21 16:07 apex-omontgomery

Our method for this is a little wonky, as it requires duplication in most cases, but it allows users to opt in. We expose a series of annotations that a consumer must use to indicate how they want their records to be created. Imagine something like:

Thanks for your input @wimo7083. Curious, how many instances of ExternalDNS are you running in parallel for your setup?

sgreene570 avatar Jul 15 '21 16:07 sgreene570

In most clusters we run between 2 and 6 instances. We used multiple clusters initially as a way to get around our lack of multi-tenancy, and as we add better controls we can consolidate clusters.

One of the problems we've found when adding a new dimension (e.g. $sub-service-$service.$region.$company.com -> $sub-service.$service.$region.$company.com) is that the dimensions that would reduce the blast radius the most also add the most maintenance overhead.

apex-omontgomery avatar Jul 15 '21 17:07 apex-omontgomery

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 13 '21 18:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 12 '21 19:11 k8s-triage-robot

/remove-lifecycle rotten

seanmalloy avatar Nov 12 '21 20:11 seanmalloy

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 10 '22 21:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 12 '22 21:03 k8s-triage-robot

/remove-lifecycle rotten

mboutet avatar Mar 28 '22 18:03 mboutet

Ran across this as I'm trying to figure out the best way to set up a dual configuration between Cloudflare and Route53.

The idea is that I want name.production.company.net on Route53 and www.product.com on Cloudflare, and to have Cloudflare proxy to the other domain cleanly.

Kaelten avatar Apr 04 '22 23:04 Kaelten

Hi @Kaelten! As this issue states, you'd currently have to create two instances of ExternalDNS for your situation. Each would have the respective provider configuration and domain filter set.

I'm not sure, though, what you mean by "proxy to the other domain". If this is just a CNAME, you might be fine with a static configuration in Cloudflare (e.g. via Terraform) and just an ExternalDNS instance for Route53. If you are talking about an actual reverse HTTP proxy, your setup question is beyond the scope of ExternalDNS (and this ticket).
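
For the two-instance route, here is a rough sketch of the flags each instance might carry. The domain values are placeholders based on the names above, and each instance would also need its provider's credentials configured separately.

# Instance 1: Route53, covering the production.company.net zone
args:
  - --provider=aws
  - --domain-filter=production.company.net
  - --txt-owner-id=external-dns-route53

# Instance 2: Cloudflare, covering the product.com zone
args:
  - --provider=cloudflare
  - --domain-filter=product.com
  - --txt-owner-id=external-dns-cloudflare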

henninge avatar Apr 05 '22 05:04 henninge

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 04 '22 06:07 k8s-triage-robot

/remove-lifecycle stale

DerEnderKeks avatar Jul 04 '22 07:07 DerEnderKeks

Running multiple instances is an acceptable workaround for some cases, but it would be nice if we could get away with a single instance supporting multiple configurations.

In addition to the above, adding support for namespace separation of different configurations would also be nice (e.g. making sure that only authorized namespaces can use a specific zone in a multi-tenant cluster).

sagikazarmark avatar Sep 23 '22 09:09 sagikazarmark

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 22 '22 10:12 k8s-triage-robot

/remove-lifecycle stale

DerEnderKeks avatar Dec 22 '22 15:12 DerEnderKeks

This kind of multi-tenancy pattern would be really useful and would, IMHO, align well with the use of the Gateway API.

stevehipwell avatar Feb 13 '23 22:02 stevehipwell

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 14 '23 23:05 k8s-triage-robot

/remove-lifecycle stale

DerEnderKeks avatar May 15 '23 06:05 DerEnderKeks

Would love to have this feature.

I have two hosted zones: public and private. Being able to apply changes to both hosted zones with a single deployment would be great.

sabinayakc avatar Jul 19 '23 18:07 sabinayakc

Not a competition, but we have way more hosted zones and having a separate external-dns instance for each is a pain. Not a major pain, but still a pain.

ktamas avatar Jul 20 '23 08:07 ktamas

For me, I can see the use when we operate a split-DNS setup with an internal zone and an external zone; we have a few edge cases where both zones need the same data.

anthonysomerset avatar Aug 02 '23 14:08 anthonysomerset

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 25 '24 17:01 k8s-triage-robot

/remove-lifecycle stale

rumstead avatar Jan 25 '24 17:01 rumstead

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 24 '24 17:04 k8s-triage-robot

/remove-lifecycle stale

DerEnderKeks avatar Apr 24 '24 18:04 DerEnderKeks

Another use case is when multiple managed zones live in different Azure subscriptions or AWS accounts, so you need to assume roles, etc. cert-manager handles this pretty well.

nikolaiderzhak avatar May 10 '24 07:05 nikolaiderzhak