Google Cloud DNS provider does not react to credential changes
What happened:
When external-dns authenticates to the GCP APIs with a static JSON service account key, and that key file is swapped out for another one at runtime, the provider does not re-read the file to re-authenticate, so subsequent API calls fail as unauthorized.
What you expected to happen:
The provider should detect the changed key file and re-authenticate using the new credentials.
How to reproduce it (as minimally and precisely as possible):
- Run external-dns with a JSON key as described in the external-dns docs; everything works
- Replace the current JSON key with a new one and revoke the old key (see the gcloud sketch after these steps)
- Observe external-dns being unable to manage DNS records due to invalid credentials
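For concreteness, the key swap in the reproduction steps can be performed with gcloud roughly as follows; the service account name, project, and file path are placeholders:

```sh
# Create a new JSON key for the service account used by external-dns
# (account name and output path are hypothetical placeholders)
gcloud iam service-accounts keys create /var/secrets/credentials.json \
  --iam-account=external-dns@my-project.iam.gserviceaccount.com

# Find the ID of the old key, then revoke it
gcloud iam service-accounts keys list \
  --iam-account=external-dns@my-project.iam.gserviceaccount.com
gcloud iam service-accounts keys delete OLD_KEY_ID \
  --iam-account=external-dns@my-project.iam.gserviceaccount.com
```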
Anything else we need to know?:
I am aware of issue #852, which was auto-closed without resolution in 2019; my goal with this report is to re-raise it and explain in depth why this is a problem.
While there is a plethora of possible workarounds and documented alternatives to using static JSON keys, some use cases leave no alternative to them: for example, running self-managed clusters on GCE instances (as in, not GKE, so no workload identity federation - e.g. Cluster API, k3s or Talos) without wanting to grant all cluster nodes permission to manage DNS, or running in a different cloud (or on-prem) environment entirely while still wanting to manage DNS zones hosted on GCP.
We're currently using Vault to provision these JSON keys and leverage the Vault Agent as a sidecar for external-dns, writing them to a filesystem location external-dns can access. When the credentials are inevitably rotated, the issue outlined above arises: external-dns keeps using the old, now-revoked credentials, and requests to the GCP API fail as unauthorized.
Our current workaround involves setting shareProcessNamespace: true on the Deployment's pod spec and having a shell script in the sidecar forcibly kill the external-dns process on credential rotation (sketched below). While that works, it's also messy (in terms of metrics, alerting, and other potential side effects) and suboptimal from a security hardening perspective.
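For illustration, the sidecar's rotation hook amounts to something like this minimal sketch; it assumes inotify-tools is available in the sidecar image, and the watched path is a placeholder for wherever the Vault Agent writes the key:

```sh
# Runs in the sidecar; needs shareProcessNamespace: true on the pod spec
# so that the external-dns process is visible to pkill.
while true; do
  # Block until the Vault Agent rewrites the mounted key file
  inotifywait -e create,modify,moved_to /vault/secrets/
  # Forcibly restart external-dns so it re-reads the new credentials
  pkill -f external-dns
done
```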
Furthermore, this workaround requires a sidecar to kill the external-dns process (or an alternative means to the same end, such as triggering a deployment rollout) regardless of whether you're using Vault, which means alternative methods of getting credentials into workloads, such as the Secrets Store CSI Driver, are currently not supported without reintroducing additional components like sidecars or rollout controllers.
All of this would be unnecessary, and the user experience greatly improved, if credential changes were handled properly within external-dns itself.
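To sketch what handling this properly could look like: external-dns is written in Go, and its GCP provider obtains credentials via golang.org/x/oauth2. A token source that re-reads the key file when it changes on disk would be enough; the following is only an illustrative sketch under those assumptions, not the actual provider code, and all names are made up:

```go
package gcpauth

import (
	"context"
	"os"
	"sync"
	"time"

	"golang.org/x/oauth2"
	"golang.org/x/oauth2/google"
)

// reloadingTokenSource is a hypothetical oauth2.TokenSource that rebuilds
// its inner source from the JSON key file whenever the file's modification
// time changes (e.g. after a Vault Agent rotation).
type reloadingTokenSource struct {
	path    string
	scopes  []string
	mu      sync.Mutex
	modTime time.Time
	inner   oauth2.TokenSource
}

func (r *reloadingTokenSource) Token() (*oauth2.Token, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	info, err := os.Stat(r.path)
	if err != nil {
		return nil, err
	}
	// (Re)build the inner token source on first use or after the key file
	// has been replaced, dropping any token cached from the old key.
	if r.inner == nil || info.ModTime().After(r.modTime) {
		data, err := os.ReadFile(r.path)
		if err != nil {
			return nil, err
		}
		creds, err := google.CredentialsFromJSON(context.Background(), data, r.scopes...)
		if err != nil {
			return nil, err
		}
		r.inner = creds.TokenSource
		r.modTime = info.ModTime()
	}
	return r.inner.Token()
}
```

An HTTP client built with oauth2.NewClient around such a source would pick up a rotated key on the next token refresh, with no process restart or sidecar required.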
Environment:
- External-DNS version (use external-dns --version): 0.13.5
- DNS provider: GCP
- Others: Kubernetes v1.27.1
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten