Google Cloud DNS provider does not react to credential changes
What happened:
When external-dns authenticates to the GCP APIs with a static JSON service account key, and that key file is swapped out for another one at runtime, the provider does not re-read the file to re-authenticate, so subsequent API calls fail as unauthorized.
What you expected to happen:
The provider should detect the changed key file and re-authenticate using the new credentials.
How to reproduce it (as minimally and precisely as possible):
- Run external-dns with a JSON key as described in the external-dns docs; everything works
- Replace the current JSON key with a new one and revoke the old key (see the gcloud sketch after these steps)
- Observe external-dns being unable to manage DNS records due to invalid credentials
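For concreteness, the key swap in the reproduction steps can be performed with gcloud roughly as follows; the service account name, project, and file path are placeholders:

```sh
# Create a new JSON key for the service account used by external-dns
# (account name and output path are hypothetical placeholders)
gcloud iam service-accounts keys create /var/secrets/credentials.json \
  --iam-account=external-dns@my-project.iam.gserviceaccount.com

# Find the ID of the old key, then revoke it
gcloud iam service-accounts keys list \
  --iam-account=external-dns@my-project.iam.gserviceaccount.com
gcloud iam service-accounts keys delete OLD_KEY_ID \
  --iam-account=external-dns@my-project.iam.gserviceaccount.com
```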
Anything else we need to know?:
I am aware of issue #852, which was auto-closed without resolution in 2019; my goal with this report is to re-raise it and explain in depth why this is a problem.
While there is a plethora of possible workarounds and documented alternatives to using static JSON keys, some use cases leave no alternative to them: for example, running self-managed clusters on GCE instances (as in, not GKE, so no workload identity federation - e.g. Cluster API, k3s or Talos) without wanting to grant all cluster nodes permission to manage DNS, or running in a different cloud (or on-prem) environment entirely while still wanting to manage DNS zones hosted on GCP.
We're currently using Vault to provision these JSON keys and leverage the Vault Agent as a sidecar for external-dns, writing them to a filesystem location external-dns can access. When the credentials are inevitably rotated, the issue outlined above arises: external-dns keeps using the old, now-revoked credentials, and requests to the GCP API fail as unauthorized.
Our current workaround involves setting shareProcessNamespace: true on the Deployment's pod spec and having a shell script in the sidecar forcibly kill the external-dns process on credential rotation (sketched below). While that works, it's also messy (in terms of metrics, alerting, and other potential side effects) and suboptimal from a security hardening perspective.
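For illustration, the sidecar's rotation hook amounts to something like this minimal sketch; it assumes inotify-tools is available in the sidecar image, and the watched path is a placeholder for wherever the Vault Agent writes the key:

```sh
# Runs in the sidecar; needs shareProcessNamespace: true on the pod spec
# so that the external-dns process is visible to pkill.
while true; do
  # Block until the Vault Agent rewrites the mounted key file
  inotifywait -e create,modify,moved_to /vault/secrets/
  # Forcibly restart external-dns so it re-reads the new credentials
  pkill -f external-dns
done
```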
Furthermore, this workaround requires a sidecar to kill the external-dns process (or an alternative means to the same end, such as triggering a deployment rollout) regardless of whether you're using Vault, which means alternative methods of getting credentials into workloads, such as the Secrets Store CSI Driver, are currently not supported without reintroducing additional components like sidecars or rollout controllers.
All of this would be unnecessary, and the user experience greatly improved, if credential changes were handled properly within external-dns itself.
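To sketch what handling this properly could look like: external-dns is written in Go, and its GCP provider obtains credentials via golang.org/x/oauth2. A token source that re-reads the key file when it changes on disk would be enough; the following is only an illustrative sketch under those assumptions, not the actual provider code, and all names are made up:

```go
package gcpauth

import (
	"context"
	"os"
	"sync"
	"time"

	"golang.org/x/oauth2"
	"golang.org/x/oauth2/google"
)

// reloadingTokenSource is a hypothetical oauth2.TokenSource that rebuilds
// its inner source from the JSON key file whenever the file's modification
// time changes (e.g. after a Vault Agent rotation).
type reloadingTokenSource struct {
	path    string
	scopes  []string
	mu      sync.Mutex
	modTime time.Time
	inner   oauth2.TokenSource
}

func (r *reloadingTokenSource) Token() (*oauth2.Token, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	info, err := os.Stat(r.path)
	if err != nil {
		return nil, err
	}
	// (Re)build the inner token source on first use or after the key file
	// has been replaced, dropping any token cached from the old key.
	if r.inner == nil || info.ModTime().After(r.modTime) {
		data, err := os.ReadFile(r.path)
		if err != nil {
			return nil, err
		}
		creds, err := google.CredentialsFromJSON(context.Background(), data, r.scopes...)
		if err != nil {
			return nil, err
		}
		r.inner = creds.TokenSource
		r.modTime = info.ModTime()
	}
	return r.inner.Token()
}
```

An HTTP client built with oauth2.NewClient around such a source would pick up a rotated key on the next token refresh, with no process restart or sidecar required.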
Environment:
- External-DNS version (use external-dns --version): 0.13.5
- DNS provider: GCP
- Others: Kubernetes v1.27.1
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten