external-dns
fix regression in NodePort (SRV) support, introduce feature flag & add Node label selector support
This commit fixes two main issues with the external-dns implementation of NodePort-based DNS records:
(1) The earlier implementation of NodePort DNS created SRV records alongside the other DNS records created for NodePort. However, this functionality was broken by a subsequent commit (d934f69#r32415167, released in v0.5.9), which updated the Planner's filterRecordsForPlan to only recognize CNAME and A records. This commit adds SRV to the set of recognized record types.
However, since no other service type gets SRV records created for it and SRV support is not well documented in external-dns, this commit puts the creation of SRV records behind a flag that is disabled by default.
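For illustration, the planner-side change is roughly of this shape (a simplified sketch, not the exact external-dns diff; the real logic lives in the plan package's filterRecordsForPlan):

// Sketch only: record types the plan considers manageable.
func isRecognizedRecordType(recordType string) bool {
    switch recordType {
    case "A", "CNAME", "SRV": // SRV re-added so NodePort SRV endpoints survive planning
        return true
    default:
        return false
    }
}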
(2) Previously, the serviceSource would consider every node a target, regardless of its Kubernetes role. This resulted in the masters' public IPs being added to the DNS record. Given that most Kubernetes users do not run services/pods on masters and reserve them for API activity, this does not seem like the ideal default behavior. Instead, this commit lets the user specify which "role" label a node must have, and only selects nodes carrying that role. This defaults to the "node" role, which by default refers to all non-master (i.e., worker) nodes in the cluster.
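A rough sketch of the node filtering this enables, assuming client-go; nodeRoleLabelKey and nodePortNodeRole stand in for the configurable flag values, and listRoleNodes is an illustrative helper, not the actual serviceSource code:

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/client-go/kubernetes"
)

// listRoleNodes returns only the nodes carrying the configured role label,
// e.g. kubernetes.io/role=node, so master nodes are excluded by default.
func listRoleNodes(ctx context.Context, client kubernetes.Interface, nodeRoleLabelKey, nodePortNodeRole string) ([]corev1.Node, error) {
    selector := labels.Set{nodeRoleLabelKey: nodePortNodeRole}.AsSelector().String()
    nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{LabelSelector: selector})
    if err != nil {
        return nil, err
    }
    return nodes.Items, nil
}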
Changes since the original patch by @LeeHampton (https://github.com/kubernetes-sigs/external-dns/pull/1232):
- Roll back the positional argument --> struct change for function prototypes, as suggested by @linki
- Modernize the label selector a bit by making it fully configurable
- Improve coherence in the flags' naming
- Fix the Cloudflare test
Tested within our clusters.
Welcome @Quentin-M!
It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. 😄
Fixes #936
/kind bug
@LeeHampton would you be able to test that this fix works as expected in your environment?
@seanmalloy Definitely! I'll get it hooked up on dev within the next day or two and let you know if I run into anything weird.
@seanmalloy && @Quentin-M I'm running into issues with this in my stack. The change to use the node-role.kubernetes.io/node label instead of the kubernetes.io/role label causes all of my existing records to get deleted (don't worry, I'm just testing in dev ;-)). My stack, which is on k8s 1.17 and managed by kops 1.17.1, does have the node-role.kubernetes.io/node label on all nodes, but the value is empty, whereas this PR assumes its value will be worker.
If I override the role value to be an empty string, the DNS record will be created pointing to all instances in the cluster, including the masters, even though the masters in my cluster do not have a node-role.kubernetes.io/node label. This is probably due to something in the codebase that sets the value to an empty string if the label doesn't exist.
Is there a reason not to rely on the kubernetes.io/role label instead of the node-role.kubernetes.io/node label? Is my kops stack just an outlier, and usually node-role.kubernetes.io/node has an actual value?
I'm wondering if it might make more sense to change the --node-port-node-role flag to a key/value pair so we have control over both the label and the value we filter on. It could even be a list of key/value pairs for more granularity, but that would require adding some functionality to the node lookup.
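For what it's worth, the empty-value behaviour described above would fall out naturally if the code compares node.Labels[key] directly: in Go, reading a missing map key returns the zero value, so "label absent" and "label present with an empty value" are indistinguishable. A tiny illustration (not external-dns code):

package main

import "fmt"

func main() {
    // node.Labels is a map[string]string; a missing key reads back as "".
    nodeLabels := map[string]string{"node-role.kubernetes.io/node": ""} // present, empty value
    fmt.Println(nodeLabels["node-role.kubernetes.io/node"] == "")   // true: present but empty
    fmt.Println(nodeLabels["node-role.kubernetes.io/master"] == "") // true: absent, same result
}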
Yeah, I thought that might become an issue. The node-role.kubernetes.io label is the community-accepted label for defining node roles, and more broadly node-role.kubernetes.io/master= for master nodes. Right now, we can actually only match node-role.kubernetes.io=<parameter>. Making the full set configurable would be more flexible but may require a bunch of new logic. @LeeHampton Any idea re: a suitable replacement implementation for nodeSelector := labels.Set{nodeRoleLabelKey: sc.nodePortNodeRole}? Ideally we'd just use an upstream parser that converts a string into a labels.Set; it seems there may be some available. I won't have time to research today/tomorrow, maybe next week.
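In case it helps, one possible shape for that: labels.ConvertSelectorToLabelsMap from k8s.io/apimachinery/pkg/labels parses a comma-separated key=value string into a labels.Set. A hedged sketch, reusing the imports from the earlier listRoleNodes example (plus "fmt"); the nodePortNodeSelector flag name is hypothetical:

// Sketch: parse a user-supplied selector string instead of a single role value.
func listSelectedNodes(ctx context.Context, client kubernetes.Interface, nodePortNodeSelector string) ([]corev1.Node, error) {
    set, err := labels.ConvertSelectorToLabelsMap(nodePortNodeSelector) // e.g. "kubernetes.io/role=node"
    if err != nil {
        return nil, fmt.Errorf("invalid node label selector %q: %v", nodePortNodeSelector, err)
    }
    nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{LabelSelector: set.AsSelector().String()})
    if err != nil {
        return nil, err
    }
    return nodes.Items, nil
}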
@Quentin-M makes sense. I opened a PR against your PR here. This implements a basic single label setup, which I tested against my cluster and works beautifully. However, I do like the idea of being able to pass in a comma separated list and convert it to a LabelSet, potentially using a simple upstream tool. This would address some real needs we have in our cluster. For example, we have some worker nodes that are dedicated to CICD, and it would be nice to leave them out of the A Record even though they can technically NAT to the proper destination pod.
I also will probably not have more time to dig into it this week, but I can take a pass at the multi-label implementation on Monday.
Thanks a lot @LeeHampton, that looks much more sensible already!
@Quentin-M Sent another PR your way that builds on this. Now you can use the full label selector syntax, and the parsing is handled by the k8s labels.ConvertSelectorToLabelsMap() function. This allows you to either do a simple label selector like we had previously, or specify more complex selectors like environment in (production),tier in (frontend). I've tested this on my environment with a multi-label selector, kubernetes.io/role=node,system/role=users-paid-default, and it works perfectly. Note: I changed the flag naming I had in there previously to be a bit less awkward.
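For readers following along, the upstream labels package also exposes labels.Parse, which accepts the full selector grammar including set-based expressions; a small client-side illustration of matching (hedged, not necessarily how the PR wires it up):

package main

import (
    "fmt"

    "k8s.io/apimachinery/pkg/labels"
)

func main() {
    // Full selector syntax, including set-based expressions.
    sel, err := labels.Parse("environment in (production),tier in (frontend)")
    if err != nil {
        panic(err)
    }
    nodeLabels := labels.Set{"environment": "production", "tier": "frontend"}
    fmt.Println(sel.Matches(nodeLabels)) // true
}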
Wow @LeeHampton, that's some skills! 👏 Thanks!
What can we do to get this reviewed and merged? Don't want this to fall into a languishing state again like it did last Fall. 😄 cc @Quentin-M @seanmalloy && (@linki || @njuettner) perhaps?
@LeeHampton I'll try to review it this week.
/assign
For me, this PR also requires changes in the pdns provider:
diff --git a/provider/pdns/pdns.go b/provider/pdns/pdns.go
index 8ecd9d82..fb5c2512 100644
--- a/provider/pdns/pdns.go
+++ b/provider/pdns/pdns.go
@@ -309,7 +309,7 @@ func (p *PDNSProvider) ConvertEndpointsToZones(eps []*endpoint.Endpoint, changet
// external-dns v5.0.0-alpha onwards
records := []pgo.Record{}
for _, t := range ep.Targets {
- if ep.RecordType == "CNAME" {
+ if ep.RecordType == "CNAME" || ep.RecordType == "SRV" {
t = provider.EnsureTrailingDot(t)
}
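For context on why that matters: CNAME and SRV targets are hostnames, and PowerDNS expects them as fully qualified names. A simplified sketch of what the provider.EnsureTrailingDot helper does (illustrative, not the exact implementation):

import (
    "net"
    "strings"
)

// ensureTrailingDot appends a trailing dot to hostname targets and leaves IPs alone.
func ensureTrailingDot(target string) string {
    if net.ParseIP(target) != nil {
        return target // A/AAAA targets are IPs and must not be dotted
    }
    return strings.TrimSuffix(target, ".") + "."
}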
rebased and addressed @Prototik's comment.
/assign @raffo
(the bot told me to do that)
I won't be rebasing right now. I did it yesterday and fixed the conflicts, and tests were passing. A rebase has no impact on the reviewability of the PR, and it will just break again tomorrow.
@Quentin-M I added some inline comments let me know what you think. Thanks!
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Quentin-M, RobAtticus
To complete the pull request process, please assign raffo after the PR has been reviewed. You can assign the PR to them by writing /assign @raffo in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.
Addressed comments on the default role, rebased, all tests passing. Not yet re-tested in a live environment after the default role change.
/assign @Raffo
@Quentin-M: PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@Quentin-M Looks like you need another rebase?
@stefanlasiewski It appears so; a rebase was required just 24 hours after the last time I rebased it. Unfortunately, this PR is not getting traction/reviews at the moment, so there is little incentive to rebase it again.
Hey guys, I opened another PR that addresses the rebase stuff (#1886). Can we just merge that in since the basic functionality has already been vetted here? I've been running a build from this branch in production for months now.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
@Quentin-M would you be willing to rebase and fix the merge conflicts? I'll try to review this one and get it pushed through if you fix the conflicts. Also, sorry for the delay in reviewing.
/remove-lifecycle rotten
/cc
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale