
fix regression in NodePort (SRV) support, introduce feature flag & add Node label selector support

Open Quentin-M opened this issue 4 years ago • 40 comments

This commit fixes two main issues with the external-dns implementation of NodePort-based DNS records:

(1) The earlier implementation of NodePort DNS created SRV records alongside the other DNS records created for NodePort services. However, this functionality was broken by a subsequent commit (d934f69#r32415167, released in v0.5.9), which updated the Planner's filterRecordsForPlan to recognize only CNAME and A records. This commit adds SRV to the set of recognized record types.

However, since no other service type gets SRV records created for it and SRV support is not well documented in external-dns, this commit gates the creation of SRV records behind a flag that is disabled by default.
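To make the regression concrete, here is a minimal, illustrative Go sketch of an allow-list filter of the kind described above. The type and function are simplified stand-ins, not the actual external-dns planner code:

package main

import "fmt"

// endpointRecord is a simplified stand-in for external-dns's endpoint type.
type endpointRecord struct {
    DNSName    string
    RecordType string
}

// filterRecordsForPlan keeps only the record types the planner knows how to
// manage; anything else is silently dropped, which is how NodePort SRV records
// disappeared in v0.5.9. Adding "SRV" to the allow-list restores them.
func filterRecordsForPlan(records []endpointRecord) []endpointRecord {
    var filtered []endpointRecord
    for _, r := range records {
        switch r.RecordType {
        case "A", "CNAME", "SRV":
            filtered = append(filtered, r)
        }
    }
    return filtered
}

func main() {
    records := []endpointRecord{
        {DNSName: "svc.example.org", RecordType: "A"},
        {DNSName: "_svc._tcp.example.org", RecordType: "SRV"},
        {DNSName: "other.example.org", RecordType: "NS"},
    }
    fmt.Println(filterRecordsForPlan(records)) // keeps only the A and SRV entries
}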

(2) Previously, the serviceSource would consider all nodes targets, regardless of their k8s role. This resulted in the masters' public IPs being added to the DNS record. Given that most k8s users reserve the masters for k8s API activity rather than running services/pods, this does not seem like the ideal default behavior. Instead, this commit lets the user specify which "role" label a node must carry and only selects nodes with that role. It defaults to the "node" role, which by convention covers all non-master (i.e., worker) nodes in the cluster.
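Conceptually, the role-based node selection might look like the sketch below; the node type and helper names are assumptions for illustration, not the PR's exact implementation:

package main

import (
    "fmt"

    "k8s.io/apimachinery/pkg/labels"
)

// node is a simplified stand-in for corev1.Node.
type node struct {
    Name   string
    Labels map[string]string
    IP     string
}

// nodeTargets returns the addresses of nodes whose labels satisfy the selector,
// e.g. labels.Set{"node-role.kubernetes.io/node": ""} for worker nodes.
func nodeTargets(nodes []node, selector labels.Selector) []string {
    var targets []string
    for _, n := range nodes {
        if selector.Matches(labels.Set(n.Labels)) {
            targets = append(targets, n.IP)
        }
    }
    return targets
}

func main() {
    nodes := []node{
        {Name: "master-1", Labels: map[string]string{"node-role.kubernetes.io/master": ""}, IP: "203.0.113.1"},
        {Name: "worker-1", Labels: map[string]string{"node-role.kubernetes.io/node": ""}, IP: "203.0.113.10"},
    }
    sel := labels.Set{"node-role.kubernetes.io/node": ""}.AsSelector()
    fmt.Println(nodeTargets(nodes, sel)) // only worker-1's address becomes a target
}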


Changes since the original patch by @LeeHampton (https://github.com/kubernetes-sigs/external-dns/pull/1232):

  • Roll back the positional-argument-to-struct change for function prototypes, as suggested by @linki
  • Modernize label selector a bit by making it fully configurable
  • Improve coherence in the flags' naming
  • Fix Cloudflare test

Tested within our clusters.

Quentin-M avatar Aug 19 '20 19:08 Quentin-M

Welcome @Quentin-M!

It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😄

k8s-ci-robot avatar Aug 19 '20 19:08 k8s-ci-robot

Fixes #936

/kind bug

seanmalloy avatar Aug 20 '20 04:08 seanmalloy

@LeeHampton would you be able to test that this fix works as expected in your environment?

seanmalloy avatar Aug 20 '20 04:08 seanmalloy

@seanmalloy Definitely! I'll get it hooked up on dev within the next day or two and let you know if I run into anything weird.

LeeHampton avatar Aug 20 '20 15:08 LeeHampton

@seanmalloy && @Quentin-M I'm running into issues with this in my stack. The change to use the node-role.kubernetes.io/node label instead of the kubernetes.io/role label causes all of my existing records to get deleted (don't worry, I'm just testing in dev ;-)). My stack, which is on k8s 1.17 and managed by kops 1.17.1, does have the node-role.kubernetes.io/node label on all nodes, but the value is empty, whereas this PR assumes its value will be worker.

If I override the role value to be an empty string, the DNS record is created pointing to all instances in the cluster, including the masters, even though the masters in my cluster do not have a node-role.kubernetes.io/node label. This is probably due to something in the codebase that sets the value to an empty string if the label doesn't exist.

Is there a reason not to rely on the kubernetes.io/role label instead of the node-role.kubernetes.io/node label? Is my kops stack just an outlier, and does node-role.kubernetes.io/node usually carry an actual value?
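For illustration, the suspected behaviour comes down to how a plain Go map lookup works: an absent label and an empty-string value are indistinguishable, so naively comparing the looked-up value against "" matches every node. A tiny sketch:

package main

import "fmt"

func main() {
    // Labels as they might appear on a kops cluster: the role label's value is empty.
    master := map[string]string{"node-role.kubernetes.io/master": ""}
    worker := map[string]string{"node-role.kubernetes.io/node": ""}

    const key = "node-role.kubernetes.io/node"
    fmt.Println(master[key] == "") // true, even though the master does not carry the label at all
    fmt.Println(worker[key] == "") // true
}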

LeeHampton avatar Aug 20 '20 19:08 LeeHampton

I'm wondering if it might make more sense to change the --node-port-node-role flag to a key/value pair so we have control over both the label and the value we filter on. It could even be a list of key/value pairs for more granularity, but that would require adding some functionality to the node lookup.

LeeHampton avatar Aug 20 '20 19:08 LeeHampton

Yeah, I thought that might become an issue. The node-role.kubernetes.io label is the community-accepted label for defining node roles, and more specifically node-role.kubernetes.io/master= for master nodes. Right now, we can actually only match node-role.kubernetes.io=<parameter>. Making the full set configurable would be more flexible but may require a bunch of new logic. @LeeHampton Any idea regarding a suitable replacement implementation for nodeSelector := labels.Set{nodeRoleLabelKey: sc.nodePortNodeRole}? Ideally we'd just use an upstream parser that converts a string into a labels.Set; there seem to be some available. I won't have time to research today/tomorrow, maybe next week.

Quentin-M avatar Aug 20 '20 20:08 Quentin-M

@Quentin-M makes sense. I opened a PR against your PR here. It implements a basic single-label setup, which I tested against my cluster and which works beautifully. However, I do like the idea of being able to pass in a comma-separated list and convert it to a LabelSet, potentially using a simple upstream tool. That would address some real needs we have in our cluster. For example, we have some worker nodes dedicated to CI/CD, and it would be nice to leave them out of the A record even though they can technically NAT to the proper destination pod.

I also will probably not have more time to dig into it this week, but I can take a pass at the multi-label implementation on Monday.

LeeHampton avatar Aug 20 '20 21:08 LeeHampton

Thanks a lot @LeeHampton, that looks much more sensible already!

Quentin-M avatar Aug 20 '20 21:08 Quentin-M

@Quentin-M Sent another PR your way that builds on this. Now you can use the full label selector syntax, and the parsing is handled by the k8s labels.ConvertSelectorToLabelsMap() function. This allows you either to use a simple label selector like we had previously, or to specify more complex selectors like environment in (production),tier in (frontend). I've tested this in my environment with a multi-label selector, kubernetes.io/role=node,system/role=users-paid-default, and it works perfectly. Note: I changed the flag naming I had in there previously to be a bit less awkward.
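For reference, a small sketch of how such a selector string can be parsed and matched using the upstream labels package; the surrounding wiring (flag handling, node listing) is hypothetical and not the PR's exact code:

package main

import (
    "fmt"

    "k8s.io/apimachinery/pkg/labels"
)

func main() {
    // Selector string as it might be passed on the command line.
    raw := "kubernetes.io/role=node,system/role=users-paid-default"

    // ConvertSelectorToLabelsMap parses an equality-based selector into a labels.Set.
    set, err := labels.ConvertSelectorToLabelsMap(raw)
    if err != nil {
        panic(err)
    }
    selector := set.AsSelector()

    nodeLabels := labels.Set{
        "kubernetes.io/role": "node",
        "system/role":        "users-paid-default",
        "kubernetes.io/os":   "linux",
    }
    fmt.Println(selector.Matches(nodeLabels)) // true: both required labels match
}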

LeeHampton avatar Aug 21 '20 16:08 LeeHampton

Wow @LeeHampton, that's some skills! 👏 Thanks!

Quentin-M avatar Aug 21 '20 17:08 Quentin-M

What can we do to get this reviewed and merged? Don't want this to fall into a languishing state again like it did last Fall. 😄 cc @Quentin-M @seanmalloy && (@linki || @njuettner) perhaps?

LeeHampton avatar Aug 31 '20 20:08 LeeHampton

> What can we do to get this reviewed and merged? Don't want this to fall into a languishing state again like it did last Fall. 😄 cc @Quentin-M @seanmalloy && (@linki || @njuettner) perhaps?

@LeeHampton I'll try to review it this week.

/assign

seanmalloy avatar Aug 31 '20 22:08 seanmalloy

For me, this PR also requires changes in the pdns provider:

diff --git a/provider/pdns/pdns.go b/provider/pdns/pdns.go
index 8ecd9d82..fb5c2512 100644
--- a/provider/pdns/pdns.go
+++ b/provider/pdns/pdns.go
@@ -309,7 +309,7 @@ func (p *PDNSProvider) ConvertEndpointsToZones(eps []*endpoint.Endpoint, changet
                                // external-dns v5.0.0-alpha onwards
                                records := []pgo.Record{}
                                for _, t := range ep.Targets {
-                                       if ep.RecordType == "CNAME" {
+                                       if ep.RecordType == "CNAME" || ep.RecordType == "SRV" {
                                                t = provider.EnsureTrailingDot(t)
                                        }

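For context, the target portion of an SRV record is a hostname (like a CNAME target), so PowerDNS expects it fully qualified with a trailing dot; that is what the change above ensures. A simplified stand-in for the helper referenced in the diff:

package main

import (
    "fmt"
    "strings"
)

// ensureTrailingDot is a simplified sketch of provider.EnsureTrailingDot:
// append a trailing dot to a hostname unless it already has one.
func ensureTrailingDot(hostname string) string {
    if strings.HasSuffix(hostname, ".") {
        return hostname
    }
    return hostname + "."
}

func main() {
    fmt.Println(ensureTrailingDot("node-1.example.org")) // "node-1.example.org."
}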
Prototik avatar Sep 07 '20 15:09 Prototik

rebased and addressed @Prototik's comment.

Quentin-M avatar Sep 10 '20 02:09 Quentin-M

/assign @raffo

(the bot told me to do that)

LeeHampton avatar Sep 10 '20 14:09 LeeHampton

I won't be rebasing right now. I did it yesterday and fixed the conflicts, and the tests were passing; it has no impact on the reviewability of the PR, and it will just break again tomorrow.

Quentin-M avatar Sep 10 '20 17:09 Quentin-M

@Quentin-M I added some inline comments let me know what you think. Thanks!

seanmalloy avatar Sep 22 '20 19:09 seanmalloy

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by: Quentin-M, RobAtticus. To complete the pull request process, please assign raffo after the PR has been reviewed. You can assign the PR to them by writing /assign @raffo in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Oct 21 '20 05:10 k8s-ci-robot

Addressed comments on the default role, rebased, all tests passing. Not re-tested in a live environment after the default-role change (yet).

Quentin-M avatar Oct 21 '20 05:10 Quentin-M

/assign @Raffo

LeeHampton avatar Oct 21 '20 13:10 LeeHampton

@Quentin-M: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 21 '20 13:10 k8s-ci-robot

@Quentin-M Looks like you need another rebase?

stefanlasiewski avatar Nov 17 '20 00:11 stefanlasiewski

@stefanlasiewski It appears so; a rebase was required just 24 hours after the last time I rebased it. Unfortunately, this PR is not getting traction/reviews at the moment, so there is little incentive to rebase it again.

Quentin-M avatar Nov 18 '20 22:11 Quentin-M

Hey guys, I opened another PR that addresses the rebase stuff (#1886). Can we just merge that in since the basic functionality has already been vetted here? I've been running a build from this branch in production for months now.

LeeHampton avatar Dec 08 '20 18:12 LeeHampton

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Mar 08 '21 19:03 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

fejta-bot avatar Apr 07 '21 19:04 fejta-bot

@Quentin-M would you be willing to rebase and fix the merge conflicts? I'll try to review this one and get it pushed through if you fix the conflicts. Also, sorry for the delay in reviewing.

/remove-lifecycle rotten

seanmalloy avatar Apr 08 '21 03:04 seanmalloy

/cc

seanmalloy avatar Apr 08 '21 03:04 seanmalloy

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jul 07 '21 04:07 fejta-bot