Linux VRF support

Open julianbrost opened this issue 3 years ago • 8 comments

Linux VRFs allow having multiple routing tables and binding sockets to one of them individually. This is similar to SO_SETFIB on FreeBSD, which is already implemented in nsd.

In order to bind a socket to a routing table, the SO_BINDTODEVICE socket option is used with a VRF device name as its value. nsd already provides support for this option, but it does not allow specifying the device name; instead, it figures out the device name for a given address by itself. I have already built a proof of concept that extends the ip-address directive with an additional device keyword argument that allows overriding this device name.
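For illustration, here's a minimal sketch of the underlying mechanism (this is not code from my proof of concept; the helper name and the device name vrf-blue are just placeholders):

```c
/* Minimal sketch: bind a socket to a Linux VRF device so the kernel uses
 * that VRF's routing table for it. Requires CAP_NET_RAW; the device name
 * "vrf-blue" is only a placeholder. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static int bind_to_vrf(int fd, const char *vrf)
{
	/* SO_BINDTODEVICE takes a device name, so a VRF device works the
	 * same way as a physical interface here. */
	return setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
	                  vrf, (socklen_t)strlen(vrf));
}

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd == -1 || bind_to_vrf(fd, "vrf-blue") == -1) {
		perror("bind_to_vrf");
		return 1;
	}
	/* ... bind() the listen address and use the socket as usual ... */
	return 0;
}
```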

However, to fully support VRFs, it should be possible to specify the device name whenever a socket is bound to an address. This mainly affects outgoing-interface, for selecting the routing domain to use for notifies, and it might also be useful for control-interface. As far as I can tell, it's not possible to use setfib for these at the moment either.

Before putting more time into creating a PR for this, I'd like to know if there would be interest in this feature and if there's any general feedback so far.

julianbrost avatar Dec 29 '21 17:12 julianbrost

At RIPE NCC, we were looking at using Linux VRFs for our name servers, as an alternative to the policy-based routing we use now. However, we found that in order to use this feature, the software must co-operate. None of the existing DNS servers we use (BIND, Knot DNS, NSD) have this support. So we just dropped the idea for now. However, if there were support for VRFs, we might actually use this feature in the future.

anandb-ripencc avatar Jan 24 '22 13:01 anandb-ripencc

Hi @julianbrost, @anandb-ripencc!

Background: I'm currently working to get AF_XDP sockets supported in NSD, and to do so I'm going to have to shuffle around the way that sockets are configured/opened. This is required because the way NSD currently works is to open both UDP and TCP sockets for every ip-address configured. With the addition of XDP this solution will no longer work, because for XDP you'd open one socket per netdev/queue combination. What I'll be doing is allowing users to specify the socket type with the ip-address option, e.g. ip-address: <address>[@<port>] [xdp|udp|tcp] <options>. For XDP sockets you'd then be able to specify something like ip-address: <interface> xdp queue=<queue> servers=<server(s)>, and you'd specify socket-server mappings as desired.

I'll have to look into the details of how VRFs work a bit, but I'm guessing the changes will allow for VRF support to be implemented without too much hassle. Since I'm planning to merge XDP support in stages, and the socket configuration changes are up first, I'll have a go at this too.

k0ekk0ek avatar Mar 18 '22 10:03 k0ekk0ek

Thanks for the update Jeroen!

I'd like to make a suggestion here. NSD doesn't really have a stable/development release model. Everything gets committed and released as production, and this has hurt in the past when breaking changes have appeared in new versions.

Are you able to internally discuss the whole versioning thing, and perhaps convince your colleagues to maintain 2 versions of NSD? Make the current 4.4.x branch the stable one. You can also release a 4.5.x branch, which could be a development branch. You would have more freedom to break things there, or rework some ideas if they don't work too well. Eventually, when you feel that it's ready, you can release 4.6.x as the next stable branch. This is how BIND is doing it, and it seems to work well.

If you don't want the even/odd distinction as with BIND, then you could do what Knot DNS does, where they maintain 2 releases, both classed as production. Currently, they have 3.0.x and 3.1.x as supported versions. The 3.0.x branch gets no feature changes. Only bug fixes are applied to it. For 3.1.x, they do add new features, but again, nothing that will break a production system. For really big breaking changes, they will do it in 3.2.x. When 3.2 is released, 3.0 will be abandoned, such that 3.1 becomes the previous stable, and gets only bug fixes, whereas 3.2 can get new features.

All this makes it very easy for operators to maintain stable services while still being able to play with new things using newer versions on test systems. With NSD this is just not possible. So could you please bring this line of reasoning to your colleagues and see if the NSD versioning can be improved? You could also consider this for Unbound.

anandb-ripencc avatar Mar 18 '22 11:03 anandb-ripencc

@anandb-ripencc, I believe there are two questions here:

  1. Will the configuration style break current configurations; and
  2. Can we (NLnet Labs) use a different strategy for maintaining stable branches to avoid breaking changes?

As to the first question: specifying the socket type would be optional; it'd just fall back to udp+tcp if nothing is specified. I intend to ensure that we don't break any existing configuration files or introduce different behavior. Writing a couple of test cases is what I'd normally do here.

As to the second question: I see your point and I'll bring it up with the others to see how they feel about it. Depending on the urgency, maybe we should open a separate issue regarding release management? That way we can discuss in more detail and perhaps get others to join in too. If you don't mind though, let's keep focus on VRFs in this particular thread :slightly_smiling_face:

k0ekk0ek avatar Mar 18 '22 13:03 k0ekk0ek

Hi Jeroen,

I appreciate that you wouldn't break existing configuration files with your changes, but that wasn't my point. There will still be new code, with the potential of unexpected behaviour, bugs, etc. And this is why I am so concerned with better versioning and release management. But, I won't comment on it here further. When you discuss it with your colleagues, and wish to have user input, feel free to open a separate issue on GitHub and notify us via the mailing list, so we can discuss that issue separately.

anandb-ripencc avatar Mar 18 '22 13:03 anandb-ripencc

@julianbrost, I read the page you so kindly provided. To provide some background: the SO_BINDTODEVICE and SO_SETFIB changes were merged to increase throughput, basically to hook a cpu core up to a dedicated nic to avoid cache misses. For Linux that's achievable through SO_BINDTODEVICE, but with FreeBSD that's only possible by specifying a dedicated routing table. That's also the reason it's not implemented for other interfaces, though we could opt to allow a selection of socket options to be specified on a per-interface basis. (The nsd.conf.sample included in the repo provides more detail on performance, etc.)
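To make the comparison a bit more concrete, here's a rough sketch of the two socket options side by side (the fib number and device name are arbitrary placeholders, not what NSD uses internally):

```c
/* Rough sketch of the two mechanisms discussed above. */
#include <string.h>
#include <sys/socket.h>

#ifdef SO_SETFIB
/* FreeBSD: attach the socket to an alternate routing table (FIB). */
static int use_fib(int fd, int fib)
{
	return setsockopt(fd, SOL_SOCKET, SO_SETFIB, &fib, sizeof(fib));
}
#endif

#ifdef SO_BINDTODEVICE
/* Linux: bind the socket to a device; if that device is a VRF, the
 * socket also uses that VRF's routing table. */
static int use_device(int fd, const char *dev)
{
	return setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
	                  dev, (socklen_t)strlen(dev));
}
#endif
```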

Before I (or you) go off and implement things, @julianbrost, @anandb-ripencc: can you describe the use case a bit? I can imagine using multiple nics per dedicated cpu, i.e. if you have slower nics, using several of them to achieve greater throughput. But that's from a performance point of view; maybe I'm missing some other obvious use case?

k0ekk0ek avatar Mar 18 '22 15:03 k0ekk0ek

For me it's just that I'm playing around with VRFs for other services on a host that also happens to run a nameserver. Nothing that can't be solved without VRFs, but VRF support would be nice to have, as it would allow me to remove some extra policy-based routing workarounds that I have specifically for the nameserver. So I looked into whether any nameservers already support this, didn't find one, and after looking at the sources concluded that nsd is probably the easiest one to add this to.

julianbrost avatar Mar 18 '22 15:03 julianbrost

Our use case has nothing to do with performance. We currently use policy routing to separate the management traffic and service traffic on a server. The first interface on a server is used for management (ssh, monitoring, etc). The second interface is connected to Internet exchanges or a host network. It receives DNS queries, and we want the DNS responses to go out the same interface. Policy routing works, so there is no reason to abandon it. But we recently looked at VRFs to see if we could use them. However, none of the name servers currently support them, so we just stuck to policy routing, and it is unlikely that we would use VRFs.

anandb-ripencc avatar Mar 18 '22 16:03 anandb-ripencc