Can't handle "Additional" section
I have a router with a built-in DNS server. It behaves as follows:
- When queried for a name that has multiple IP addresses, it returns all of them in the ANSWER section.
- When queried again within the next few seconds, it returns just a single answer, in the ADDITIONAL section.
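To see which section a reply uses, it's enough to look at the fixed 12-byte DNS header (RFC 1035, section 4.1.1): its last four 16-bit fields count the records in QUESTION, ANSWER, AUTHORITY and ADDITIONAL. Below is a minimal Python sketch. The two sample headers are made up to mirror the two behaviors described above; they are not captures from the router. Note that an EDNS(0) OPT record also counts toward the ADDITIONAL total, so a nonzero count there doesn't by itself mean addresses are present.

```python
import struct
from typing import NamedTuple


class DnsHeader(NamedTuple):
    id: int
    flags: int
    qdcount: int  # QUESTION entries
    ancount: int  # ANSWER records
    nscount: int  # AUTHORITY records
    arcount: int  # ADDITIONAL records


def parse_header(msg: bytes) -> DnsHeader:
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(msg) < 12:
        raise ValueError("message is shorter than the 12-byte DNS header")
    # Six big-endian (network order) unsigned 16-bit integers.
    return DnsHeader(*struct.unpack("!6H", msg[:12]))


def summarize(h: DnsHeader) -> str:
    is_response = bool(h.flags & 0x8000)  # QR bit
    rcode = h.flags & 0x000F              # low 4 bits of the flags word
    return (f"response={is_response} rcode={rcode} "
            f"answer={h.ancount} authority={h.nscount} additional={h.arcount}")


# Hypothetical headers mirroring the two behaviors (not real captures):
first = struct.pack("!6H", 0x1234, 0x8180, 1, 3, 0, 0)   # 3 records in ANSWER
repeat = struct.pack("!6H", 0x1235, 0x8180, 1, 0, 0, 1)  # 1 record in ADDITIONAL
print(summarize(parse_header(first)))   # response=True rcode=0 answer=3 authority=0 additional=0
print(summarize(parse_header(repeat)))  # response=True rcode=0 answer=0 authority=0 additional=1
```

Running the same function over the bytes of a real capture (for example, the UDP payload exported from a packet dump) would show directly whether ANSWER is empty in the repeated reply.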
Now, I'm not sure why, but when an answer contains multiple records, nsncd seems to repeat the query. If the DNS server again responds with multiple records, everything is fine, and nsncd answers on its front-end socket with a single IP address. But with the DNS server described above, the repeated query gets the unusual ADDITIONAL-only reply, and nsncd in turn returns a strange response of its own:
For comparison, here's a good response for google.com (both my strange DNS server and 8.8.8.8 always seem to return just one IP address for it):
I don't know much about Rust, nor about nsncd in particular. I'm a NixOS user, and I only have access to this peculiar DNS server for a few days, so if anyone is debugging this, please let me know if you want me to provide additional information.
@flokli you might want to know about this, as the maintainer of nsncd, which is now the default name service daemon in NixOS.
Just to confirm, did you check how glibc-nscd behaves?
@flokli Good suggestion. I just checked, and it returns almost exactly the same response upon receiving the "Additional" response, with only a single bit of difference from nsncd's output:
glibc-nscd's response actually ends up being interpreted as "Unknown host" downstream (I'm checking with arp, which presumably calls getaddrinfo). With that bit set, as nsncd sets it, I get "Unknown server error" instead.
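The two messages arp prints match the strings glibc's hstrerror() returns for the h_errno values HOST_NOT_FOUND (1, "Unknown host") and NO_RECOVERY (3, "Unknown server error"), which hints that arp reports h_errno from a gethostbyname-style lookup rather than a getaddrinfo error code. Those two values also differ in exactly one bit (0b01 vs 0b11); whether that is the bit seen in the dumps is only a guess. Below is a small Python sketch that performs the same kind of lookup through NSS (so through nscd/nsncd when enabled) and reports the raw h_errno. It assumes a Linux libc that exports gethostbyname, hstrerror and __h_errno_location, as glibc does.

```python
import ctypes
import ctypes.util
from typing import Tuple

# ASSUMPTION: a Linux libc (glibc or compatible) that exports these symbols.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

gethostbyname = libc.gethostbyname
gethostbyname.argtypes = [ctypes.c_char_p]
gethostbyname.restype = ctypes.c_void_p        # struct hostent *, NULL on failure

h_errno_location = libc["__h_errno_location"]  # h_errno is a per-thread lvalue
h_errno_location.argtypes = []
h_errno_location.restype = ctypes.POINTER(ctypes.c_int)

hstrerror = libc.hstrerror
hstrerror.argtypes = [ctypes.c_int]
hstrerror.restype = ctypes.c_char_p


def lookup(name: str) -> Tuple[bool, int, str]:
    """Resolve name via gethostbyname(3); return (found, h_errno, hstrerror text)."""
    found = gethostbyname(name.encode()) is not None
    # h_errno is only meaningful after a failure, so report 0 on success.
    code = 0 if found else h_errno_location().contents.value
    return found, code, hstrerror(code).decode()
```

With glibc, a failing lookup that comes back as (False, 3, 'Unknown server error') would correspond to what arp shows with nsncd, and (False, 1, 'Unknown host') to what it shows with glibc-nscd.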
The other difference with glibc-nscd is that it doesn't repeat the DNS query if there are multiple answers in it, and just forwards those addresses to its output. So it doesn't trigger my DNS server's unusual behavior as readily.
I still wonder whether that "Additional" response is spec compliant.
I also wonder whether it would make sense for nsncd to look into the ADDITIONAL section - to me it looks like it wouldn't hurt. In my case the verbatim answer to my query is there, and only there.
I don't know DNS well enough to have an opinion here, but if someone wants to put up a PR that adds tests and passes them, I can look at the Rust code and maybe try to put more cursed (DNS) knowledge in my brain.
The same issue seems to affect other people too:
https://discourse.nixos.org/t/occasional-dns-problems/35824
https://discourse.nixos.org/t/something-is-footgunning-around-dns-lookups/41368/4
These both seem like the exact same issue: answer is in ADDITIONAL section and nsncd doesn't parse it.
I have no clue what CLASS1232 OPT is supposed to mean though.
Some breadcrumbs on the internet suggest it might be EDNS0. However, I don't fully understand yet what happens over the wire.
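For reference: EDNS(0) (RFC 6891) adds an OPT pseudo-record (TYPE 41) to the ADDITIONAL section, and in that record the CLASS field is reused to advertise the sender's maximum UDP payload size. "CLASS1232 OPT" is therefore most likely a 1232-byte buffer size (the value recommended by DNS Flag Day 2020) rather than part of the answer. The TTL field is likewise reused for the upper bits of the extended RCODE, the EDNS version and the DO flag. A minimal Python decoder for an OPT record whose owner is the root name; the sample bytes are made up.

```python
import struct
from typing import Any, Dict

OPT_TYPE = 41  # RFC 6891


def parse_opt_rr(rr: bytes) -> Dict[str, Any]:
    """Decode an OPT pseudo-RR starting at rr[0] (owner name must be the root)."""
    if len(rr) < 11 or rr[0] != 0:
        raise ValueError("expected root owner name (one zero byte) plus 10 fixed bytes")
    rtype, rclass, ttl, rdlen = struct.unpack("!HHIH", rr[1:11])
    if rtype != OPT_TYPE:
        raise ValueError(f"not an OPT record (TYPE={rtype})")
    return {
        "udp_payload_size": rclass,        # CLASS reused: max UDP payload in bytes
        "extended_rcode_hi": ttl >> 24,    # upper 8 bits of the 12-bit RCODE
        "version": (ttl >> 16) & 0xFF,     # 0 for EDNS(0)
        "dnssec_ok": bool(ttl & 0x8000),   # DO flag
        "options": rr[11:11 + rdlen],      # option TLVs, empty in the sample
    }


# Hypothetical OPT record as it might appear in a reply ("CLASS1232 OPT"):
sample = b"\x00" + struct.pack("!HHIH", OPT_TYPE, 1232, 0, 0)
print(parse_opt_rr(sample))
```

Because its CLASS is not IN, a resolver looking for IN-class A records in ADDITIONAL would skip this record anyway.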
I have the distinct suspicion that this DNS server behavior might be somewhat contrary to DNS spec, because RFC 1035 (section 4.1) says:

> The additional records section contains RRs which relate to the query, but are not strictly answers for the question.

In my case, and in the other two cases above, the additional section contained RRs that were strictly answers to the question.
I don't think it could hurt to merge the additional section into the answer section before parsing it for the resolved address.
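What "merging" could look like for code that parses raw responses itself (in nsncd's setup that parsing happens inside glibc, so this illustrates the idea rather than a drop-in patch): a minimal Python sketch that walks ANSWER and then ADDITIONAL and keeps only A records whose owner name and class match the question, i.e. records that strictly answer it. Glue for other names is ignored, and an OPT record is skipped because its CLASS isn't IN. All function names and the sample host printer.lan are hypothetical; the parser does no truncation checks and doesn't follow CNAME chains.

```python
import struct
from typing import List, Tuple

CLASS_IN, TYPE_A = 1, 1


def read_name(msg: bytes, off: int) -> Tuple[str, int]:
    """Read a possibly compressed name; return (lowercased name, offset after it)."""
    labels: List[str] = []
    end = None                                  # where parsing resumes after a pointer
    for _ in range(128):                        # guard against pointer loops
        length = msg[off]
        if (length & 0xC0) == 0xC0:             # compression pointer (RFC 1035, 4.1.4)
            if end is None:
                end = off + 2
            off = ((length & 0x3F) << 8) | msg[off + 1]
            continue
        off += 1
        if length == 0:                         # root label terminates the name
            return ".".join(labels), (end if end is not None else off)
        labels.append(msg[off:off + length].decode("ascii").lower())
        off += length
    raise ValueError("name too long or compression loop")


def parse_response(msg: bytes):
    """Return (question, [answer, authority, additional]) as lists of records."""
    _id, _flags, qd, an, ns, ar = struct.unpack("!6H", msg[:12])
    off, questions = 12, []
    for _ in range(qd):
        name, off = read_name(msg, off)
        qtype, qclass = struct.unpack("!HH", msg[off:off + 4])
        questions.append((name, qtype, qclass))
        off += 4
    if not questions:
        raise ValueError("response carries no question")
    sections = []
    for count in (an, ns, ar):
        records = []
        for _ in range(count):
            name, off = read_name(msg, off)
            rtype, rclass, _ttl, rdlen = struct.unpack("!HHIH", msg[off:off + 10])
            off += 10
            records.append((name, rtype, rclass, msg[off:off + rdlen]))
            off += rdlen
        sections.append(records)
    return questions[0], sections


def ipv4_answers(msg: bytes) -> List[str]:
    """A records that strictly answer the question, from ANSWER, then ADDITIONAL."""
    (qname, _qtype, _qclass), (answer, _authority, additional) = parse_response(msg)
    addrs: List[str] = []
    for name, rtype, rclass, rdata in answer + additional:
        if name == qname and rtype == TYPE_A and rclass == CLASS_IN and len(rdata) == 4:
            addr = ".".join(str(b) for b in rdata)
            if addr not in addrs:               # same address in both sections
                addrs.append(addr)
    return addrs


def encode_name(name: str) -> bytes:
    """Uncompressed wire-format name, used only to build the sample below."""
    return b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split(".")) + b"\x00"


# Hypothetical reply shaped like the router's repeat answer:
# empty ANSWER, the A record plus an OPT record in ADDITIONAL.
question = encode_name("printer.lan") + struct.pack("!HH", TYPE_A, CLASS_IN)
a_rr = b"\xc0\x0c" + struct.pack("!HHIH", TYPE_A, CLASS_IN, 60, 4) + bytes([192, 168, 1, 23])
opt_rr = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)
reply = struct.pack("!6H", 0xBEEF, 0x8180, 1, 0, 0, 2) + question + a_rr + opt_rr
print(ipv4_answers(reply))  # ['192.168.1.23']
```

Matching on owner name, type and class is what keeps this within the spirit of the RFC text quoted above: unrelated ADDITIONAL records are still ignored, and only records that answer the question exactly are promoted.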
> I have the distinct suspicion that this DNS server behavior might be somewhat contrary to DNS spec

Your guess sounds right to me.

> I don't think it could hurt to merge the additional section into the answer section before parsing it for the resolved address.

As written, we don't have that option: we're delegating all of DNS to the glibc that nsncd is built with. That's what is ignoring the ADDITIONAL section and (I would guess) what's making the multiple queries.
@flokli I think we have to figure out whether the right NSS modules are getting loaded, or whether we're building nsncd in an odd way, or something fun like that. An alternative is doing DNS resolution directly in nsncd with something like trust-dns, but I think at that point I'd advocate for ripping out DNS support altogether in favor of running a local stub resolver that handles all of this, instead of using a more complex NSS config.
> @flokli I think we have to figure out whether the right NSS modules are getting loaded, or whether we're building nsncd in an odd way, or something fun like that.

We're just running glibc code paths, so it's using the same loading order, no?

> An alternative is doing DNS resolution directly in nsncd with something like trust-dns, but I think at that point I'd advocate for ripping out DNS support altogether in favor of running a local stub resolver that handles all of this, instead of using a more complex NSS config.

DNS is only one small part of host resolution. Your LDAP/WINS/avahi/… NSS module can also provide host lookups. Being able to have some NSS modules in the chain without loading them directly inside the application is precisely why we use nsncd in NixOS, as described in more detail in https://flokli.de/posts/2022-11-18-nsncd/.
I'd need to load some context again, but my preferred way forward would be to look again at how we serialize responses back to the client, and make sure we behave the same way here as described in https://github.com/twosigma/nsncd/issues/90#issuecomment-1874473925.
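To check whether nsncd's reply matches glibc-nscd's field by field, rather than eyeballing hex dumps, a small Python helper can decode the fixed reply header and show which fields differ (and in which bits). ASSUMPTION: the field order below is our reading of glibc's hst_response_header in nscd-client.h (eight native-endian 32-bit integers, used for gethostbyname-style requests; getaddrinfo requests use a different ai_response_header), so please verify it against the glibc version in use. The two sample headers are invented for illustration, not taken from either daemon.

```python
import struct
from typing import Dict, List, Tuple

# ASSUMED layout of glibc's hst_response_header (nscd-client.h): 8 x int32, host order.
HST_FIELDS = ("version", "found", "h_name_len", "h_aliases_cnt",
              "h_addrtype", "h_length", "h_addr_list_cnt", "error")
HST_HEADER = struct.Struct("=8i")


def decode_hst_header(raw: bytes) -> Dict[str, int]:
    """Map the first 32 bytes of a reply onto the assumed header fields."""
    if len(raw) < HST_HEADER.size:
        raise ValueError(f"need at least {HST_HEADER.size} bytes, got {len(raw)}")
    return dict(zip(HST_FIELDS, HST_HEADER.unpack_from(raw)))


def diff_headers(a: bytes, b: bytes) -> List[Tuple[str, int, int, int]]:
    """List (field, a_value, b_value, xor_of_values) for every field that differs."""
    da, db = decode_hst_header(a), decode_hst_header(b)
    return [(f, da[f], db[f], da[f] ^ db[f]) for f in HST_FIELDS if da[f] != db[f]]


# Invented "not found" headers that differ only in the error field:
reply_a = HST_HEADER.pack(2, 0, 0, 0, 0, 0, 0, 1)
reply_b = HST_HEADER.pack(2, 0, 0, 0, 0, 0, 0, 3)
print(diff_headers(reply_a, reply_b))  # [('error', 1, 3, 2)]
```

Feeding the captured replies from both daemons into diff_headers would name the field behind the single-bit difference, assuming the layout above is right.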