aardvark-dns

Need bidirectional communication channel between netavark and aardvark

Open Luap99 opened this issue 1 year ago • 8 comments

Right now our DNS startup is super flaky, causing many flakes in CI that are only solved by using retries. This is bad, and retrying is often not what users are doing. Sending signals is just not reliable: netavark sends the signal on an update but then never waits for aardvark-dns to actually update the names and be ready to respond to the new name. The same goes for error handling: aardvark-dns logs its errors to journald, but right now there is absolutely no way to get those errors back to netavark and thus podman. A common problem is that port 53 is already bound, leaving aardvark-dns up and running but unable to serve any DNS.

There are a lot of DNS-related issues on the podman issue tracker, most of them not really possible to debug. IMO we have to address this situation.

Of course, one important caveat is that we must stay backwards compatible. I am creating this issue to have a discussion about it so we can find a good solution.

cc @baude @mheon @flouthoc

Luap99, Jun 08 '23 12:06

Are you thinking of something like a Unix socket, where we could pass requests from NV (netavark) to AV (aardvark-dns) and receive a response once the change was fully applied?
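A minimal sketch of that round trip, assuming a Unix socket carrying one newline-terminated request and one newline-terminated reply. The function names, the `reload` request and the socket path are hypothetical, not an existing netavark or aardvark-dns API. The property that matters is that the client side does not return until the server has written its reply.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;

/// Hypothetical aardvark-dns side: accept one connection, read one request
/// line, "apply" it, then reply with the outcome on the same connection.
fn serve_one(listener: UnixListener) -> io::Result<()> {
    let (mut stream, _addr) = listener.accept()?;
    let mut request = String::new();
    BufReader::new(stream.try_clone()?).read_line(&mut request)?;
    // Placeholder for the real work (e.g. re-reading the entries file).
    let reply = if request.trim_end() == "reload" {
        "ok\n"
    } else {
        "error: unknown request\n"
    };
    stream.write_all(reply.as_bytes())
}

/// Hypothetical netavark side: send a request, then block until the reply arrives.
fn request(path: &Path, msg: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(path)?;
    stream.write_all(format!("{msg}\n").as_bytes())?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}

/// Bind first (so connect cannot race the server), serve one request, clean up.
fn round_trip(path: &Path, msg: &str) -> io::Result<String> {
    let _ = std::fs::remove_file(path); // bind fails if a stale socket file exists
    let listener = UnixListener::bind(path)?;
    let server = thread::spawn(move || serve_one(listener));
    let reply = request(path, msg)?;
    server.join().expect("server thread panicked")?;
    std::fs::remove_file(path)?;
    Ok(reply)
}
```

Unlike a signal, the reply doubles as both the readiness acknowledgement and the error channel.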

mheon, Jun 08 '23 12:06

Yes, I just want something where we can make sure netavark won't return until aardvark-dns is ready, and if there was an error, we get it back.
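On the netavark side, "won't return until aardvark-dns is ready" could look roughly like this sketch: block on the reply with a read timeout, so a wedged aardvark-dns cannot hang podman forever, and turn an error reply into an error podman can print. The `wait_for_ready` name and the `ok` / `error: <msg>` reply format are assumptions for illustration only.

```rust
use std::io::{BufRead, BufReader, ErrorKind};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Block until aardvark-dns reports that the update was applied, or fail.
/// The "ok" / "error: <msg>" reply format is a hypothetical assumption.
fn wait_for_ready(stream: &UnixStream, timeout: Duration) -> Result<(), String> {
    // Bound the wait instead of blocking forever.
    stream
        .set_read_timeout(Some(timeout))
        .map_err(|e| format!("cannot set read timeout: {e}"))?;
    let mut line = String::new();
    match BufReader::new(stream).read_line(&mut line) {
        // EOF before any reply: the server exited or dropped the connection.
        Ok(0) => Err("aardvark-dns closed the connection without replying".into()),
        Ok(_) => match line.trim_end() {
            "ok" => Ok(()),
            // Forward the server's own message so podman can show it.
            reply => Err(reply.strip_prefix("error: ").unwrap_or(reply).to_string()),
        },
        // On Unix a read timeout surfaces as WouldBlock; accept TimedOut too.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            Err(format!("aardvark-dns did not respond within {timeout:?}"))
        }
        Err(e) => Err(format!("failed to read reply from aardvark-dns: {e}")),
    }
}
```

The timeout value is a policy choice; the point is that every outcome (success, reported error, silence, crash) maps to a distinct result instead of a silent success.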

Luap99, Jun 08 '23 13:06

I would prefer not to drag a full REST API in, so I wonder if we can't do something a little lighter (protobuf, maybe? Does that have good Rust bindings?)

The idea in general seems sound and could enable additional features in the future (we've talked about having Aardvark listen for DBus events and launch Netavark when the firewall reloads, and bidirectional comms could be useful for that).

mheon, Jun 08 '23 13:06

I don't care about the protocol. Protobuf would work; we already use it for the DHCP proxy, so it is not a new dependency for netavark. But honestly, I think right now a simple string-based API would be enough, assuming we keep the current way of writing entries to a file.
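To make the string-based idea concrete, here is a sketch of a line format in which the entries stay in the existing files and the socket only carries a "re-read this network" notification plus a one-line result. The `reload <network>` request and the `ok` / `error: <msg>` replies are hypothetical, not a defined aardvark-dns protocol.

```rust
/// Hypothetical request: one line, space-separated.
#[derive(Debug, PartialEq)]
enum Request {
    /// Re-read the entries file for `network` (entries stay file-based).
    Reload { network: String },
}

/// Hypothetical reply: "ok" or "error: <message>", one line.
#[derive(Debug, PartialEq)]
enum Response {
    Ok,
    Error(String),
}

impl Request {
    fn encode(&self) -> String {
        match self {
            Request::Reload { network } => format!("reload {network}\n"),
        }
    }

    fn decode(line: &str) -> Result<Request, String> {
        match line.trim_end().split_once(' ') {
            Some(("reload", network)) if !network.is_empty() => Ok(Request::Reload {
                network: network.to_string(),
            }),
            _ => Err(format!("unknown request: {:?}", line.trim_end())),
        }
    }
}

impl Response {
    fn encode(&self) -> String {
        match self {
            Response::Ok => "ok\n".to_string(),
            // Flatten newlines so the reply stays a single line for read_line().
            Response::Error(msg) => format!("error: {}\n", msg.replace('\n', " ")),
        }
    }

    fn decode(line: &str) -> Result<Response, String> {
        let line = line.trim_end();
        if line == "ok" {
            Ok(Response::Ok)
        } else if let Some(msg) = line.strip_prefix("error: ") {
            Ok(Response::Error(msg.to_string()))
        } else {
            Err(format!("malformed reply: {line:?}"))
        }
    }
}
```

Rejecting unknown requests with an explicit error, rather than ignoring them, leaves room to add request types later while keeping mixed old/new versions diagnosable.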

Luap99, Jun 08 '23 13:06

xref: https://github.com/containers/podman/issues/18325

edsantiago, Jun 08 '23 15:06

xref: https://github.com/containers/podman/issues/16272

edsantiago, Jun 08 '23 15:06

I didn't even bother linking issues; I could probably link 20+ issues from the podman repo that may not be fixed by this but could at least be diagnosed by users.

A common error is having something else already listening on port 53.

$ sudo nc -u -l 53
$ sudo podman run --network podman1 --rm alpine nslookup google.com
nslookup: write to '10.89.0.1': Connection refused
;; connection timed out; no servers could be reached

That is what the user sees: DNS is not working, but they don't know why.

The only real clue is in the journal, but most people will never check it:

aardvark-dns[34502]: Unable to start server unable to start CoreDns server: Address already in use (os error 98)

The goal here would be to have the podman run command error out with the aardvark error.
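One way to get this failure back to the caller, sketched under the assumption that aardvark-dns would bind its listener before acknowledging netavark's request: map `AddrInUse` to a descriptive message and return it (for example as the error reply) instead of only logging it. `bind_dns` is a hypothetical helper, not aardvark-dns code.

```rust
use std::io::ErrorKind;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical helper: bind the DNS listener up front so a port conflict
/// becomes an error the caller can report, not a journal-only log line.
fn bind_dns(addr: SocketAddr) -> Result<UdpSocket, String> {
    UdpSocket::bind(addr).map_err(|e| match e.kind() {
        ErrorKind::AddrInUse => format!(
            "cannot listen on {addr}: address already in use (is another process bound to it?)"
        ),
        _ => format!("cannot listen on {addr}: {e}"),
    })
}
```

If the bind fails, the server would send the message back and exit non-zero, so `podman run` can fail with it instead of starting a container with broken name resolution.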

Luap99, Jun 08 '23 16:06

@Luap99 do you want to self-assign this, or would you prefer to wait until your workload lessens?

baude, Jun 29 '23 14:06