DNS not working after docker daemon restart (live-restore)
We're using Calico with Docker (libnetwork-plugin), and every hardware node runs a local dnsmasq to proxy and cache DNS requests. Our Docker configuration therefore points the DNS of every container to the host machine. So far, everything works perfectly fine.
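For context, a minimal /etc/docker/daemon.json sketch of this setup (the address 172.17.0.1 is purely illustrative; substitute whatever host IP your dnsmasq actually listens on):

```json
{
  "live-restore": true,
  "dns": ["172.17.0.1"]
}
```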
After restarting the Docker daemon with the live-restore feature enabled, at least the iptables rules in the container's nat table get clobbered, so DNS requests stop working.
This is the nat table when a Sysbox container is freshly started or restarted:
# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER_OUTPUT all -- anywhere 169.254.1.1
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER_OUTPUT all -- anywhere 169.254.1.1
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
DOCKER_POSTROUTING all -- anywhere 169.254.1.1
Chain DOCKER_OUTPUT (2 references)
target prot opt source destination
DNAT tcp -- anywhere 169.254.1.1 tcp dpt:domain to:127.0.0.11:46183
DNAT udp -- anywhere 169.254.1.1 udp dpt:domain to:127.0.0.11:51249
Chain DOCKER_POSTROUTING (1 references)
target prot opt source destination
SNAT tcp -- 127.0.0.11 anywhere tcp spt:46183 to:169.254.1.1:53
SNAT udp -- 127.0.0.11 anywhere udp spt:51249 to:169.254.1.1:53
After a daemon restart, additional rules appear; these are the defaults used when not running under Sysbox:
# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER_OUTPUT all -- anywhere 169.254.1.1
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER_OUTPUT all -- anywhere 127.0.0.11
DOCKER_OUTPUT all -- anywhere 169.254.1.1
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
DOCKER_POSTROUTING all -- anywhere 127.0.0.11
DOCKER_POSTROUTING all -- anywhere 169.254.1.1
Chain DOCKER_OUTPUT (3 references)
target prot opt source destination
DNAT tcp -- anywhere 127.0.0.11 tcp dpt:domain to:127.0.0.11:39639
DNAT udp -- anywhere 127.0.0.11 udp dpt:domain to:127.0.0.11:43355
DNAT tcp -- anywhere 169.254.1.1 tcp dpt:domain to:127.0.0.11:46183
DNAT udp -- anywhere 169.254.1.1 udp dpt:domain to:127.0.0.11:51249
Since Sysbox rewrites resolv.conf to use 169.254.1.1 as the DNS server (not 127.0.0.11), DNS requests break.
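Until there is a proper fix, the stale Docker-injected rules can at least be detected by inspecting the container's nat table. A rough sketch, assuming `iptables-save`-style output (a sample mirroring the broken state above is embedded so the script is self-contained; in a live container you would capture the real table instead):

```shell
#!/bin/sh
# In a real container you would capture the live table with:
#   nat_rules=$(iptables-save -t nat)
# Sample output embedded here for illustration:
nat_rules='-A OUTPUT -d 127.0.0.11/32 -j DOCKER_OUTPUT
-A OUTPUT -d 169.254.1.1/32 -j DOCKER_OUTPUT
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp --dport 53 -j DNAT --to-destination 127.0.0.11:43355
-A DOCKER_OUTPUT -d 169.254.1.1/32 -p udp --dport 53 -j DNAT --to-destination 127.0.0.11:51249'

# Sysbox aliases the embedded resolver to 169.254.1.1, so any rule still
# matching destination 127.0.0.11 after a daemon restart is a stale rule
# re-injected by Docker.
stale=$(printf '%s\n' "$nat_rules" | grep -c -e '-d 127\.0\.0\.11/32')
echo "stale rules: $stale"
```

Each matching rule could then be removed manually with `iptables -t nat -D ...` from inside the container's network namespace; this is only a stopgap, not something Sysbox does automatically today.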
If you need any more information, I'm happy to provide it. Thanks!
Thanks for reporting this one @andreasnanko!
Will copy/paste below some of the notes exchanged in our Slack discussion ...
--
The idea of having Sysbox modify the iptables is to allow the inner containers to also reach the DNS server that sits on the host (which is something you need in Calico's case). Without Sysbox making this re-adjustment, the inner containers (those inside the Sysbox container) would see a generic DNS resolver (8.8.8.8 or similar), so they would never hit the host DNS.
My idea with this experiment (disabling dns-aliasing) was to verify that things work fine when this 'readjustment' isn't made. But even if it does work, that wouldn't fix the DNS-reachability issue for your inner containers.
The problem is reproducible in regular Docker environments interconnected through the Calico CNI. I suspect, though, that Calico has little to do with this issue and that the problem is simply a consequence of Docker operating in 'live-restore' mode.
As mentioned above, Sysbox has specialized logic to alter the forwarding state of containers that rely on the host DNS server for their operation. This is typically the case for containers that are placed in custom Docker networks at creation time (through the docker --net CLI option).
In Calico's setup, we also want all DNS traffic generated within the Sysbox container to make use of the host DNS server, so this logic is applicable here too. Now we need to identify a way to interact with Docker that prevents the forwarding state (previously injected by Sysbox) from being wiped out when dockerd is restarted.
Hi @andreasnanko , thanks for reporting the issue.
We do a lot of testing for Sysbox, but Docker "live-restore" is a scenario we had missed, so we did not catch this. Apologies for that.
I was able to reproduce the problem you reported. As you and Rodny noted, Sysbox does some iptables magic to modify DNS resolution inside the Sysbox container, with the purpose of allowing inner Docker containers (launched inside the Sysbox container) to resolve DNS correctly too.
The problem you found is that when the Docker daemon at host level is configured with "live-restore", restarting that Docker daemon causes it to inject the original rules back into the iptables in the container's network namespace, thus colliding with the iptables rules injected by Sysbox and breaking DNS resolution.
This is going to be a tricky one for us to resolve, because Docker does not notify Sysbox that the container's iptables have been modified when Docker restarts. We need to think about whether there is some way for Sysbox to detect this, though it's not trivial.
Rodny had suggested disabling the behavior by which Sysbox replaces the iptables rules for the container's DNS. You can try this by setting alias-dns=false on the sysbox-mgr command line in the systemd service unit (/lib/systemd/system/sysbox-mgr.service), but the likely result is that containers created inside the outer container will have their DNS set to 8.8.8.8. Not sure whether this would serve as a work-around for your case.
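If you'd rather not edit the packaged unit file directly, a systemd drop-in works too. A sketch, assuming the binary lives at /usr/bin/sysbox-mgr (check your unit file for the actual path and flag spelling):

```
# /etc/systemd/system/sysbox-mgr.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/sysbox-mgr --alias-dns=false
```

Then run `systemctl daemon-reload && systemctl restart sysbox-mgr` to pick up the override.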
If I may ask, how important is this fix for your company's use-case? This will help us prioritize among the other issues we are working on. Thanks!
@andreasnanko, just wanted to let you know that I ran a quick test to verify the proper operation of sysbox-mgr's alias-dns=false setting, and everything worked well for me, meaning that the knob was properly honored. I made a silly mistake last time I checked, sorry for any confusion this may have caused.
Please give it a try and let us know how it goes. As we said before, this may fix the forwarding issue after docker restart, but it may not be enough for your inner containers.
Now, let me ask you the following for my own understanding. What advantage do you see in Docker's live-restore feature, given that you could simply rely on docker-swarm to manage your services? To me, docker-swarm looks like the more mature and complete option. Please let me know if I'm missing something here, as live-restore is a relatively new feature for me.
@rodnymolina seems to work. We use live-restore for Docker upgrades or config changes without having to restart the containers when it's not needed.
Hi @andreasnanko, good to hear you were able to work around it by setting alias-dns=false on the sysbox-mgr command line.
Let's keep this issue open until we have a proper fix: ideally, Sysbox should detect the Docker live-restore event and restore the DNS aliasing on the containers.
Let us know if there is anything else we can help with in the meantime.