Docker image fails to start with default settings if a custom address pool is configured

Open tobia opened this issue 5 years ago • 1 comments

Describe the bug

If custom address pools are configured in Docker, the dkron image fails to start.

Custom address pools are common in server deployments: they are needed to avoid collisions between the networks created by Docker and other existing LAN / DMZ resources.

To Reproduce

Create a /etc/docker/daemon.json file with the following contents (example taken from Docker's documentation):

{
  "default-address-pools": [
    { "base": "172.80.0.0/16", "size": 24 },
    { "base": "172.90.0.0/16", "size": 24 }
  ]
}

Restart Docker:

$ sudo systemctl restart docker

Start a dkron container with default settings:

$ docker run --rm -p 8080:8080 dkron/dkron agent --server --bootstrap-expect=1 --log-level=debug
[...]
time="2020-11-23T18:48:28Z" level=info msg="agent: Dkron agent starting" node=3084562912d8
time="2020-11-23T18:48:28Z" level=info msg="2020/11/23 18:48:28 [INFO] serf: EventMemberJoin: 3084562912d8 ::"
time="2020-11-23T18:48:28Z" level=info msg="agent: joining: [] replay: true" node=3084562912d8
time="2020-11-23T18:48:28Z" level=fatal msg="listen tcp: lookup <nil>: no such host" node=3084562912d8

Note the fatal error on the last line. Also notice that the third-to-last line (the serf EventMemberJoin line) ends with the IPv6 wildcard :: instead of the container's private IP address.
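For reference, here is a minimal Python sketch of how the pool configuration from the first step is carved up, assuming the documented meaning of "base" (the CIDR block to allocate from) and "size" (the prefix length of each network Docker creates). The helper name pool_subnets is made up for illustration and is not part of Docker or dkron.

```python
import ipaddress


def pool_subnets(base: str, size: int):
    """Return every subnet of prefix length `size` inside the pool `base`."""
    return list(ipaddress.ip_network(base).subnets(new_prefix=size))


if __name__ == "__main__":
    # Pools copied from the daemon.json in this report.
    pools = [("172.80.0.0/16", 24), ("172.90.0.0/16", 24)]
    for base, size in pools:
        subnets = pool_subnets(base, size)
        print(f"{base}: {len(subnets)} subnets, first is {subnets[0]}")
```

Under that assumption, each pool yields 256 /24 networks, and the first one from the first pool is 172.80.0.0/24.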

Specifications

OS: GNU/Linux
Docker version: 19.03.13, build 4484c46d9d
Dkron version: 3.0.8

Nov 23 '20 18:11 tobia

A workaround for single-server or test instances is to add the following flag to the agent command:

--bind-addr=127.0.0.1:8946

This works because every container has at least the 127.0.0.1 address.

In production environments, bind-addr should probably be configured anyway (using one of the host's addresses, binding dkron's port to it), so this is not a critical bug. It can be confusing, though, when trying out the Docker deployment of dkron for the first time.

By the way, I checked the container's network configuration and it is the same in both cases. This is what the container sees without an /etc/docker/daemon.json file:

$ docker run --rm alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
357: eth0@if358: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP 
    link/ether 02:42:ac:50:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.80.0.2/24 brd 172.80.0.255 scope global eth0
       valid_lft forever preferred_lft forever

And this is what it sees after creating a /etc/docker/daemon.json as per the post above:

$ docker run --rm alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
360: eth0@if361: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP 
    link/ether 02:42:ac:50:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.80.0.2/24 brd 172.80.0.255 scope global eth0
       valid_lft forever preferred_lft forever

As you can see, they are identical apart from the interface indexes (357: eth0@if358 versus 360: eth0@if361).

The same goes for routing. In both cases you get the following:

$ docker run --rm alpine ip route
default via 172.80.0.1 dev eth0 
172.80.0.0/24 dev eth0 scope link  src 172.80.0.2

What is dkron seeing in one case and not in the other?
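One thing that may be worth checking: the example pool 172.80.0.0/16 lies outside the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), whereas Docker's usual default bridge network, 172.17.0.0/16, lies inside 172.16.0.0/12. If dkron picks its default bind address by looking for a private address (an assumption, not confirmed anywhere in this thread), it would find none with this pool. A minimal Python sketch of the range check, with is_rfc1918 as a made-up helper name:

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


if __name__ == "__main__":
    # 172.17.0.2 is a typical address on Docker's default bridge;
    # 172.80.0.2 is the container address shown in this report.
    for addr in ("172.17.0.2", "172.80.0.2", "127.0.0.1"):
        print(addr, is_rfc1918(addr))
```

If this turns out to be the cause, choosing a pool inside one of the RFC 1918 ranges might avoid the problem, but I have not tested that.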

Nov 24 '20 09:11 tobia