Docker image fails to start with default settings if a custom address pool is configured
Describe the bug
If custom address pools are configured in Docker, the dkron image fails to start.
Custom address pools are common in server deployments: they are needed to avoid collisions between the networks created by Docker and other existing LAN / DMZ resources.
To Reproduce
- Create a /etc/docker/daemon.json file with the following contents (example taken from Docker's documentation):
{
"default-address-pools": [
{ "base": "172.80.0.0/16", "size": 24 },
{ "base": "172.90.0.0/16", "size": 24 }
]
}
- Restart Docker:
$ sudo systemctl restart docker
- Start a dkron container with default settings:
$ docker run --rm -p 8080:8080 dkron/dkron agent --server --bootstrap-expect=1 --log-level=debug
[...]
time="2020-11-23T18:48:28Z" level=info msg="agent: Dkron agent starting" node=3084562912d8
time="2020-11-23T18:48:28Z" level=info msg="2020/11/23 18:48:28 [INFO] serf: EventMemberJoin: 3084562912d8 ::"
time="2020-11-23T18:48:28Z" level=info msg="agent: joining: [] replay: true" node=3084562912d8
time="2020-11-23T18:48:28Z" level=fatal msg="listen tcp: lookup <nil>: no such host" node=3084562912d8
- See the fatal error. Notice that the third-to-last log line (the serf EventMemberJoin message) ends with the IPv6 wildcard address :: instead of the container's private IP address.
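For reference, here is a small sketch of how the daemon.json from the first step is carved up, assuming Docker treats "base" as the parent network and "size" as the prefix length of each network it allocates from it. The helper name subnets_in_pool is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: how many subnets of prefix length $2 fit into
# a base network of prefix length $1.
subnets_in_pool() {
  pool_len=$1
  net_len=$2
  echo $(( 1 << (net_len - pool_len) ))
}

# The two pools from the daemon.json above, each split into /24 networks.
for base in 172.80.0.0/16 172.90.0.0/16; do
  prefix=${base#*/}   # text after the slash, i.e. 16
  echo "$base: $(subnets_in_pool "$prefix" 24) networks of size /24"
done
```

This matches the 172.80.0.2/24 address the container receives: it sits in the first /24 allocated from the first pool.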
Specifications
- OS: GNU/Linux
- Docker version: 19.03.13, build 4484c46d9d
- Dkron version: 3.0.8
A workaround for single-server or test instances is to add the following flag to the dkron agent command:
--bind-addr=127.0.0.1:8946
This works because every container has at least the 127.0.0.1 address.
In production environments, bind-addr should probably be set explicitly anyway (to one of the host's addresses, with dkron's port published on it), so this is not a critical bug. It can be confusing, though, when trying out the Docker deployment of dkron for the first time.
By the way, I checked the container's network configuration and it is the same with and without custom address pools. This is what the container sees without a /etc/docker/daemon.json file:
$ docker run --rm alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
357: eth0@if358: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 02:42:ac:50:00:02 brd ff:ff:ff:ff:ff:ff
inet 172.80.0.2/24 brd 172.80.0.255 scope global eth0
valid_lft forever preferred_lft forever
And this is what it sees after creating a /etc/docker/daemon.json as per the post above:
$ docker run --rm alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
360: eth0@if361: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 02:42:ac:50:00:02 brd ff:ff:ff:ff:ff:ff
inet 172.80.0.2/24 brd 172.80.0.255 scope global eth0
valid_lft forever preferred_lft forever
As you can see, they are identical. The only difference is the interface index (357 / @if358 versus 360 / @if361).
The same goes for routing. In both cases you get the following:
$ docker run --rm alpine ip route
default via 172.80.0.1 dev eth0
172.80.0.0/24 dev eth0 scope link src 172.80.0.2
What is dkron seeing in one case and not in the other?
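One thing that may be worth checking, as a guess rather than a confirmed diagnosis: if dkron derives its default bind address by looking for a private (RFC 1918) IP address, which is an assumption on my part, then the address pool itself would matter. 172.80.0.0/16 lies outside the RFC 1918 range 172.16.0.0/12, whereas Docker's stock bridge network, 172.17.0.0/16, lies inside it. The sketch below only illustrates that range check. The function name is_rfc1918 is made up, and it is not dkron's actual code.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the IPv4 address in $1
# falls inside one of the RFC 1918 private ranges, fail otherwise.
is_rfc1918() {
  # Split the dotted quad into its four octets.
  saved_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$saved_ifs
  [ "$#" -eq 4 ] || return 1
  case "$1" in
    10)  return 0 ;;                              # 10.0.0.0/8
    172) [ "$2" -ge 16 ] && [ "$2" -le 31 ] ;;    # 172.16.0.0/12
    192) [ "$2" -eq 168 ] ;;                      # 192.168.0.0/16
    *)   return 1 ;;
  esac
}

# 172.17.0.2: typical address on Docker's stock bridge network.
# 172.80.0.2: address handed out from the custom pool above.
for addr in 172.17.0.2 172.80.0.2; do
  if is_rfc1918 "$addr"; then
    echo "$addr: private (RFC 1918)"
  else
    echo "$addr: not private"
  fi
done
```

If that assumption holds, choosing pools inside 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16 might avoid the problem without needing --bind-addr.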