
Conflict between default `-H fd://` systemd config and daemon.json `hosts` config

Open sdurrheimer opened this issue 8 years ago • 33 comments

Setting hosts in the daemon.json file conflicts with the default systemd -H fd:// flag.

Of course, I can edit the /lib/systemd/system/docker.service file, duplicate it as /etc/systemd/system/docker.service, or add a drop-in config file in /etc/systemd/system/docker.service.d/ to change the ExecStart command. But then I would have to maintain that change instead of simply using the daemon.json file to customize daemon options.

Output of docker version:

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 22:00:43 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 22:00:43 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.2
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 4.4.0-31-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 992.5 MiB
Name: lsphproxy1
ID: GF3H:AK3M:7MPZ:WNK5:47KQ:VLTL:EBNM:HMBW:PR7K:NQ47:ODXF:3OG6
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Cluster store: etcd://127.0.0.1:2379
Cluster advertise: 172.25.0.2:2376

daemon.json:

{
    "cluster-advertise": "enp0s8:2376", 
    "cluster-store": "etcd://127.0.0.1:2379", 
    "hosts": [
        "tcp://127.0.0.1:2375", 
        "tcp://172.25.0.2:2375"
    ], 
    "storage-driver": "overlay"
}

Steps to reproduce the issue:

  1. Use the default systemd docker.service from apt package
  2. Add hosts configuration in /etc/docker/daemon.json
  3. sudo systemctl restart docker

Describe the results you received:

Docker failed to start.

docker[14447]: unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: hosts: (from flag: [fd://], from file: [tcp://127.0.0.1:2375 tcp://172.25.0.2:2375])

Describe the results you expected:

Docker started.
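
The error above comes from dockerd comparing its command-line flags with the keys in daemon.json. Below is a rough shell sketch of the same check done from the outside, which can be run before restarting the service. It is an approximation and not dockerd's actual logic: it greps for -H/--host in an ExecStart string and for a "hosts" key in the file, without parsing JSON. The function names are hypothetical.

```shell
#!/bin/sh
# Rough pre-flight check for the "hosts" flag/file conflict reported above.
# Approximation only: plain grep, no JSON or systemd command-line parsing.

# Succeeds if the given ExecStart command line passes -H or --host to dockerd.
execstart_sets_host() {
  printf '%s\n' "$1" | grep -Eq '(^|[[:space:]])(-H|--host)([[:space:]=]|$)'
}

# Succeeds if the daemon.json at path $2-style argument $1 exists and has a "hosts" key.
file_sets_hosts() {
  [ -f "$1" ] && grep -q '"hosts"[[:space:]]*:' "$1"
}

# Prints "conflict: ..." and returns 1 when both sources set hosts; else prints "ok".
check_hosts_conflict() {
  if execstart_sets_host "$1" && file_sets_hosts "$2"; then
    echo "conflict: hosts set by both a -H flag and $2"
    return 1
  fi
  echo "ok"
}
```

For example: check_hosts_conflict "$(systemctl show -p ExecStart docker.service)" /etc/docker/daemon.json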

sdurrheimer, Aug 06 '16

Also, with systemd, adding ListenStream directives to the docker.socket unit is a better approach than adding tcp:// hosts to the daemon.json file.
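
A minimal sketch of that alternative. It assumes the packaged docker.socket is in use and that the extra listeners go in a drop-in such as /etc/systemd/system/docker.socket.d/tcp.conf (the file name and the addresses are examples). ListenStream= entries accumulate unless they are reset with an empty assignment, so this keeps the existing /var/run/docker.sock listener and adds TCP ones. As far as I know, dockerd started with -H fd:// picks up all of the sockets passed to it this way.

```shell
#!/bin/sh
# Render a docker.socket drop-in that adds extra listeners.
# ListenStream= is additive in systemd socket units, so the packaged
# /var/run/docker.sock listener is kept alongside the addresses given here.
render_socket_dropin() {
  echo "[Socket]"
  for addr in "$@"; do
    echo "ListenStream=$addr"
  done
}
```

For example, as root: mkdir -p /etc/systemd/system/docker.socket.d, then render_socket_dropin 127.0.0.1:2375 172.25.0.2:2375 > /etc/systemd/system/docker.socket.d/tcp.conf, then systemctl daemon-reload and restart docker.socket and docker.service.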

sdurrheimer, Aug 07 '16

Yes, I agree; best to use one place or the other.

The decision was made that a flag would conflict with the same setting in the file, so this is intended behaviour, if slightly non-obvious...

justincormack, Aug 07 '16

I wonder if we should remove socket activation altogether in the default setup; we removed it for RPM setups (where it caused issues in combination with live restore). That way, the unit file could drop the -H flag and let dockerd use the default (listen on /var/run/docker.sock).

thaJeztah, Aug 07 '16

That would seem consistent; socket activation doesn't really seem sensible for most normal use cases. I suppose the logic was that other services could rely on docker being up via socket activation.

Maybe there is a case for making the listen address an exception that can be specified in both files?

justincormack, Aug 07 '16

> Maybe there is a case for making listen address an exception that can be specified in both files?

I thought about that; we should keep in mind that multiple addresses can be specified (e.g. both a socket and an IP address). Would specifying them in daemon.json override or append to those options?

The reason for erroring out originally was to prevent weird behavior ("I specified a flag, but it's not working"); also, it may be odd to ignore a flag (flags should always take precedence, IMO).

thaJeztah, Aug 07 '16

It would have to append, instead of the current conflict. Not sure it is a good idea. Alternatively, if there are good cases for socket activation, perhaps only fd:// could become a separate option? I'm just a bit worried we are throwing out the dependency-ordering benefits of socket activation completely...

justincormack, Aug 07 '16

So, while potentially somewhat off topic, the removal of fd:// on RHEL/RPM was somewhat disastrous.

The default docs you find by searching still show fd:// in the docker.service file, and anyone on RHEL will already have tried to apply those changes, since by default you get docker, docker-network, etc. files in /etc/sysconfig. So for what I would say is a significant portion of RHEL users, the 1.12 installation broke, because many still had "fd://" in their docker.service files and the package upgrade to 1.12 didn't replace it.

As to socket activation not being useful for most applications: what does everyone believe the default connection method is? TCP?

By default, the unix:/// socket requires users to set DOCKER_HOST AND be added to the docker group (a security hole) or be given sudo permissions (not as much of a security hole).

fd:// was a good generic solution for local access in some cases; it was the "it just works" solution. With its removal, RHEL admins had to scramble to get their docker installs back up and running. I would be concerned about what would happen if we broke Ubuntu installs the same way.

Illydth, Aug 08 '16

> By default the unix:/// socket requires users to implement DOCKER_HOST

AFAIK, the default does not require DOCKER_HOST to be specified; if nothing is set, the docker client will try to connect to /var/run/docker.sock.

I agree that migrating away from fd:// can be painful if people modified their unit file (which they shouldn't) and it therefore isn't upgraded, or if they have a systemd drop-in file in place that contains fd:// (not sure what the best solution for that would be, other than documentation).
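
The client-side default described above can be summarized in a small sketch: an explicit docker -H address wins, then DOCKER_HOST, then the local unix socket. This is a simplification for illustration only; the function name is hypothetical, and it ignores TLS settings and any other client configuration.

```shell
#!/bin/sh
# Simplified model of how the docker client picks the daemon address:
# an explicit address (as given to "docker -H") first, then DOCKER_HOST,
# then the default local unix socket. TLS and other settings are ignored.
effective_docker_host() {
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "${DOCKER_HOST:-unix:///var/run/docker.sock}"
  fi
}
```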

thaJeztah, Aug 08 '16

> I agree that migrating to remove fd:// can be painful if people modified their unit file (which they shouldn't)

@thaJeztah But isn't that the only way someone could install the engine and also bind it to a port? Either I have to add -H 0.0.0.0:2376 to the unit file, or I need to remove -H fd:// from the unit file and create a daemon.json.

FWIW I ran into exactly this on a couple of 'pet' hosts when I was updating from v1.11 -> v1.12.

mikedougherty, Aug 26 '16

The "right" way to modify unit files is via systemd drop-ins: https://docs.docker.com/engine/admin/systemd/#/custom-docker-daemon-options

cpuguy83, Aug 26 '16

Isn't it a sensible default to fall back to fd:// if no host is specified, so that it can be omitted from the ExecStart in the standard /lib/systemd/system/docker.service?

Currently you have to edit docker.service in order to specify or add the remote API host, and it makes more sense to use daemon.json for that.

tubbynl, Sep 15 '16

@tubbynl No, it should not fall back to fd://. fd:// is for socket activation. We removed the socket activation file in 1.12. fd:// without a socket file is the very thing that is causing the issue reported here. Manually editing docker.service is why you still have fd://.

Docker does fall back to a unix socket when no host is specified.

cpuguy83, Sep 15 '16

That's a curious thing; I ran into this using a clean install (yesterday) from the docker debian-jessie repository (Docker version 1.12.1, build 23cf638), and the -H fd:// parameter was specified in my /lib/systemd/system/docker.service file.

Is that not supposed to be there, then?

tubbynl, Sep 15 '16

OK, I give up... I don't know what's up. The socket file should be gone but it's not; -H fd:// should also be gone but it's not. Did we revert in 1.12.1?

cpuguy83, Sep 15 '16

@cpuguy83

> IMPORTANT: Docker 1.12 ships with an updated systemd unit file for rpm based installs (which includes RHEL, Fedora, CentOS, and Oracle Linux 7).

For rpm-based systems, yes; not for apt.

sdurrheimer, Sep 15 '16

Just verified it with a clean VirtualBox instance using debian-8.5.0-amd64-netinst:

Linux local-dev 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux
Docker version 1.12.1, build 23cf638

cat /lib/systemd/system/docker.service | grep ExecStart

yields

ExecStart=/usr/bin/dockerd -H fd://

tubbynl, Sep 15 '16

Correct: socket activation was only removed for RPM-based installs, because on those installs it caused problems with live-restore (see this directory; there's a separate unit file for RPM-based installs: https://github.com/docker/docker/tree/v1.12.1/contrib/init/systemd). Socket activation is still in place for .deb installs. I'm wondering if we should remove it altogether.
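
Whether a given install still uses socket activation can be checked from the effective unit. This is a minimal sketch, assuming the output of systemctl cat docker.service (the main unit followed by its drop-ins) is piped in. Because an empty ExecStart= resets the list, the last ExecStart= line is treated as the effective one. The function names are hypothetical.

```shell
#!/bin/sh
# Print the effective ExecStart command from unit text on stdin.
# Drop-ins come after the main unit and an empty "ExecStart=" clears earlier
# values, so for a Type=notify service like docker.service the last
# assignment is the command that runs.
effective_execstart() {
  grep '^ExecStart=' | tail -n 1 | sed 's/^ExecStart=//'
}

# Succeeds if the effective ExecStart tells dockerd to use fd:// (socket activation).
uses_socket_activation() {
  case "$(effective_execstart)" in
    *fd://*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: systemctl cat docker.service | uses_socket_activation && echo "socket activation in use"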

thaJeztah, Sep 15 '16

Sorry if I'm posting in the wrong place. I have Ubuntu 16.04 LTS and docker 1.12.1. When I try to run the docker daemon, I still have the problem described above.

>> service docker status
● docker.service
   Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2016-10-20 00:39:06 EEST; 1min 33s ago
  Process: 28828 ExecStart=/usr/bin/dockerd $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $BLOCK_REGISTRY $INSECURE_REGISTRY (code=exited, status=1/FAILURE)
 Main PID: 28828 (code=exited, status=1/FAILURE)

Oct 20 00:39:06 moonbrv-580 systemd[1]: Started docker.service.
Oct 20 00:39:06 moonbrv-580 dockerd[28828]: time="2016-10-20T00:39:06+03:00" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: EOF\n"
Oct 20 00:39:06 moonbrv-580 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Oct 20 00:39:06 moonbrv-580 systemd[1]: docker.service: Unit entered failed state.
Oct 20 00:39:06 moonbrv-580 systemd[1]: docker.service: Failed with result 'exit-code'.

In my docker.service I have the -H fd:// option:

>> grep 'ExecStart' /lib/systemd/system/docker.service
ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS

I did remove -H fd:// from this file, but I have the same problem. I'm really sorry for the question; Ubuntu is new to me, but please explain what I need to do to fix this problem. You mentioned a drop-in file, but I don't understand what I need to write in it. My /etc/systemd/system/docker.service has the following content:

[Service]
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
ExecStart=
ExecStart=/usr/bin/dockerd $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY

The text is the same as in the docs... I just don't understand what's wrong.
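
Separately from fd://, the EOF at the end of the fatal message above most likely means /etc/docker/daemon.json exists but is empty, so there is no JSON object to decode. A minimal sketch to check for that case (the function name is hypothetical; it only detects an empty or whitespace-only file, not other JSON syntax errors):

```shell
#!/bin/sh
# Succeeds if the file at $1 exists but contains nothing except whitespace.
# A missing file is not reported here; that is a different situation.
daemon_json_is_blank() {
  [ -e "$1" ] && ! grep -q '[^[:space:]]' "$1"
}
```

If daemon_json_is_blank /etc/docker/daemon.json succeeds, either delete the file or put an empty JSON object ({}) in it, then restart docker.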

moonbrv, Oct 19 '16

@moonbrv, try the following:

  • override docker.service by adding /etc/systemd/system/docker.service.d/override.conf:

    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd
    

    Repeating ExecStart is not an issue: the empty ExecStart= line "cleans" it first, before the new value is set.

  • reload the systemd daemon:

    systemctl daemon-reload
    
  • restart docker:

    systemctl restart docker.service
    

PS:

Below is the /etc/systemd/system/multi-user.target.wants/docker.service from my machine:

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd://
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target

And the /etc/docker/daemon.json:

{
  "tlsverify": true,
  "tlscacert": "/etc/docker/ca.pem",
  "tlscert"  : "/etc/docker/cert.pem",
  "tlskey"   : "/etc/docker/key.pem",
  "hosts"    : ["fd://", "tcp://0.0.0.0:2376"],
  "dns"      : ["8.8.8.8","8.8.4.4"],
  "ipv6"     : false
}
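
With ExecStart cleared to a plain /usr/bin/dockerd as in the override above, the fd:// entry can live in daemon.json instead of on the command line. Socket activation then keeps working (the unit still requires docker.socket), and other addresses sit next to it. Below is a small sketch that renders such a hosts list. The function name is hypothetical, and the function writes no files. It also does no JSON escaping, so addresses are assumed not to contain quotes or backslashes. Merge its output into an existing daemon.json by hand rather than overwriting other settings.

```shell
#!/bin/sh
# Print a daemon.json fragment whose "hosts" list holds the given addresses.
# No JSON escaping is done; addresses must not contain quotes or backslashes.
render_hosts_json() {
  printf '{\n  "hosts": ['
  sep=''
  for h in "$@"; do
    printf '%s"%s"' "$sep" "$h"
    sep=', '
  done
  printf ']\n}\n'
}
```

For example, render_hosts_json fd:// tcp://0.0.0.0:2376 prints the same hosts entry as the daemon.json above.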

czerasz, Nov 27 '16

I just hit this on a new install on Ubuntu 16.04 after running Docker for years on Ubuntu 16.04. I think expecting people to know this much detail about systemd is a big problem; it took me quite a while just to find the relevant error message in journalctl -xe.

snth, Sep 12 '17

Yep, this bug has been open for more than a year and there is not even a hint in the documentation.

dvenza, Oct 18 '17

@dmitriyse: don't patch the systemd files directly but use systemd drop-in (as suggested by @czerasz 3 comments up):

# As root
mkdir -p /etc/systemd/system/docker.service.d
echo '[Service]
ExecStart=
ExecStart=/usr/bin/dockerd' > /etc/systemd/system/docker.service.d/simple_dockerd.conf
systemctl daemon-reload
service docker restart

This way you only override the ExecStart and it doesn't get modified on every apt upgrade.

zigarn, Nov 04 '17

Thanks, this tip works great.

dmitriyse, Nov 04 '17

Please add this tip somewhere in the documentation. It's too difficult to find the right answer.

dmitriyse, Nov 04 '17

Not sure why the "workarounds" are considered known good practice. It's good that we have workarounds, but shouldn't this just work? Docs aside, Docker should update its codebase to handle this.

samrocketman, Nov 05 '17

@snth, use journalctl -u docker to find the relevant message =)
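
To pull out just the conflicting directive, the journal output can be filtered further. This is a minimal sketch: the function name is hypothetical, it relies on the exact wording of the dockerd error quoted in this thread, and it prints only the first directive named on each matching line.

```shell
#!/bin/sh
# Read log text on stdin and print the first directive that dockerd reports
# as set both by a flag and in daemon.json (e.g. "hosts"), one per matching line.
conflicting_directives() {
  sed -n 's/.*specified both as a flag and in the configuration file: \([^:]*\):.*/\1/p'
}
```

For example: journalctl -u docker.service --no-pager | conflicting_directives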

yuklia, Feb 11 '19

The issue with TLS reproduced again.

If I remove TLS from /etc/docker/daemon.json, it works fine.

docker engine: 18.09.1
docker engine: 18.06.0
linux: bionic
https://github.com/moby/moby/issues/22339#issuecomment-462408990

yuklia, Feb 11 '19

It seems the issue with the hosts directive returned in 18.09:

 unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: hosts: (from flag: [fd://], from file: [tcp://0.0.0.0 unix:///var/run/docker.sock])

First, -H unix:// was added in 18.09.0 (here); then it was changed to -H fd:// in 18.09.1 (here).

kostrzewa9ld, Apr 03 '19

The issue was there in all releases, but socket activation wasn't available on all distros for some releases (it was only set in .deb packages, not in .rpm); 18.09 added back support for socket activation for all of them.

thaJeztah, Apr 03 '19

@thaJeztah Well, it wasn't present in the 18.03.1 RPMs for CentOS, which I am currently using, and when I tried to upgrade I got hit by this. So the recommended solution is to patch/override docker's service file after installation? Would it be possible to package docker with socket activation enabled via daemon.json instead?

kostrzewa9ld, Apr 04 '19