docker-debian11-ansible

using this on debian 11 aka. bullseye is resulting in a non-systemd

Open zerwes opened this issue 3 years ago • 49 comments

System

Debian 11 aka. bullseye with the debian docker.io packages. (more details later)

Description

While trying to use the image directly in docker or via molecule, the image starts, but it seems it is not systemd enabled, resulting in failed test runs.

A self-brewed Docker image based on the official debian:bullseye, by contrast, works as expected. But to be honest, Docker is really not my area of expertise...

The issue seems to occur not only with the debian11 image; others such as geerlingguy/docker-centos8-ansible and geerlingguy/docker-ubuntu2004-ansible seem to be affected too.

Steps to reproduce

$ docker run --detach --privileged --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro geerlingguy/docker-debian11-ansible:latest
0c103204a41a3dd1487ab70813ac5fd4480f3f9e904f70cbe0c8a2b02443d986
$ docker ps
CONTAINER ID   IMAGE                                        COMMAND                  CREATED          STATUS          PORTS     NAMES
0c103204a41a   geerlingguy/docker-debian11-ansible:latest   "/lib/systemd/systemd"   39 seconds ago   Up 38 seconds             jovial_wilson
$ docker exec --tty 0c103204a41a /bin/systemctl status
Failed to connect to bus: No such file or directory

Test with my own amateur build

$ cat Dockerfile 

FROM debian:bullseye

ENV container docker
ENV LC_ALL C
ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update \
    && apt-get install -y python3 sudo bash ca-certificates iproute2 python3-apt aptitude systemd systemd-sysv \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN rm -f /lib/systemd/system/multi-user.target.wants/* \
    /etc/systemd/system/*.wants/* \
    /lib/systemd/system/local-fs.target.wants/* \
    /lib/systemd/system/sockets.target.wants/*udev* \
    /lib/systemd/system/sockets.target.wants/*initctl* \
    /lib/systemd/system/sysinit.target.wants/systemd-tmpfiles-setup* \
    /lib/systemd/system/systemd-update-utmp*

RUN systemctl set-default multi-user.target

#VOLUME [ "/sys/fs/cgroup" ]

CMD [ "/lib/systemd/systemd", "log-level=info", "unit=sysinit.target" ]


$ docker build .
Sending build context to Docker daemon  3.072kB
Step 1/8 : FROM debian:bullseye
...
Successfully built eb8ff56c63ab

$ docker tag eb8ff56c63ab test-deb11-systemd

$ docker  run --detach --privileged  --name test-deb11-systemd test-deb11-systemd
7b0afaa24585c10a5ddcab18c0b1d06aef23501282dc0e8918e505784862a2a8

$ docker exec --tty 7b0afaa24585c10a5ddcab18c0b1d06aef23501282dc0e8918e505784862a2a8 /bin/systemctl status
* 7b0afaa24585
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Fri 2022-01-21 21:50:32 UTC; 6s ago
   CGroup: /
           |-init.scope 
           | |- 1 /lib/systemd/systemd log-level=info unit=sysinit.target
           | |-35 /bin/systemctl status
           | `-42 (pager)
           `-system.slice 
             `-systemd-journald.service 
               `-26 /lib/systemd/systemd-journald

Distro and Packages:

Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye


Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                      Version                 Architecture Description
+++-=========================-=======================-============-=====================================================
ii  docker                    1.5-2                   all          transitional package
ii  docker.io                 20.10.5+dfsg1-1+deb11u1 amd64        Linux container runtime
ii  python3-docker            4.1.0-1.2               all          Python 3 wrapper to access docker.io's control socket

check-config

$ /usr/share/docker.io/contrib/check-config.sh
warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-5.10.0-10-amd64 ...

Generally Necessary:
- cgroup hierarchy: cgroupv2
- apparmor: enabled, but apparmor_parser missing
    (use "apt-get install apparmor" to fix this)
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_NETFILTER_XT_MARK: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_LEGACY_VSYSCALL_NONE: enabled
    (containers using eglibc <= 2.13 will not work. Switch to
     "CONFIG_VSYSCALL_[NATIVE|EMULATE]" or use "vsyscall=[native|emulate]"
     on kernel command line. Note that this will disable ASLR for the,
     VDSO which may assist in exploiting security vulnerabilities.)
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled (as module)
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled (as module)
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled (as module)
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

zerwes avatar Jan 21 '22 22:01 zerwes

Can you confirm your molecule config looks something like the following? https://github.com/geerlingguy/ansible-role-apache/blob/master/molecule/default/molecule.yml#L7-L12

geerlingguy avatar Jan 21 '22 23:01 geerlingguy

Yes. Here the relevant part from a failing example:

  - name: keepalived-bionic
    pre_build_image: yes
    image: geerlingguy/docker-ubuntu1804-ansible:latest
    privileged: true
    command: /lib/systemd/systemd
    volumes:
    - /sys/fs/cgroup:/sys/fs/cgroup:ro

and a task that enables a service via systemd fails with:

"stderr_lines": ["Failed to connect to bus: No such file or directory"]

zerwes avatar Jan 22 '22 11:01 zerwes

@zerwes - Can you try changing the command to match what I have set up in mine?

geerlingguy avatar Jan 22 '22 19:01 geerlingguy

Hello @geerlingguy, unfortunately that makes no difference:

@@ -71,6 +71,6 @@ platforms:
     pre_build_image: yes
     image: geerlingguy/docker-ubuntu2004-ansible:latest
     privileged: true
-    command: /lib/systemd/systemd
+    command: ${MOLECULE_DOCKER_COMMAND:-""}
     volumes:
     - /sys/fs/cgroup:/sys/fs/cgroup:ro

The invocation of systemctl still fails with: "rc": 1, "stderr": "Failed to connect to bus: No such file or directory"

zerwes avatar Jan 22 '22 19:01 zerwes

/me watches this :)

evrardjp avatar Jan 24 '22 09:01 evrardjp

I ran into the same problem as @zerwes.

stefanDeveloper avatar Mar 08 '22 18:03 stefanDeveloper

@stefanDeveloper is something like the docker file mentioned in the description of the issue or like https://github.com/Rosa-Luxemburgstiftung-Berlin/ansible-role-unbound/blob/main/molecule/default/Dockerfile-debian-bullseye.j2 working for you?

zerwes avatar Mar 09 '22 09:03 zerwes

@zerwes you saved my week, thanks that works like a charm!

stefanDeveloper avatar Mar 09 '22 10:03 stefanDeveloper

@stefanDeveloper glad to hear it helped. Maybe it also helps @geerlingguy drill down into the problem ...

zerwes avatar Mar 09 '22 12:03 zerwes

I have a problem similar to the one @zerwes describes (Hi, by the way :-) ) in https://github.com/NETWAYS/ansible-role-elasticsearch/pull/53 .

As another change that might have an influence I had to remove the following lines because it made starting Elasticsearch in the containers impossible on CentOS:

     volumes:
     - /sys/fs/cgroup:/sys/fs/cgroup:ro

Since I removed that, the CentOS tests succeed but the Debian ones fail. I put some debugging code into my roles to print what's wrong. What I'm seeing is:

  fatal: [elasticsearch-cluster2]: FAILED! => {"changed": false, "cmd": "/bin/systemctl", "msg": "Failed to connect to bus: No such file or directory", "rc": 1, "stderr": "Failed to connect to bus: No such file or directory\n", "stderr_lines": ["Failed to connect to bus: No such file or directory"], "stdout": "", "stdout_lines": []}

I suspect, both containers are built differently and what fixes problems for one breaks it for the other?

widhalmt avatar Mar 24 '22 12:03 widhalmt

Hello @widhalmt, is something like the docker file mentioned in the description of the issue or like https://github.com/Rosa-Luxemburgstiftung-Berlin/ansible-role-unbound/blob/main/molecule/default/Dockerfile-debian-bullseye.j2 working for you?

zerwes avatar Mar 24 '22 12:03 zerwes

@zerwes So you mean, disabling mounting cgroups? As far as I understood the information from https://discuss.elastic.co/t/error-when-running-7-12-1-on-centos-7-in-docker/271508 the problem was cgroups being mounted in two parts of the test. Looks like you disabled it in the Docker file, I disabled it in molecule.yml. My approach did work with CentOS but not with Debian.

I use several containers by @geerlingguy so I can't easily exchange the container I'm using. I could give it a try to exchange it temporarily, though.

widhalmt avatar Mar 24 '22 13:03 widhalmt

I'm seeing the same effect with Rocky Linux 8 now, too. After removing the mount for cgroups in molecule.yml CentOS 7 works again but Debian 10, Debian 11 and Rocky Linux 8 fail.

widhalmt avatar Mar 24 '22 15:03 widhalmt

> I use several containers by @geerlingguy so I can't easily exchange the container I'm using. I could give it a try to exchange it temporarily, though.

My intention is certainly not to replace the widely used Docker images (my Docker-fu is much too weak for that, as I consider myself just an average user on this topic). I just wanted to give @geerlingguy a hint and some help about what works and what doesn't ...

zerwes avatar Mar 25 '22 04:03 zerwes

What's weird is I'm using the same containers on a ton of my projects and not (seemingly) running into the same issues that are mentioned here.

(Edit: Though I'm running them either from macOS or from Ubuntu...)

geerlingguy avatar Mar 25 '22 15:03 geerlingguy

> > I use several containers by @geerlingguy so I can't easily exchange the container I'm using. I could give it a try to exchange it temporarily, though.

> My intention is certainly not to replace the widely used Docker images (my Docker-fu is much too weak for that, as I consider myself just an average user on this topic). I just wanted to give @geerlingguy a hint and some help about what works and what doesn't ...

Sorry, that was just me being unclear in my reply. I understood that you only suggested that for tests and not as a complete replacement. What I forgot to mention is that I'm using them in a matrix check with different OSes and I can't easily replace a single one, because it wouldn't even start. I need time to change the whole CI configuration to use the container in a test.

widhalmt avatar Mar 25 '22 15:03 widhalmt

@geerlingguy I really don't get it either. I see the problems mostly when running them and starting Elasticsearch in GitHub Actions. For now it works flawlessly with CentOS 7 (when I remove the cgroups mount in molecule.yml), but it breaks on Rocky Linux 8, Debian 10 and Debian 11.

widhalmt avatar Mar 25 '22 15:03 widhalmt

I get a very similar error, failure 1 during daemon-reload: Failed to get D-Bus connection: No such file or directory, but only when running molecule tests locally on macOS. In GitHub Actions the same configuration works with CentOS 7 and Rocky Linux 8. At first I thought this had something to do with the Docker implementation on macOS (Docker Desktop vs. native Docker runtime), but I'm not so sure anymore.

tbumke avatar Mar 25 '22 16:03 tbumke

@tbumke - On macOS, that has to do with the implementation of cgroups v2 in Docker for Mac. I believe there's a way to work around it...

geerlingguy avatar Mar 25 '22 18:03 geerlingguy

@widhalmt @zerwes apologies if I have overlooked this but which host system are you using? I ran into the same issues and decided to give up on this matter, just watching this issue.

I am trying to run this in WSL2 on either Windows 10 or 11, which results in Debian-based containers not starting with systemd or not starting at all. All I have found online about this is that, for some reason, WSL2 seems unable to handle this kind of virtualization.

If it's a Windows virtualization issue, it would explain why it works fine on (most) Macs and Jeff's Ubuntu.

Paul-Weisser avatar Mar 25 '22 21:03 Paul-Weisser

@Paul-Weisser my first touch with this was running a debian 11 container on debian 11 ...

zerwes avatar Mar 25 '22 22:03 zerwes

> @tbumke - On macOS, that has to do with the implementation of cgroups v2 in Docker for Mac. I believe there's a way to work around it...

Thanks @geerlingguy , this pointed me in the right direction. Searching for cgroups v2 and Docker for Mac, I found this issue https://github.com/docker/for-mac/issues/6073 which also describes a workaround.

Configuring "deprecatedCgroupv1": true (note the missing "s") in ~/Library/Group\ Containers/group.com.docker/settings.json tells Docker for Mac to use legacy cgroups v1. This of course is only a temporary fix until Ansible Molecule supports the cgroupns Docker parameter.

Running the container as follows and with cgroups v2 now also works in my setup:

docker run -it --privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
  --name instance -d geerlingguy/docker-debian11-ansible

Note also, that the sysfs volume permissions need to be changed to rw as well. Then I can successfully run systemd services and commands from the container.
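
To avoid hard-coding one variant, the flags can be chosen from the host's cgroup version. This is only a rough sketch: the helper name systemd_run_flags is made up for illustration, the v2 flags are the ones from the command above, the v1 flags are the ones from the original reproduction, and the detection assumes GNU stat on a Linux host.

```shell
#!/bin/sh
# Print the docker run options for a systemd container, given the host's
# cgroup version ("v1" or "v2"). Hypothetical helper, for illustration only.
systemd_run_flags() {
    case "$1" in
        v2) echo "--privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw" ;;
        *)  echo "--privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro" ;;
    esac
}

# Detect the host's cgroup version from the filesystem type (GNU stat).
if [ "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" = "cgroup2fs" ]; then
    version=v2
else
    version=v1
fi

# Only print the command; drop the leading echo to actually start the container.
# The command substitution is left unquoted on purpose so the flags split into words.
echo docker run -d $(systemd_run_flags "$version") \
    --name instance geerlingguy/docker-debian11-ansible
```

The script prints the command instead of running it, so the chosen flags can be reviewed first.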

tbumke avatar Mar 26 '22 11:03 tbumke

Thanks @tbumke !!

Changing

    volumes:
    - /sys/fs/cgroup:/sys/fs/cgroup:ro

to

    volumes:
    - /sys/fs/cgroup:/sys/fs/cgroup:rw

did the trick!

Now I only have to find a way to get around an Elasticsearch bug ( https://github.com/elastic/elasticsearch/issues/74158 ) that keeps multiple instances from starting because the Java options parser prints to stdout instead of a file. But that only hits when I fire up several containers in a single test and won't keep me from proceeding with the other roles. Thank you everyone, that kept me in a constant state of rage for weeks now. :-)

widhalmt avatar Apr 01 '22 15:04 widhalmt

Ok, guess now I'm completely lost. Now it works sometimes and sometimes it doesn't. I'll have to take a deeper look, sorry.

widhalmt avatar Apr 01 '22 16:04 widhalmt

+1 have the same running from debian11. I believe since this image mounts cgroups into the image as a volume, it will have different results if you have different versions of cgroups in your host system. Should it work only on cgroupsv1?
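
One way to check which cgroup version a host uses is to look at the filesystem type mounted at /sys/fs/cgroup: cgroup2fs indicates the unified v2 hierarchy, while tmpfs indicates the legacy v1 (or hybrid) layout. A minimal sketch, assuming GNU stat on a Linux host; the function name cgroup_version_from_fstype is made up for illustration:

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup version label.
# Hypothetical helper name, for illustration only.
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy or hybrid: controllers mounted below a tmpfs
        *)         echo "unknown" ;;
    esac
}

# GNU stat: -f queries the filesystem, -c %T prints its type in readable form.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null) || fstype="unavailable"
echo "host cgroup version: $(cgroup_version_from_fstype "$fstype") ($fstype)"
```

On Docker 20.10 or newer, `docker info --format '{{.CgroupVersion}}'` should report the same thing from the daemon's point of view.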

staticdev avatar Apr 09 '22 16:04 staticdev

I also have this. Is there any information I can provide to get this fixed, @geerlingguy?

barrelful avatar Apr 10 '22 18:04 barrelful

As I've said before, I haven't had any issues running this with systemd (for example, see my Docker role: https://github.com/geerlingguy/ansible-role-docker/blob/master/.github/workflows/ci.yml#L48 / https://github.com/geerlingguy/ansible-role-docker/runs/5959693637?check_suite_focus=true)

If someone can get a reproducible fault that works with the base image and the same kind of setup I'm using, that would be helpful.

(Another note: it seems cgroups v2 might be the main culprit for some people...)

geerlingguy avatar Apr 10 '22 19:04 geerlingguy

When using a CI/CD environment (GitLab CI or any other), you can use these settings for the Docker daemon:

daemon.json
{
  "debug": false,
  "default-cgroupns-mode": "host",
  "storage-driver": "vfs"
}
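
To check whether the default-cgroupns-mode setting took effect after restarting the daemon, one option is to compare the cgroup namespace of the host with that of a fresh container: in host mode, /proc/self/ns/cgroup should point at the same namespace ID in both. A rough sketch for a native Linux host (it does not apply to Docker Desktop, where the daemon runs in a VM); same_cgroupns and check_cgroupns_mode are hypothetical helper names:

```shell
#!/bin/sh
# Succeed when two namespace links (such as "cgroup:[4026531835]") are
# non-empty and identical. Hypothetical helper, for illustration only.
same_cgroupns() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Compare the host's cgroup namespace with that of a throwaway container.
# $1 is the image to start; it only needs to provide readlink.
check_cgroupns_mode() {
    host_ns=$(readlink /proc/self/ns/cgroup) || return 2
    ctr_ns=$(docker run --rm "$1" readlink /proc/self/ns/cgroup) || return 2
    if same_cgroupns "$host_ns" "$ctr_ns"; then
        echo "host cgroup namespace is shared"
    else
        echo "container has a private cgroup namespace"
    fi
}
```

For example, `check_cgroupns_mode debian:bullseye` starts a throwaway container without an explicit --cgroupns flag, so the result reflects the daemon's default.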

echohes avatar Apr 14 '22 14:04 echohes

Thanks, @echohes . For Elasticsearch it didn't work. There's a bug that interferes with Cgroups, maybe I just have to wait for a fix. Thanks anyway. Hopefully it works for others.

widhalmt avatar Apr 14 '22 14:04 widhalmt

@geerlingguy, for Docker on Mac you are in fact using Docker Desktop, I presume?

According to the release notes for version 4.3.0 (2021-12-02), it now uses cgroups v2:

Docker Desktop now uses cgroupv2. If you need to run systemd in a container then:

  • Ensure your version of systemd supports cgroupv2. It must be at least systemd 247. Consider upgrading any centos:7 images to centos:8.
  • Containers running systemd need the following options: [--privileged --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw] (https://serverfault.com/questions/1053187/systemd-fails-to-run-in-a-docker-container-when-using-cgroupv2-cgroupns-priva).
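
The systemd version inside an image can be compared against the 247 minimum quoted above with a few lines of shell. This is just a sketch: the helper names are made up, and the sample version line is illustrative input, not output captured from any of the images discussed here.

```shell
#!/bin/sh
# Succeed when the given systemd version meets the 247 minimum quoted above.
# Non-numeric input is treated as "too old". Hypothetical helper name.
systemd_ok_for_cgroupv2() {
    [ "$1" -ge 247 ] 2>/dev/null
}

# "systemctl --version" prints a first line such as "systemd 247 (247.3-7)";
# the second field is the numeric version.
parse_systemd_version() {
    awk 'NR == 1 { print $2 }'
}

# Illustrative sample line; replace it with real "systemctl --version" output.
version=$(echo "systemd 247 (247.3-7)" | parse_systemd_version)
if systemd_ok_for_cgroupv2 "$version"; then
    echo "systemd $version meets the minimum"
else
    echo "systemd $version is older than 247"
fi
```

To check a real image, pipe the output of `docker run --rm IMAGE systemctl --version` into parse_systemd_version, where IMAGE is whichever image you are testing.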

And from version 4.4.2 / 2022-01-13 release notes:

Added a deprecated option to settings.json: "deprecatedCgroupv1": true, which switches the Linux environment back to cgroups v1. If your software requires cgroups v1, you should update it to be compatible with cgroups v2. Although cgroups v1 should continue to work, it is likely that some future features will depend on cgroups v2. It is also possible that some Linux kernel bugs will only be fixed with cgroups v2.

staticdev avatar Apr 29 '22 20:04 staticdev