docker-centos7-ansible

systemd does not work when host has cgroup2

Open lostiniceland opened this issue 3 years ago • 24 comments

Currently any systemd operation fails on this image. The issue is probably linked to this one https://github.com/ansible/ansible/issues/71528

There seem to have been some changes on the systemd side that led to follow-up changes in Ansible, but even after updating to 2.10.4 the error persists. I can only assume that the additional modifications that make systemd work inside a container have to be adjusted...

lostiniceland avatar Jan 08 '21 15:01 lostiniceland

I've just hit this issue too. @lostiniceland did you ever find a fix / workaround?

ollie1 avatar May 11 '21 14:05 ollie1

I haven't encountered this issue on my CI images in GitHub Actions, and verified things are working locally too... can you give an example to reproduce the issues you're seeing?

geerlingguy avatar May 11 '21 15:05 geerlingguy

Very possibly I'm doing something wrong, but if you can spot what I'd be very grateful!

Here is a minimal example (assuming standard role structure generated with molecule init role docker-centos7-ansible --driver-name docker):

molecule.yml

dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: instance
    image: geerlingguy/docker-centos7-ansible
    pre_build_image: true
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
provisioner:
  name: ansible
verifier:
  name: ansible

converge.yml

- name: Converge
  hosts: all
  tasks:
    - name: Install firewalld
      package:
        name: firewalld
        state: present
    - name: Enable and start firewalld
      service:
        name: firewalld
        state: started

Running molecule converge works fine until it gives the following error:

PLAY [Converge] ****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [instance]

TASK [Install firewalld] *******************************************************
changed: [instance]

TASK [Enable and start firewalld] **********************************************
fatal: [instance]: FAILED! => {"changed": false, "msg": "Service is in unknown state", "status": {}}

PLAY RECAP *********************************************************************
instance                   : ok=2    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Execing into the container and running systemctl status firewalld gives

[root@instance /]# systemctl status firewalld
Failed to get D-Bus connection: Operation not permitted

ollie1 avatar May 11 '21 16:05 ollie1

@ollie1 - You're missing the command override—molecule injects a command that needs to be removed for systemd to be the first process in the container; see https://github.com/geerlingguy/ansible-role-apache/blob/master/molecule/default/molecule.yml#L9

command: ${MOLECULE_DOCKER_COMMAND:-""}
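
To confirm the override took effect, one option is to look at what runs as PID 1 inside the container. The following is a minimal sketch, not part of this image or of Molecule: `check_pid1` is a hypothetical helper, `instance` is the platform name from the example config above, and it assumes systemd reports itself as `systemd` in `/proc/1/comm`.

```shell
#!/bin/sh
# Hypothetical helper: classify the process name found in /proc/1/comm.
# Assumption: a container booted by systemd reports "systemd" there.
check_pid1() {
    case "$1" in
        systemd)
            echo "ok: systemd is PID 1"
            ;;
        *)
            echo "problem: PID 1 is '$1', not systemd"
            return 1
            ;;
    esac
}

# Only query Docker when the CLI and the container are actually present.
# "instance" is the platform name used in the molecule.yml from this thread.
if command -v docker >/dev/null 2>&1 && docker inspect instance >/dev/null 2>&1; then
    check_pid1 "$(docker exec instance cat /proc/1/comm)" || true
fi
```

If PID 1 turns out to be something other than systemd, `systemctl` inside the container has no running manager to talk to, which would be consistent with the D-Bus connection errors shown above.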

geerlingguy avatar May 11 '21 16:05 geerlingguy

Same issue

TASK [Enable and start firewalld] **********************************************                                                                                                               
fatal: [instance]: FAILED! => {
    "changed": false,
    "cmd": "/usr/bin/systemctl",
    "invocation": {
        "module_args": {
            "daemon_reexec": false,
            "daemon_reload": false,
            "enabled": null,
            "force": null,
            "masked": null,
            "name": "firewalld",
            "no_block": false,
            "scope": "system",
            "state": "started"
        }
    },
    "msg": "Failed to get D-Bus connection: No such file or directory",
    "rc": 1,
    "stderr": "Failed to get D-Bus connection: No such file or directory\n",
    "stderr_lines": [
        "Failed to get D-Bus connection: No such file or directory"
    ],
    "stdout": "",
    "stdout_lines": []
}

It is certainly caused by the operating system.

Linux va 5.12.2-arch1-1 #1 SMP PREEMPT Fri, 07 May 2021 15:36:06 +0000 x86_64 GNU/Linux

Vakhrushev avatar May 11 '21 20:05 Vakhrushev

@Vakhrushev - Did you modify your molecule config to override the command as I mentioned above?

geerlingguy avatar May 11 '21 20:05 geerlingguy

Yes. I did it from scratch:

vls@va:~/tmp
> git clone https://github.com/geerlingguy/ansible-role-apache
Cloning into 'ansible-role-apache'...
remote: Enumerating objects: 1097, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 1097 (delta 0), reused 2 (delta 0), pack-reused 1091
Receiving objects: 100% (1097/1097), 171.87 KiB | 215.00 KiB/s, done.
Resolving deltas: 100% (566/566), done.

vls@va:~/tmp                                                                                        
> cd ansible-role-apache/         

vls@va:~/tmp/ansible-role-apache
> molecule create                                                                                                                                                                      master [5b2e65d]
INFO     default scenario test matrix: dependency, create, prepare              
INFO     Performing prerun...
INFO     Using .cache/roles/geerlingguy.apache symlink to current repository in order to enable Ansible to find the role using its expected full name.
INFO     Added ANSIBLE_ROLES_PATH=~/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:./.cache/roles
INFO     Running default > dependency
WARNING  Skipping, missing the requirements file.                                                   
WARNING  Skipping, missing the requirements file.                                                                                                                                                       
INFO     Running default > create
INFO     Sanity checks: 'docker'  
                                                                                                    
PLAY [Create] ******************************************************************
                                                  
TASK [Log into a Docker registry] **********************************************                                                                                                                        
skipping: [localhost] => (item={'command': '', 'image': 'geerlingguy/docker-centos7-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:ro']})
                                                                                                    
TASK [Check presence of custom Dockerfiles] ************************************
ok: [localhost] => (item={'command': '', 'image': 'geerlingguy/docker-centos7-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:ro']})

TASK [Create Dockerfiles from image names] *************************************
skipping: [localhost] => (item={'command': '', 'image': 'geerlingguy/docker-centos7-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:ro']})

TASK [Discover local Docker images] ********************************************
ok: [localhost] => (item={'changed': False, 'skipped': True, 'skip_reason': 'Conditional result was False', 'item': {'command': '', 'image': 'geerlingguy/docker-centos7-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:ro']}, 'ansible_loop_var': 'item', 'i': 0, 'ansible_index_var': 'i'})

TASK [Build an Ansible compatible image (new)] *********************************
skipping: [localhost] => (item=molecule_local/geerlingguy/docker-centos7-ansible:latest) 

TASK [Create docker network(s)] ************************************************

TASK [Determine the CMD directives] ********************************************
ok: [localhost] => (item={'command': '', 'image': 'geerlingguy/docker-centos7-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:ro']})

TASK [Create molecule instance(s)] *********************************************
changed: [localhost] => (item=instance)


TASK [Wait for instance(s) creation to complete] *******************************
FAILED - RETRYING: Wait for instance(s) creation to complete (300 retries left).
changed: [localhost] => (item={'started': 1, 'finished': 0, 'ansible_job_id': '584130753297.11722', 'results_file': '/home/vls/.ansible_async/584130753297.11722', 'changed': True, 'failed': False, 'item': {'command': '', 'image': 'geerlingguy/docker-centos7-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:ro']}, 'ansible_loop_var': 'item'})

PLAY RECAP *********************************************************************
localhost                  : ok=5    changed=2    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0

INFO     Running default > prepare
WARNING  Skipping, prepare playbook not configured.

vls@va:~/tmp/ansible-role-apache
> molecule login                                                                                                                                                                       master [5b2e65d]
INFO     Running default > login
[root@instance /]# systemctl 
Failed to get D-Bus connection: No such file or directory

> docker --version                                                                                                                                                                     master [5b2e65d]
Docker version 20.10.6, build 370c28948e

> docker-compose version                                                                                                                                                               master [5b2e65d]
docker-compose version 1.29.2, build unknown
docker-py version: 5.0.0
CPython version: 3.9.5
OpenSSL version: OpenSSL 1.1.1k  25 Mar 2021

> molecule --version                                                                                                                                                                   master [5b2e65d]
molecule 3.3.0 using python 3.9 
    ansible:2.11.0
    delegated:3.3.0 from molecule
    docker:0.2.4 from molecule_docker

Vakhrushev avatar May 11 '21 21:05 Vakhrushev

@Vakhrushev - I just did the exact same thing (molecule create, molecule login, then systemctl) and got:

[root@instance /]# systemctl
  UNIT                           LOAD   ACTIVE     SUB       DESCRIPTION
  dev-vda1.device                loaded activating tentative /dev/vda1
  -.mount                        loaded active     mounted   /
  dev-mqueue.mount               loaded active     mounted   POSIX Message Queue File System
  etc-hostname.mount             loaded active     mounted   /etc/hostname
  etc-hosts.mount                loaded active     mounted   /etc/hosts
  etc-resolv.conf.mount          loaded active     mounted   /etc/resolv.conf

My stats:

$ docker --version  
Docker version 20.10.5, build 55c4c88

$ docker-compose version 
docker-compose version 1.29.0, build 07737305
docker-py version: 5.0.0
CPython version: 3.9.0
OpenSSL version: OpenSSL 1.1.1h  22 Sep 2020

$ molecule --version
molecule 3.2.0 using python 3.9 
    ansible:2.10.8
    delegated:3.2.0 from molecule
    docker:0.2.4 from molecule_docker

I'll update my version of molecule to latest and see if that makes a difference.

Edit: Works the same with Molecule 3.3.0. Are you using the latest HEAD on my apache repository?

geerlingguy avatar May 11 '21 22:05 geerlingguy

I just noticed my docker-centos7-ansible image was 7 months old so I'm updating it now...

Heh... it's still that one:

$ docker images
REPOSITORY                           TAG       IMAGE ID       CREATED        SIZE
geerlingguy/docker-centos7-ansible   latest    a727967c4d1d   7 months ago   573MB

I just realized I haven't updated this repository to build off GitHub Actions yet. I should probably do that. The last time it was built was 7 months ago, back when Travis CI still worked.

geerlingguy avatar May 11 '21 22:05 geerlingguy

@geerlingguy Thank you so much - adding the command fixed it. Now it all works as expected.

ollie1 avatar May 12 '21 09:05 ollie1

My trouble is with systemd 248: https://github.com/systemd/systemd/issues/19245

Downgrading to systemd 247 works for me.

Vakhrushev avatar May 12 '21 15:05 Vakhrushev

@Vakhrushev did you downgrade on the Docker host or in the image?

lostiniceland avatar May 13 '21 14:05 lostiniceland

@lostiniceland I downgraded systemd on the host.

Vakhrushev avatar May 13 '21 16:05 Vakhrushev

Can anyone try pulling the latest version of this image (once it builds and pushes... should happen soon after #14 was merged)?

geerlingguy avatar Jun 25 '21 20:06 geerlingguy

I've pulled the new image and used it for some PostgreSQL testing via Molecule with systemd, and have not noticed a problem. The new Ansible version shows up too.

jhg03a avatar Jun 25 '21 22:06 jhg03a

Just noticed this problem this morning on my Windows machine (Ubuntu 20.04 on WSL2), where I had no issues before.

$ docker --version
Docker version 20.10.6, build 370c289
	
$ docker-compose --version
docker-compose version 1.29.1, build c34c88b2

$ molecule --version
molecule 3.3.4 using python 3.8 
    ansible:2.11.2
    delegated:3.3.4 from molecule
    docker:0.2.4 from molecule_docker

The same configuration / code base works fine on a CentOS 7 VMware host.

$ docker --version
Docker version 20.10.7, build f0df350

$ docker-compose --version
docker-compose version 1.26.2, build eefe0d31

$ molecule --version
molecule 3.3.4 using python 3.6
    ansible:2.11.2
    delegated:3.3.4 from molecule
    docker:0.2.4 from molecule_docker

fleroux514 avatar Jul 07 '21 18:07 fleroux514

I wonder if this is a cgroup v1/v2 issue... CentOS 7 only supports cgroup v1 and consequently you cannot properly use systemd in such containers when your container host is running cgroups v2.

Here's a relevant issue from podman, it might be a similar case with docker: https://github.com/containers/podman/issues/5153

Maybe this is something those with problems can cross-check in their environments.

If that is indeed the issue, then it seems we can't do much about it, except use a different container host that (also) supports cgroups v1.
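
One way to cross-check is to look at the filesystem type mounted at `/sys/fs/cgroup` on the container host: `cgroup2fs` indicates the unified v2 hierarchy, while `tmpfs` indicates v1 (legacy or hybrid). Below is a minimal sketch, assuming GNU `stat`; `cgroup_version_from_fstype` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup
# to a cgroup version label.
#   cgroup2fs -> unified hierarchy (v2)
#   tmpfs     -> legacy or hybrid hierarchy (v1)
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Only inspect the real mount where it exists (it does not on macOS).
if [ -d /sys/fs/cgroup ]; then
    # An empty result (stat failed) falls through to "unknown".
    fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "")
    echo "host cgroup: $(cgroup_version_from_fstype "$fstype")"
fi
```

With Docker Desktop or WSL2 the relevant host is the Linux VM that runs the Docker daemon, so the check would have to run there. On Docker 20.10 and later, `docker info` should also print a `Cgroup Version` line.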

markwort avatar Aug 06 '21 19:08 markwort

I hit this and eventually found https://github.com/docker/for-mac/issues/6073

The issue is cgroups v1 vs. v2. At the time of this comment, there are experimental builds that allow a choice of cgroup version.

joshbenner avatar Dec 16 '21 18:12 joshbenner

Ah... that'd explain it. For some reason I'm still on 4.1.1.

geerlingguy avatar Dec 16 '21 22:12 geerlingguy

Here is a script to use deprecatedCgroupv1 in Docker Desktop for Mac (4.6.0 at time of writing): https://github.com/docker/for-mac/issues/6073#issuecomment-1018793677

This fixes Failed to connect to bus: No such file or directory in Molecule tests on macOS that were working in older versions of Docker Desktop for Mac.

bbaassssiiee avatar Mar 17 '22 13:03 bbaassssiiee

@lostiniceland @geerlingguy would it be okay to change the title to systemd does not work when host has cgroup2?

wookietreiber avatar Jul 19 '23 11:07 wookietreiber

Here is a script to use deprecatedCgroupv1 in Docker Desktop for Mac (4.6.0 at time of writing): docker/for-mac#6073 (comment)

This fixes Failed to connect to bus: No such file or directory in Molecule tests on macOS that were working in older versions of Docker Desktop for Mac.

Change to "deprecatedCgroupv1": true in ~/Library/Group\ Containers/group.com.docker/settings.json
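
To check whether the flag is already set, something like the following could be used. It is a rough sketch that only reads the file: `has_cgroupv1_setting` is a hypothetical helper, the path is the one quoted above, and the `grep` pattern assumes the key appears as plain JSON text. Docker Desktop versions other than the one mentioned may store or name the setting differently.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given settings file contains
# "deprecatedCgroupv1": true (whitespace around the colon is allowed).
has_cgroupv1_setting() {
    grep -Eq '"deprecatedCgroupv1"[[:space:]]*:[[:space:]]*true' "$1"
}

# Path as given in this thread; it only exists on macOS with Docker Desktop.
settings="$HOME/Library/Group Containers/group.com.docker/settings.json"

if [ -f "$settings" ]; then
    if has_cgroupv1_setting "$settings"; then
        echo "deprecatedCgroupv1 is enabled"
    else
        echo "deprecatedCgroupv1 is not enabled"
    fi
fi
```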

bbaassssiiee avatar Aug 04 '23 20:08 bbaassssiiee

Change to "deprecatedCgroupv1": true in ~/Library/Group\ Containers/group.com.docker/settings.json

Is there a way to add this to molecule.yml?

wookietreiber avatar Aug 05 '23 05:08 wookietreiber

Change to "deprecatedCgroupv1": true in ~/Library/Group\ Containers/group.com.docker/settings.json

Is there a way to add this to molecule.yml?

Never tried Podman inside Docker

driver:
  name: podman
platforms:
  - name: podman-in-docker
    # ... other options
    cgroup_manager: cgroupfs
    storage_opt: overlay.mount_program=/usr/bin/fuse-overlayfs
    storage_driver: overlay

bbaassssiiee avatar Aug 05 '23 05:08 bbaassssiiee