Remediating CIS via Ansible on RHEL-10 leads to broken D-BUS

Open comps opened this issue 1 year ago • 1 comments

Description of problem:

Unfortunately, I don't have a solution, so the following is just a series of notes from my incomplete investigation.

Remediating a pre-built (by the CaC/content build system) CIS playbook (any cis* profile) on RHEL-10 leads to Ansible failing to configure firewalld:

...
TASK [Configure Firewalld to Restrict Loopback Traffic - Ensure firewalld Package is Installed] ***
ok: [localhost] => (item=firewalld) => {"ansible_loop_var": "item", "changed": false, "item": "firewalld", "msg": "Nothing to do", "rc": 0, "results": []}

TASK [Configure Firewalld to Restrict Loopback Traffic - Collect Facts About System Services] ***
ok: [localhost] => (Redacted by Contest)

TASK [Configure Firewalld to Restrict Loopback Traffic - Ensure firewalld trusted Zone Restricts IPv4 Loopback Traffic] ***
skipping: [localhost] => {"changed": false, "false_condition": "ansible_facts.services['firewalld.service'].state == 'running'", "skip_reason": "Conditional result was False"}

TASK [Configure Firewalld to Restrict Loopback Traffic - Ensure firewalld trusted Zone Restricts IPv6 Loopback Traffic] ***
skipping: [localhost] => {"changed": false, "false_condition": "ansible_facts.services['firewalld.service'].state == 'running'", "skip_reason": "Conditional result was False"}

TASK [Configure Firewalld to Restrict Loopback Traffic - Ensure firewalld Changes are Applied] ***
skipping: [localhost] => {"changed": false, "false_condition": "ansible_facts.services['firewalld.service'].state == 'running'", "skip_reason": "Conditional result was False"}

TASK [Configure Firewalld to Restrict Loopback Traffic - Informative Message Based on Service State] ***
fatal: [localhost]: FAILED! => {
2024-07-19 13:55:31 test.py:30: lib.waive.collect_waivers:141: using /var/tmp/runcontest-results/task1/plans/default/discover/default-0/tests/conf/waivers for waiving
2024-07-19 13:55:31 test.py:30: lib.results.report_plain:182: ERROR playbook: Configure Firewalld to Restrict Loopback Traffic - Informative Message Based on Service State ({)
    "assertion": "ansible_facts.services['firewalld.service'].state == 'running'",
    "changed": false,
    "evaluated_to": false,
    "msg": [
        "firewalld service is not active. Remediation aborted!",
        "This remediation could not be applied because it depends on firewalld service running.",
        "The service is not started by this remediation in order to prevent connection issues."
    ]

This is possibly because firewalld really is not running: the remediation was run on a Beaker (internal) system where firewalld is disabled by default. However, earlier playbook tasks should have enabled it (as they do on RHEL-8/9).

I tried unselecting

  • firewalld_loopback_traffic_restricted
  • firewalld_loopback_traffic_trusted

to progress further, which stopped on

TASK [NetworkManager Deactivate Wireless Network Interfaces] *******************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["nmcli", "radio", "wifi", "off"], "delta": "0:00:00.006275", "end": "2024-07-18 20:33:46.473674", "msg": "non-zero return code", "rc": 8, "start": "2024-07-18 20:33:46.467399", "stderr": "Error: NetworkManager is not running.", "stderr_lines": ["Error: NetworkManager is not running."], "stdout": "", "stdout_lines": []}

That gave me a clue. Investigating further on the OS revealed that networking is indeed down after a remediation + reboot: NM fails to start because of an unfulfilled systemd dependency, dbus.service. It turns out dbus-broker.service failed to start with

dbus-broker-launch[625]: ERROR launcher_run_child @ ../src/launch/launcher.c +326: Permission denied
dbus-broker-launch[624]: ERROR service_add @ ../src/launch/service.c +1011: Transport endpoint is not connected
dbus-broker-launch[624]:       launcher_add_services @ ../src/launch/launcher.c +805
dbus-broker-launch[624]:       launcher_run @ ../src/launch/launcher.c +1416
dbus-broker-launch[624]:       run @ ../src/launch/main.c +152
dbus-broker-launch[624]:       main @ ../src/launch/main.c +178
dbus-broker-launch[624]: Exiting due to fatal error: -107

I asked the systemd people, but have had no reply so far, so I started digging into the source code. I downloaded and rpmbuild-patched the same version of dbus-broker, and discovered that src/launch/launcher.c line 326 is the error_origin() in

        r = sd_id128_get_machine(&machine_id);
        if (r < 0) {
                r = error_origin(r);
                goto exit;
        }

which means that sd_id128_get_machine() failed with EACCES (Permission denied). I then looked into the systemd source itself to see what the function does, and it basically just reads /etc/machine-id:

_public_ int sd_id128_get_machine(sd_id128_t *ret) {
        static thread_local sd_id128_t saved_machine_id = {};
        int r;

        if (sd_id128_is_null(saved_machine_id)) {
                r = id128_read("/etc/machine-id", ID128_FORMAT_PLAIN | ID128_REFUSE_NULL, &saved_machine_id);
                if (r < 0)
                        return r;
        }

        if (ret)
                *ret = saved_machine_id;
        return 0;
}

But that makes no sense - in essence, all it does is read a world-readable file:

$ ls -l /etc/machine-id 
-r--r--r--. 1 root root 33 Jul 18 19:21 /etc/machine-id
$ cat /etc/machine-id 
ccbf9a653ac84fe6bc0d6b40a0a49167
$ ausearch -m avc | grep machine-id
$ 

Disabling SELinux didn't fix it either. /etc has the usual 0755 mode and there are no file ACLs, so there should be no issues accessing that file.

I also looked into src/launch/service.c line 1011 (given that it was mentioned) and there isn't anything conclusive either:

        r = sd_bus_call_method(launcher->bus_controller,
                               NULL,
                               "/org/bus1/DBus/Broker",
                               "org.bus1.DBus.Broker",
                               "AddName",
                               NULL,
                               NULL,
                               "osu",
                               object_path,
                               service->name,
                               service->data->uid);
        if (r < 0)
                return error_origin(r);

Googling around, it seems that Transport endpoint is not connected (ENOTCONN, the -107 in the log above) is a frequent return value from sd_bus_call_method() when something underneath it goes wrong.
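For anyone decoding these logs, here is a minimal Python sketch that maps the numbers and messages to errno names. The helper name describe_errno is made up for illustration. Note that the "Permission denied" text corresponds to EACCES, while EPERM reads "Operation not permitted"; the numeric values are platform-specific, and 107 is ENOTCONN on most Linux architectures.

```python
import errno
import os


def describe_errno(code):
    """Map a (possibly negated) errno value to its symbolic name and message.

    describe_errno is a hypothetical helper, not part of dbus-broker or systemd.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # -107 is the value dbus-broker-launch printed when exiting.
    print(describe_errno(-107))
    # "Permission denied" in the launcher_run_child line is EACCES, not EPERM.
    print(describe_errno(errno.EACCES))
    print(describe_errno(errno.EPERM))
```

Run on the affected host, this should show which symbolic error each log line refers to, which narrows the search to file-access problems rather than privileged-operation problems.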

At that point, I gave up.

Independently, I tried remediating the same playbook after setenforce 0, and it at least went through all rules (didn't stop on firewalld or NM), but the networking was dead anyway, so it did not fix the issue.

SCAP Security Guide Version:

703fb11c94d60366da108e02f8bf21b7fae87a81

Operating System Version:

RHEL-10

comps avatar Jul 19 '24 18:07 comps

For the record - I did review the rules remediated by Ansible up to the point of it stopping for firewalld, but there was nothing obviously responsible - just some account password policy setting, /etc/login.defs, etc., no sysctls or anything.

comps avatar Jul 19 '24 19:07 comps

I just ran CIS Level 1 playbook on RHEL 10 and this doesn't seem to be an issue anymore.

Mab879 avatar Jan 23 '25 19:01 Mab879

I tried to reproduce it on master 2edb02336a89cbf2b339fa58230ca7cf72148e03 and I failed to do so. I used CIS level 2 playbook.

vojtapolasek avatar Jan 24 '25 07:01 vojtapolasek

So this issue ranks relatively high in Google search results for "launcher_run_child" "permission denied".

In my case the problem was that / had somehow become mode 0700, denying non-root users the ability to read or traverse it at all. Reading /etc/machine-id of course starts with traversing /. It just so happens that dbus-broker is one of the first things to drop some capabilities or run as non-root (in this case I think it's dropping CAP_DAC_OVERRIDE), so it fails to get past / and reports this error.
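A quick way to check for this kind of problem is to look at every directory on the way to the file, not just the file and its immediate parent. Below is a rough Python sketch of that check; the function name untraversable_ancestors is made up for illustration, and it only looks at the "other" execute bit, so it ignores ACLs, owner/group matches and SELinux.

```python
import os
import stat
from pathlib import Path


def untraversable_ancestors(path):
    """Return (directory, mode) pairs for ancestors of `path` that lack the
    "other" search (execute) bit.

    Hypothetical helper: it only inspects classic mode bits, so a process
    matching the owner or group, or one holding CAP_DAC_OVERRIDE, may still
    get through directories reported here.
    """
    problems = []
    target = Path(os.path.abspath(path))
    # Path.parents is ordered nearest-first; reverse it so "/" is checked first.
    for directory in reversed(target.parents):
        try:
            mode = stat.S_IMODE(os.stat(directory).st_mode)
        except PermissionError:
            # An ancestor already reported blocks us from looking any deeper.
            break
        if not mode & stat.S_IXOTH:
            problems.append((str(directory), oct(mode)))
    return problems


if __name__ == "__main__":
    for directory, mode in untraversable_ancestors("/etc/machine-id"):
        print(f"{directory} has mode {mode}: not traversable by 'other'")
```

On a system in the state described above, this would be expected to report / itself; the namei -l /etc/machine-id command from util-linux gives a similar per-component view.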

cgwalters avatar Mar 04 '25 19:03 cgwalters