On dnf5 systems (e.g. Fedora 41) python3-rpm required + error message not descriptive

Open felixhowe opened this issue 10 months ago • 11 comments

If one installs linux-system-roles and then attempts to use linux-system-roles.network in a playbook, the play fails with the following error:

TASK [linux-system-roles.network : Check which packages are installed] *********
fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to
the fact that 'no_log: true' was specified for this result", "changed": false}

This bug was introduced in Fedora 41 and is probably related to the change to the dnf5 package manager in 41; linux-system-roles.network worked as expected immediately after install on Fedora 40.

To reproduce on a Fedora 41 machine, see attached minimal example playbook:

lsr.network_needs_python3-rpm.yml.txt

Specifically, the "no_log" value causing the "censored" error message is set in the following file:

/usr/share/ansible/roles/linux-system-roles.network/tasks/set_facts.yml

Commenting out "no_log: true" on the last line of that file (under "Check which packages are installed") changes the error message to a more descriptive one:

TASK [linux-system-roles.network : Check which packages are installed] *********
[WARNING]: Found "rpm" but Failed to import the required Python library (rpm) on
tdc-testvm's Python /usr/bin/python3. Please read the module documentation and
install it in the appropriate location. If the required library is installed,
but Ansible is using the wrong Python interpreter, please consult the
documentation on ansible_python_interpreter
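
For anyone debugging the same censored failure, the edit described above can be tried on a copy of the tasks file rather than in place. This is only a minimal sketch: the function name `uncensor_tasks` is made up for illustration, and it assumes `no_log: true` sits on a line of its own, as it does in the file named above.

```shell
#!/bin/sh
# Hypothetical helper (not part of linux-system-roles): print a tasks file
# with every standalone "no_log: true" line commented out, keeping indentation.
uncensor_tasks() {
    sed -E 's/^([[:space:]]*)no_log:[[:space:]]*true[[:space:]]*$/\1# no_log: true/' "$1"
}

# Example (writes to stdout and leaves the installed role untouched):
#   uncensor_tasks /usr/share/ansible/roles/linux-system-roles.network/tasks/set_facts.yml
```

Redirecting the output to a scratch copy keeps the packaged role file unmodified, so a later package update does not silently restore or conflict with the change.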

Installing python3-rpm manually resolves it:

dnf install python3-rpm

The playbook then executes normally and all tasks succeed.
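
The underlying condition is simply whether the target's Python can import the `rpm` module. Here is a minimal sketch for checking that by hand before running a playbook. The function name `python_can_import` is hypothetical, and the interpreter defaults to the `/usr/bin/python3` named in the warning above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given Python interpreter can import a module.
# Usage: python_can_import MODULE [INTERPRETER]
python_can_import() {
    "${2:-/usr/bin/python3}" -c "import $1" >/dev/null 2>&1
}

# Example, run as root on the managed node:
#   python_can_import rpm || dnf -y install python3-rpm
```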

Note that I'm filing this here rather than with Red Hat / Fedora because any remote target host also needs that package, so it should probably be auto-installed by linux-system-roles.network when required.

felixhowe avatar Feb 27 '25 22:02 felixhowe

This looks like https://github.com/ansible/ansible/issues/84206

I'm talking with the Ansible team to see if adding auto_install_module_deps is appropriate, or if there is some other workaround. Note that this change will affect all code that uses package_facts, not just the network system role, so I'm trying to find a way to address this issue appropriately given the scope of the problem.

richm avatar Feb 27 '25 23:02 richm

@felixhowe hm - on the other hand - fatal: [localhost]: - this is probably https://access.redhat.com/solutions/6726561 - in which case you'll need to use one of the workarounds:

Choose one of the options below to workaround the issue:

    Create an inventory file that lists localhost with the ansible_connection=local option.
        For example, an inventory file with:
            localhost  ansible_connection=local
        Run ansible-playbook and specify that this inventory file should be used:
            ansible-playbook -i inventory <playbook>

    Create an inventory file that lists localhost.
        Note that this will result in ansible-playbook connecting to the localhost over SSH with SSH key authentication, which must have previously been configured.
        For example, an inventory file with:
            localhost
        Run ansible-playbook and specify that this inventory file should be used:
            ansible-playbook -i inventory <playbook>

    Use implicit localhost, with the ansible_python_interpreter variable set to use platform-python
        For example:
            ansible-playbook  <playbook>  -e 'ansible_python_interpreter=/usr/libexec/platform-python'

richm avatar Feb 27 '25 23:02 richm

This looks like https://github.com/ansible/ansible/issues/84206

I think the linked issue is already fixed in the latest Fedora - I was not able to reproduce that error; the following worked:

(new, clean Fedora 41 minimal install)

dnf install ansible
cat > test.yml <<'EOF'
- name: see whether python3-libdnf5 issue still present
  hosts: localhost
  tasks:

  - name: update all packages
    ansible.builtin.dnf:
      name: '*'
      state: latest
EOF
ansible-playbook test.yml

and indeed, after running the above, the python3-libdnf5 package was present:

# dnf list --installed | grep python3-libdnf5
python3-libdnf5.x86_64                5.2.10.0-2.fc41             updates

on the other hand - fatal: [localhost]: - this is probably https://access.redhat.com/solutions/6726561 -

I don't think so - I tested the example playbook both on localhost and a remote one and got exactly the same behaviour on both, i.e. on the local host, installing python3-rpm also immediately fixes the issue.

I haven't tried this on RHEL yet though, only Fedora. Will try that this evening.

felixhowe avatar Feb 28 '25 01:02 felixhowe

ok - thanks - we didn't see this in our testing because the images provided by our testing framework (https://packit.dev/docs/configuration/upstream/tests) have python3-rpm pre-installed - will need to figure out the best way to fix, and then how to create a regression test for this

richm avatar Feb 28 '25 15:02 richm

@felixhowe on a minimal system - do you have any of these installed? python3-dnf python3-tracer python3-libdnf5 ?

The reason I'm asking - I'm writing a test to do dnf -y remove python3-rpm before the network role - this is what happens:

Package                            Arch   Version         Repository                            Size
Removing:
 python3-rpm                       x86_64 4.20.0-1.fc41   281c6d6fc60143e1aaaaa9f7f140cd6d 175.3 KiB
Removing dependent packages:
 python3-dnf                       noarch 4.22.0-2.fc41   b941836037824982ac2fb4a9202c2f17   2.6 MiB
 python3-dnf-plugin-tracer         noarch 4.1.2-3.fc41    281c6d6fc60143e1aaaaa9f7f140cd6d   8.8 KiB
Removing unused dependencies:
 dnf-data                          noarch 4.22.0-2.fc41   b941836037824982ac2fb4a9202c2f17  38.6 KiB
 hiredis                           x86_64 1.2.0-3.fc41    281c6d6fc60143e1aaaaa9f7f140cd6d 110.1 KiB
 ima-evm-utils-libs                x86_64 1.6.2-2.fc41    281c6d6fc60143e1aaaaa9f7f140cd6d  60.8 KiB
 libcomps                          x86_64 0.1.21-4.fc41   b941836037824982ac2fb4a9202c2f17 206.2 KiB
 libdnf                            x86_64 0.73.4-2.fc41   b941836037824982ac2fb4a9202c2f17   2.1 MiB
 libfsverity                       x86_64 1.6-1.fc41      281c6d6fc60143e1aaaaa9f7f140cd6d  32.6 KiB
 python3-dbus                      x86_64 1.3.2-8.fc41    281c6d6fc60143e1aaaaa9f7f140cd6d 520.7 KiB
 python3-distro                    noarch 1.9.0-5.fc41    281c6d6fc60143e1aaaaa9f7f140cd6d 198.7 KiB
 python3-dnf-plugins-extras-common noarch 4.1.2-3.fc41    281c6d6fc60143e1aaaaa9f7f140cd6d  96.8 KiB
 python3-hawkey                    x86_64 0.73.4-2.fc41   b941836037824982ac2fb4a9202c2f17 297.4 KiB
 python3-libcomps                  x86_64 0.1.21-4.fc41   b941836037824982ac2fb4a9202c2f17 140.8 KiB
 python3-libdnf                    x86_64 0.73.4-2.fc41   b941836037824982ac2fb4a9202c2f17   3.8 MiB
 python3-libdnf5                   x86_64 5.2.10.0-2.fc41 b941836037824982ac2fb4a9202c2f17   8.1 MiB
 python3-psutil                    x86_64 5.9.8-4.fc41    281c6d6fc60143e1aaaaa9f7f140cd6d   1.4 MiB
 python3-six                       noarch 1.16.0-23.fc41  281c6d6fc60143e1aaaaa9f7f140cd6d 118.3 KiB
 python3-tracer                    noarch 1.2-1.fc41      b941836037824982ac2fb4a9202c2f17 406.7 KiB
 python3-unbound                   x86_64 1.22.0-14.fc41  b941836037824982ac2fb4a9202c2f17 522.9 KiB
 rpm-plugin-systemd-inhibit        x86_64 4.20.0-1.fc41   281c6d6fc60143e1aaaaa9f7f140cd6d  16.3 KiB
 rpm-sign-libs                     x86_64 4.20.0-1.fc41   281c6d6fc60143e1aaaaa9f7f140cd6d  39.4 KiB
 tracer-common                     noarch 1.2-1.fc41      b941836037824982ac2fb4a9202c2f17  33.5 KiB
 unbound-anchor                    x86_64 1.22.0-14.fc41  b941836037824982ac2fb4a9202c2f17  57.5 KiB
 unbound-libs                      x86_64 1.22.0-14.fc41  b941836037824982ac2fb4a9202c2f17   1.4 MiB

Transaction Summary:
 Removing:          25 packages

That is an awful lot of dependencies (including nested/indirect) on python3-rpm . . . so I'm wondering if the process of creating a minimal system removes all of these? Or somehow removes python3-rpm and some others, and somehow leaves some of these?

richm avatar Feb 28 '25 19:02 richm

dnf5 in Fedora is currently a bit... overly aggressive (I would argue to the point of being broken sometimes) with auto-removals. I often have to do --no-autoremove just to avoid having it remove half the system :)

There are no removals involved in the minimal install, however; see bottom for full explanation.

That said, yes, almost all of those packages are indeed absent from the minimal system's default state. Here's the full list of what's present/absent:

dnf-data                           absent
hiredis                            absent
ima-evm-utils-libs                 absent
libcomps                           absent
libdnf                             absent
libfsverity                        absent
python3-dbus                       present
python3-distro                     absent
python3-dnf                        absent
python3-dnf-plugin-tracer          absent
python3-dnf-plugins-extras-common  absent
python3-hawkey                     absent
python3-libcomps                   absent
python3-libdnf                     absent
python3-libdnf5                    absent
python3-psutil                     absent
python3-six                        absent
python3-tracer                     absent
python3-unbound                    absent
rpm-plugin-systemd-inhibit         absent
rpm-sign-libs                      absent
tracer-common                      absent
unbound-anchor                     absent
unbound-libs                       absent

(How I got this: copied the package name lines from your "would remove" output, put them in checkfor.txt on the minimal system, did s/ .*$// to it, then dnf install $(cat checkfor.txt), copied the output from that & compared, then marked matches as "absent" and things not listed as "present".)
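
A non-installing alternative to the procedure above is to query the RPM database directly. This sketch swaps in `rpm -q` in place of the `dnf install` comparison described in the comment, so it is not the commenter's exact method. The function names and the `would_remove.txt` file name are hypothetical, and the architecture filter assumes the table layout shown in the removal output.

```shell
#!/bin/sh
# Hypothetical helpers, not the commenter's exact method.

# Read dnf transaction output on stdin and print only package names,
# i.e. the first column of rows whose second column is an architecture.
extract_names() {
    awk '$2 ~ /^(x86_64|aarch64|noarch|i686)$/ { print $1 }'
}

# Read package names on stdin and report each as present or absent.
# "rpm -q" exits non-zero when a package is not installed.
report_presence() {
    while read -r pkg; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            printf '%-35s present\n' "$pkg"
        else
            printf '%-35s absent\n' "$pkg"
        fi
    done
}

# Example: extract_names < would_remove.txt | sort | report_presence
```

Because nothing is installed, the minimal system stays in its original state and the check can be repeated after each change.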

Interestingly, I deliberately left python3-rpm out of checkfor.txt and one of the other things pulled it in as a dependency. Specifically, it looks like dnf install python3-dnf will (now?) result in python3-rpm also getting installed. (I'm not sure if that's helpful or not - only if by coincidence installing python3-dnf happens to fix some other issue and makes this one simpler, I guess.)

Lots more detail about what I mean by "minimal system":

The minimal system template I use is the result of using the current Fedora 41 netinstall ISO with the attached anaconda config (ks.cfg.gz - note that I've redacted some things, and gzipped it because apparently .cfg is somehow a "dangerous file type" in the Microsoft world), so there are no package removals involved in creating the template. Specifically, the creation process goes like this:

  1. Create a VM with two DVD drives
  2. Set the first one to the Fedora 41 Server ISO (Fedora-Server-netinst-x86_64-41-1.4.iso last time I regenerated it)
  3. Set the second one to an ISO containing only the ks.cfg file in its top level directory
  4. On boot, hit e for edit and add the option "inst.ks=cdrom" to the kernel options line
  5. Wait for the non-interactive install to complete, then shut down the VM and its disk becomes the new template.

(There is also a PXE version of the process that's fully automated, but the above version is easier to describe and they produce the same result.)

In theory, you would get exactly the same package selections if you:

  1. Create a blank VM with one DVD drive
  2. Attach the Fedora 41 ISO
  3. In the "Software Selection" step, choose only "Fedora Custom Operating System"

When an image is deployed from the template, it then gets configured by Ansible (either remotely, or by installing ansible on the new machine itself and pulling the relevant playbooks for use in localhost mode, depending on where it's going).

In case it's useful, I've attached two outputs of dnf list --installed - one from the template's initial state (which lags behind what you'd get with a brand new install, because it only gets regenerated every month or so) and one after a dnf upgrade.

installed_initial.txt installed_after_dnf_upgrade.txt

felixhowe avatar Mar 01 '25 21:03 felixhowe

@felixhowe This is excellent - thanks! I would say though that this is an Ansible problem, not a system roles problem, since this issue affects every Ansible user wanting to use package_facts or package on a minimal dnf5 system. Ansible should add python3-rpm and the other packages to their documentation of the minimal required software on a managed node. There has to be some minimum list of packages required to be installed (and configured) on managed nodes in order for Ansible to operate. For example, sshd must be installed and configured to allow the Ansible user to use ssh public key auth - sudo must be installed and configured to allow become access - etc. If python3-rpm is not listed as such in the Ansible documentation for managed node provisioning, that seems like an Ansible issue. That being said - we can have the system roles install the missing packages, but it doesn't seem like a function of system roles to install missing Ansible dependencies, unless the issue only affects system roles.

richm avatar Mar 03 '25 17:03 richm

@richm , I've had some time to try this out on a few more systems/distros, and I now mostly agree (though maybe conditionally - see very bottom :) ) -

I would say though that this is an Ansible problem, not a system roles problem, since this issue affects every Ansible user wanting to use package_facts or package

I didn't realise until exploring this further that I/we were just "getting lucky" on older Red Hat & Fedora systems, in that package_facts is not expected to automatically install its own dependencies on targeted systems (and the documentation even implies that it won't, but could be more complete about this). It just happened that on older dnf/yum, python3-rpm was considered a dependency of dnf itself, so it was always there - even with my minimal template, on the Fedora 40 version it's there "out of the box" and difficult/impossible to remove:

# dnf remove python3-rpm
Error:
 Problem: The operation would result in removing the following protected packages: dnf

If python3-rpm is not listed as such in the Ansible documentation for managed node provisioning, that seems like an Ansible issue.

Agreed - I think this "bug" becomes a documentation issue; this page:

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/package_facts_module.html

currently has the following under the comments for the manager parameter:

Choices:

"apk": Alpine Linux package manager

"apt": For DEB based distros, python-apt package must be installed on targeted hosts

"auto" (default): Depending on strategy, will match the first or all package managers provided, in order

"dnf": Alias to rpm

(emphasis on apt line mine)

Ansible upstream needs to decide whether they want to include a similar note for python3-rpm for all the dnf/rpm-based distros, or whether they want to bug the dnf maintainers about having python3-rpm be considered a hard dependency of dnf again :)

However, regarding:

we can have the system roles install the missing packages, but it doesn't seem like a function of system roles to install missing Ansible dependencies, unless the issue only affects system roles.

one of the factors that led me to report this here first was that system roles does have precedent for installing things as needed; I took my cue from how it behaves when things like linux-system-roles.selinux are used for the first time on a new host - namely, there are tasks called "Install SELinux python3 tools" and "Install SELinux tool semanage" in that role.

So if upstream does decide to change the documentation to say "python3-rpm package must be installed on targeted hosts", would linux-system-roles respond to that by adding an "Install python3-rpm" task ahead of anything that uses package_facts? Or add python3-rpm as a dependency of its package? Or add a documentation note to the user, that the user would have to find the first time they use a module that needs it? (I haven't run into any other cases yet where linux-system-roles just errors out in a way that needs to be investigated.)
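
Until that is decided, the first of those options can be approximated from the playbook side. Following the `cat > test.yml` style used earlier in the thread, this sketch writes a playbook that installs python3-rpm in `pre_tasks` before applying the role. The function name, the file name, and `hosts: all` are assumptions for illustration, and it presumes every targeted host is rpm-based. It is not how linux-system-roles itself handles the dependency.

```shell
#!/bin/sh
# Hypothetical helper: write a playbook that installs python3-rpm before the role runs.
write_workaround_playbook() {
    cat > "$1" <<'EOF'
- name: Apply the network role on hosts that may lack python3-rpm
  hosts: all
  become: true
  pre_tasks:
    - name: Ensure the rpm Python bindings are present
      ansible.builtin.package:
        name: python3-rpm
        state: present
  roles:
    - linux-system-roles.network
EOF
}

# Example:
#   write_workaround_playbook workaround.yml
#   ansible-playbook -i inventory workaround.yml
```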

I don't know what the "right" answer is; this has just got me thinking about whether there's a consistent set of conventions around "just quietly do what's needed to accomplish the task" versus "modify the target systems as little as possible" - and where such a philosophy question should best be defined (I don't know that, either, and have several competing opinions just in my own head) :)

felixhowe avatar Mar 04 '25 21:03 felixhowe

For completeness, a couple more/updated minimal test cases:

test-package_facts.yml.txt

lsr.auto_installs_semanage.yml.txt

felixhowe avatar Mar 04 '25 21:03 felixhowe

@richm , I've had some time to try this out on a few more systems/distros, and I now mostly agree (though maybe conditionally - see very bottom :) ) -

I would say though that this is an Ansible problem, not a system roles problem, since this issue affects every Ansible user wanting to use package_facts or package

I didn't realise until exploring this further that I/we were just "getting lucky" on older Red Hat & Fedora systems, in that package_facts is not expected to automatically install its own dependencies on targeted systems (and the documentation even implies that it won't, but could be more complete about this). It just happened that on older dnf/yum, python3-rpm was considered a dependency of dnf itself, so it was always there - even with my minimal template, on the Fedora 40 version it's there "out of the box" and difficult/impossible to remove:

# dnf remove python3-rpm
Error:
 Problem: The operation would result in removing the following protected packages: dnf

If python3-rpm is not listed as such in the Ansible documentation for managed node provisioning, that seems like an Ansible issue.

Agreed - I think this "bug" becomes a documentation issue; this page:

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/package_facts_module.html

currently has the following under the comments for the manager parameter:

Choices:

"apk": Alpine Linux package manager

"apt": For DEB based distros, python-apt package must be installed on targeted hosts

"auto" (default): Depending on strategy, will match the first or all package managers provided, in order

"dnf": Alias to rpm

(emphasis on apt line mine)

Ansible upstream needs to decide whether they want to include a similar note for python3-rpm for all the dnf/rpm-based distros, or whether they want to bug the dnf maintainers about having python3-rpm be considered a hard dependency of dnf again :)

Yes. I think they should change the docs to say that python3-rpm is a hard requirement for using package_facts on dnf5 systems.

However, regarding:

we can have the system roles install the missing packages, but it doesn't seem like a function of system roles to install missing Ansible dependencies, unless the issue only affects system roles.

one of the factors that led me to report this here first was that system roles does have precedent for installing things as needed; I took my cue from how it behaves when things like linux-system-roles.selinux are used for the first time on a new host - namely, there are tasks called "Install SELinux python3 tools" and "Install SELinux tool semanage" in that role.

That's different because we aren't trying to work around a missing dependency in Ansible itself - those tools are only needed for specific use cases of the selinux system role. Ansible itself does not need the python selinux libraries - they refactored (quite some time ago) the core code in the file and related modules to use the C libraries (which are virtually always present even on minimal systems) via the python-to-C bindings to manage SELinux policy on specified files/directories.

So if upstream does decide to change the documentation to say "python3-rpm package must be installed on targeted hosts", would linux-system-roles respond to that by adding an "Install python3-rpm" task ahead of anything that uses package_facts? Or add python3-rpm as a dependency of its package? Or add a documentation note to the user, that the user would have to find the first time they use a module that needs it? (I haven't run into any other cases yet where linux-system-roles just errors out in a way that needs to be investigated.)

I suppose you could argue that, since system roles expose a role rather than a module as their public API, it is the role's responsibility to ensure that all of its dependencies are present. The user of the network role should not have to know that the role uses the package_facts module on a dnf5 system, and that python3-rpm must therefore be present on the managed nodes. And if we have to add this to the README in a way that makes it unambiguously clear, it is just a small step to adding it to the role code.

It just sticks in my craw that all role developers, not just linux-system-roles, will have to figure out how to make this change to any role that uses package_facts and wants to manage dnf5 systems - that this is an additional burden placed on Ansible code authors because of a shortcoming in Ansible.

In the case of system roles, for the sake of consistency and future-proofing even for the roles that do not currently use package_facts, we will probably have to make this change to all of the system roles.

OTOH, I don't know of a better place to do this. Requiring all playbook authors/users to do this is a considerably larger burden than changing a couple of hundred roles.

IMO the best place to do this would be at image provisioning time - just as you know you need to have sshd and a handful of other dependencies installed on any image you want to manage with Ansible, this would be one more package. But

  • the person doing the provisioning might not be the person doing the managing
  • you might have to use "stock" images and have no way to provision, and just rely on the fact that sshd and cloud-init are virtually ubiquitous
  • you still need some logic somewhere to determine if the image is for a dnf5-using system, and to install python3-rpm if so - and roles are a great way to encapsulate platform logic such as that
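
As a rough illustration of that last point, the platform check itself can be quite small. This sketch assumes that on dnf5-based releases such as Fedora 41, `/usr/bin/dnf` resolves to a binary named `dnf5`. That assumption and the function name `is_dnf5` are illustrative and not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the dnf command resolves to a dnf5 binary.
# Usage: is_dnf5 [PATH_TO_DNF]   (defaults to /usr/bin/dnf)
is_dnf5() {
    target=$(readlink -f "${1:-/usr/bin/dnf}" 2>/dev/null) || return 1
    [ "$(basename "$target")" = "dnf5" ]
}

# Example provisioning step, run as root:
#   if is_dnf5; then dnf -y install python3-rpm; fi
```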

I don't know what the "right" answer is; this has just got me thinking about whether there's a consistent set of conventions around "just quietly do what's needed to accomplish the task" versus "modify the target systems as little as possible" - and where such a philosophy question should best be defined (I don't know that, either, and have several competing opinions just in my own head) :)

I don't either - I presented a few options above, but I'm sure there are more. We're going to have to have a discussion among our team and other teams that develop Ansible roles and figure this out.

richm avatar Mar 04 '25 21:03 richm

It just sticks in my craw that all role developers, not just linux-system-roles, will have to figure out how to make this change to any role that uses package_facts and wants to manage dnf5 systems - that this is an additional burden placed on Ansible code authors because of a shortcoming in Ansible.

Yes; I pretty much feel the same way about this. I suspect what will happen is this:

Red Hat will eventually move their main Enterprise distribution to dnf5. At that time, they'll run into this themselves en masse, and python3-rpm will become a dependency of dnf, and the problem will go away.

...but that will take a while, and in the meantime, all of us will have to put in "temporary" (in the "IT temporary" meaning of the word :P) workarounds, like:

In the case of system roles, for the sake of consistency and future-proofing even for the roles that do not currently use package_facts, we will probably have to make this change to all of the system roles.

And I'm likely to need to do that with a lot of our internal roles as well, peppering them with TODOs for "remove this when the situation changes". And many of those workarounds will be forgotten about by the time the situation has changed, and probably won't be cleaned up until some distant future additional change breaks something unrelated but proximate in the code ;)

Thanks for all your attention to this - honestly, I'd probably have settled for a removal of all the occurrences of no_log: true and being told to figure it out myself :P

felixhowe avatar Mar 05 '25 01:03 felixhowe