Provide a simple way to get an updated guest
I am currently investigating a weird test regression of cockpit-podman packit tests on Fedora 33. I can reproduce this locally with tmt run --all provision --how virtual --image fedora-33 in cockpit-podman.
The root cause is that the test dependency installation grabs a very outdated crun package:
package: make, cockpit-ws, cockpit-podman and 6 more
Execute command 'rpm -q --whatprovides "make" "cockpit-ws" "cockpit-podman" "cockpit-system" "libvirt-python3" "npm" "git" "rsync" "python3" || dnf install -y "make" "cockpit-ws" "cockpit-podman" "cockpit-system" "libvirt-python3" "npm" "git" "rsync" "python3"' on guest '127.0.0.1:10023'.
Run command 'ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -p 10023 -i /var/tmp/tmt/run-001/plans/all/provision/default/id_rsa [email protected] rpm -q --whatprovides "make" "cockpit-ws" "cockpit-podman" "cockpit-system" "libvirt-python3" "npm" "git" "rsync" "python3" || dnf install -y "make" "cockpit-ws" "cockpit-podman" "cockpit-system" "libvirt-python3" "npm" "git" "rsync" "python3"'.
environment: None
[...]
out: Installing:
out: cockpit-podman noarch 29-1.fc33 updates 1.0 M
[...]
out: Installing dependencies:
out: crun x86_64 0.15-5.fc33 fedora 156 k
[...]
Full log: /var/tmp/tmt/run-001/log.txt
I can reproduce this with this minimal recipe:
$ tmt run provision --how virtual --image fedora-33
$ tmt run -l login
# inside the testbed
dnf install podman
[...]
Installing:
podman x86_64 2:3.0.1-1.fc33 updates 12 M
Installing dependencies:
conmon x86_64 2:2.0.26-1.fc33 updates 50 k
container-selinux noarch 2:2.145.0-1.fc33 fedora 38 k
containernetworking-plugins x86_64 0.9.1-4.fc33 updates 9.7 M
containers-common noarch 4:1-9.fc33 updates 58 k
crun x86_64 0.15-5.fc33 fedora 156 k
crun 0.15-5 is 7 months old! Curiously, after that, updating works just fine:
# dnf update crun
Fedora 33 - x86_64 - Updates 82 kB/s | 9.6 kB 00:00
Dependencies resolved.
============================================================================================================================================
Package Architecture Version Repository Size
============================================================================================================================================
Upgrading:
crun x86_64 0.18-5.fc33 updates 166 k
[...]
This smells a bit like a dnf bug, but it does not happen on Fedora 33 VMs in our own CI. Does tmt have some particular magic?
I sent a workaround in https://github.com/cockpit-project/cockpit-podman/pull/700 , but this is rather dangerous -- it probably means that a lot of tests run against outdated packages?
Reproduced outside of tmt as well, e.g. with podman run --rm -ti fedora:33 dnf install podman.
Update:
I am not able to get usable verbose/debug data from dnf :(
However, if I update the container image first (dnf update -y), then dnf install podman installs the newer crun-0.18.
Using dnf install --best podman didn't help on a fresh image - it still installs the old crun.
FTR I'm using registry.fedoraproject.org/fedora 33 9f2a56037643
Update2:
It seems I just need to upgrade libcap and then I get the new crun.
podman run --rm -ti fedora:33 bash -c 'echo n | dnf install podman' -> crun-0.15-5.fc33
podman run --rm -ti fedora:33 bash -c 'dnf update -y libcap; echo n | dnf install podman' -> crun-0.18-5.fc33
Update3: Filed https://bugzilla.redhat.com/show_bug.cgi?id=1946975 for dnf ignoring --best, which might be what we need in dnf install.
Thanks @lukaszachy ! Let's track this in bugzilla on the dnf side then.
@martinpitt @psss @thrix What would you say if tmt ran 'dnf update -y' automatically (with an option to disable that, e.g. for CI)? I'm not sure how likely it is to hit dnf's bug/feature when running from an updated system, so the best place (to keep the SUT as new as possible) would be to update the system after require/recommend are installed.
> update the system after require/recommend are installed
That's also the most expensive option -- the test installs/upgrades packages twice then. If it is enough to update the initial cloud image to work around the bug, that'd be a bit cheaper.
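Updating the initial image up front could also be done outside of tmt, for instance with virt-customize from libguestfs; a rough sketch (the image file name is just a placeholder):

```sh
# Refresh all packages inside the cloud image before tmt ever boots it
virt-customize -a Fedora-Cloud-Base-33.qcow2 --update
```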
As a workaround, yes. However, we should fix this (as much as we can), right? IMHO running tests on an old SUT is wrong, and I'm not sure that we have updated images available.
Correct, I think dnf update -y right after booting the instance makes perfect sense. FTR, Debian's/Ubuntu's autopkgtest infra does that as well.
With the latest discussions on the downstream bug, the dnf update -y approach seems correct indeed.
Unfortunately yes. Do I understand dnf's comments correctly that we need to do it after we install require/recommend, as 'install' can pick whatever version of a dependency to satisfy the transaction?
In that case, dnf picked an old version of crun because an old version of libcap was already installed. So if the testbed gets updated first, this can't happen any more.
Do you understand why it happened? I couldn't find what makes the libcap update so special that after it the new crun is suddenly selected.
@lukaszachy : My understanding was like this (a small sketch below shows how one could verify it):
- old crun 0.15 depends on an older libcap, which is already installed
- current crun 0.18 depends on a newer libcap, which would have to be upgraded first
- dnf prefers the already installed versions over newer ones, even for transitive dependencies (which I find is a bug..)
- thus, instead of installing the current crun 0.18 and updating libcap, it installed the old crun to get along with the already installed old libcap
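One way to sanity-check that theory is to compare what each crun build requires with what the stock image already provides. A rough sketch (the NEVRs are taken from the logs above; the exact requirement strings and versions depend on the current Fedora 33 repositories, so treat the output as illustrative):

```sh
# Compare the image's libcap with the requirements of both crun builds
podman run --rm fedora:33 bash -c '
  rpm -q libcap                                                # libcap shipped in the image
  dnf -q repoquery --requires crun-0.15-5.fc33 | grep -i cap   # old build from the fedora repo
  dnf -q repoquery --requires crun-0.18-5.fc33 | grep -i cap   # current build from updates
'
```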
Having the latest packages installed definitely makes sense. Although when testing using provision --how local one would probably want to skip this by default. Regarding the config I see two options:
- Add an --update/no-update option to the prepare step install plugin
- Support a dedicated prepare section which would be inserted by default
The first one is very easy to implement; the second one would give us some more flexibility, especially regarding interaction with the CI. This is a bit related to the preparation stories in #479 and also to the multiple step config support in #68.
With the latter one, do you mean something like(?):
prepare:
    how: update
The separate config could/would look something like this:
prepare:
  - name: update
    how: install
    update: true
    order: 10
The default low order would mean it would be executed as one of the first prepare plugins.
For the dnf update (or alike) implementation itself, I'd be fine with --skip-broken (IOW, best effort). I'm not sure about --nobest, which completely excludes versions with broken dependencies; that may have some undesired behaviour, like installing/updating to a lower package version which has runtime issues/bugs (therefore it is not used in Fedora by default, nor in Fedora Koji). Sticking with the tested version (everything that is shipped should be tested to work) may be the best option in that case. Note: I personally rarely had to use --nobest, mostly on broken systems with already broken dependencies.
There's also dnf upgrade-minimal; the man page says: "Updates each package to the latest available version that provides a bugfix, enhancement or a fix for a security issue (security)." So this would probably skip the non-security updates. We should probably do this, even if we decide we don't want to upgrade everything every time.
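To make the options being discussed concrete, here is roughly how the different update flavors would be invoked (a sketch of plain dnf calls, not what tmt would necessarily run):

```sh
dnf -y upgrade                      # full update of the guest
dnf -y upgrade --skip-broken        # best effort: skip packages with dependency problems
dnf -y upgrade-minimal              # only up to the first version fixing a bug, enhancement or security issue
dnf -y upgrade-minimal --security   # restrict that further to security advisories
```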
This just became a whole lot more urgent. Tests now fail in the early testbed setup phase while installing the require: packages from the test plan. As this pulls in such an old podman...
containers-common noarch 4:1-10.fc33 updates 58 k
podman x86_64 2:2.1.1-10.fc33 fedora 11 M
... the rpm installation fails:
Error: Transaction test error:
file /usr/share/man/man5/containers-mounts.conf.5.gz conflicts between attempted installs of podman-2:2.1.1-10.fc33.x86_64 and containers-common-4:1-10.fc33.noarch
This can be reproduced easily in a container (not without irony):
podman run -it --rm fedora:33 sh -ec 'sed -i s/nodocs// /etc/dnf/dnf.conf; dnf install -y podman-2:2.1.1-10.fc33'
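For comparison, the update-first approach discussed above can be checked in the same container setup. This is only a sketch; the assumption is that with an updated package set dnf resolves to the current podman from the updates repository, so the file conflict with containers-common does not occur:

```sh
podman run -it --rm fedora:33 sh -ec '
  sed -i s/nodocs// /etc/dnf/dnf.conf   # keep man pages so the conflict could still show up
  dnf update -y
  dnf install -y podman
  rpm -q podman containers-common crun  # report which builds were actually selected
'
```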
I reported this in https://bugzilla.redhat.com/show_bug.cgi?id=1900000
But right now I don't see how to work around this -- the test script (which currently manually updates podman and conmon for us) runs too late in this game.
Never mind, I re-read the whole discussion here, and pointing out prepare: was an excellent hint! I now have a fix which circumvents the dnf issue in a more robust way.
Bah, except that this extra prepare: step breaks on the Testing Farm.
Update: It does not -- that was just coincidence. Testing Farm is generally broken right now.
In general I'd say that we would save ourselves a lot of possible issues (like the super weird one I ran into yesterday) by running tests on an updated system, especially in the CI. So I would suggest this:
- The testing farm always updates the guest before installing the rpms under test
- In tmt we provide a simple way to get an updated guest (optional)
For local experimenting and test debugging I find it ok to run quickly against a slightly older distro image, but if one does not care about the additional time and wants to be sure that the guest is fresh, there should be a comfortable way to get that.
> - The testing farm always updates the guest before installing the rpms under test
@thrix, is this already covered on the Testing Farm side?
> - In tmt we provide a simple way to get an updated guest (optional)
I guess we still want this. The question is whether it should go to the install prepare plugin or into the new almighty install outlined in #2226.
> - The testing farm always updates the guest before installing the rpms under test
>
> @thrix, is this already covered on the Testing Farm side?
We do this now for CentOS Stream and Fedora Rawhide. Anyway, as of today the Fedora images will be updated daily:
https://gitlab.com/testing-farm/infrastructure/-/merge_requests/848
So I do believe in Testing Farm we should be fine.
Triage meeting summary: Yes, we want to have a standardized way to get an updated system, as this is a common use case. Sounds like prepare --how feature --update-system enabled. A possible workaround is to use a custom Ansible playbook to update the system. Linux System Roles could also make sense, but would limit the usage to rpm-like distros.
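For illustration, the interface sketched in that summary might look something like this on the command line (the prepare options shown are hypothetical and only mirror the wording above; they do not exist in tmt yet):

```sh
# Hypothetical: enable a full guest update as part of the prepare step
tmt run --all provision --how virtual --image fedora-33 \
        prepare --how feature --update-system enabled
```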