Provide a simple way to get an updated guest
I am currently investigating a weird test regression of cockpit-podman packit tests on Fedora 33. I can reproduce this locally with tmt run --all provision --how virtual --image fedora-33 in cockpit-podman.
The root cause is that the test dependency installation grabs a very outdated crun package:
package: make, cockpit-ws, cockpit-podman and 6 more
Execute command 'rpm -q --whatprovides "make" "cockpit-ws" "cockpit-podman" "cockpit-system" "libvirt-python3" "npm" "git" "rsync" "python3" || dnf install -y "make" "cockpit-ws" "cockpit-podman" "cockpit-system" "libvirt-python3" "npm" "git" "rsync" "python3"' on guest '127.0.0.1:10023'.
Run command 'ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -p 10023 -i /var/tmp/tmt/run-001/plans/all/provision/default/id_rsa [email protected] rpm -q --whatprovides "make" "cockpit-ws" "cockpit-podman" "cockpit-system" "libvirt-python3" "npm" "git" "rsync" "python3" || dnf install -y "make" "cockpit-ws" "cockpit-podman" "cockpit-system" "libvirt-python3" "npm" "git" "rsync" "python3"'.
environment: None
[...]
out: Installing:
out: cockpit-podman noarch 29-1.fc33 updates 1.0 M
[...]
out: Installing dependencies:
out: crun x86_64 0.15-5.fc33 fedora 156 k
[...]
Full log: /var/tmp/tmt/run-001/log.txt
I can reproduce this with this minimal recipe:
$ tmt run provision --how virtual --image fedora-33
$ tmt run -l login
# inside the testbed
dnf install podman
[...]
Installing:
podman x86_64 2:3.0.1-1.fc33 updates 12 M
Installing dependencies:
conmon x86_64 2:2.0.26-1.fc33 updates 50 k
container-selinux noarch 2:2.145.0-1.fc33 fedora 38 k
containernetworking-plugins x86_64 0.9.1-4.fc33 updates 9.7 M
containers-common noarch 4:1-9.fc33 updates 58 k
crun x86_64 0.15-5.fc33 fedora 156 k
crun 0.15-5 is 7 months old! Curiously, after that, updating works just fine:
# dnf update crun
Fedora 33 - x86_64 - Updates 82 kB/s | 9.6 kB 00:00
Dependencies resolved.
============================================================================================================================================
Package Architecture Version Repository Size
============================================================================================================================================
Upgrading:
crun x86_64 0.18-5.fc33 updates 166 k
[...]
This smells a bit like a dnf bug, but it does not happen on Fedora 33 VMs in our own CI. Does tmt have some particular magic?
I sent a workaround in https://github.com/cockpit-project/cockpit-podman/pull/700 , but this is rather dangerous -- it probably means that a lot of tests run against outdated packages?
Reproduced outside of tmt as well, e.g. with podman run --rm -ti fedora:33 dnf install podman.
Update:
I am not able to get usable verbose/debug data from dnf :(
However, if I update the container image first (dnf update -y), then dnf install podman installs the newer crun-0.18.
Using dnf install --best podman didn't help on a fresh image - it still installs the old crun.
FTR I'm using registry.fedoraproject.org/fedora 33 9f2a56037643
Update2:
It seems I just need to upgrade libcap and then I get the new crun.
podman run --rm -ti fedora:33 bash -c 'echo n | dnf install podman' -> crun-0.15-5.fc33
podman run --rm -ti fedora:33 bash -c 'dnf update -y libcap; echo n | dnf install podman' -> crun-0.18-5.fc33
Update3: Filed https://bugzilla.redhat.com/show_bug.cgi?id=1946975 for dnf ignoring --best, which might be what we need in dnf install.
Thanks @lukaszachy ! Let's track this in bugzilla on the dnf side then.
@martinpitt @psss @thrix What would you say if tmt ran 'dnf update -y' automatically (with an option to disable that, e.g. for CI)? I'm not sure how likely it is to hit dnf's bug/feature when running from an updated system, so the best place (to keep the SUT as new as possible) would be to update the system after require/recommend are installed.
> update the system after require/recommend are installed
That's also the most expensive option -- the test installs/upgrades packages twice then. If it is enough to update the initial cloud image to work around the bug, that'd be a bit cheaper.
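Updating the initial image up front could also be done outside of tmt, for instance with virt-customize from libguestfs; a rough sketch (the image file name is just a placeholder):

```sh
# Refresh all packages inside the cloud image before tmt ever boots it
virt-customize -a Fedora-Cloud-Base-33.qcow2 --update
```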
As a workaround, yes. However, we should fix this (as much as we can), right? IMHO running tests on an old SUT is wrong, and I'm not sure that we have updated images available.
Correct, I think dnf update -y right after booting the instance makes perfect sense. FTR, Debian's/Ubuntu's autopkgtest infra does that as well.
With the latest discussions on the downstream bug, the dnf update -y approach seems correct indeed.
Unfortunately yes. Do I understand dnf's comments correctly that we need to do it after we install require/recommend, as 'install' can pick whatever version of a dependency to satisfy the transaction?
In that case, dnf picked an old version of crun because an old version of libcap was already installed. So if the testbed gets updated first, this can't happen any more.
Do you understand why it happened? I couldn't find what makes the libcap update so special that after it the new crun is suddenly selected.
@lukaszachy : My understanding was like this (a small sketch below shows how one could verify it):
- old crun 0.15 depends on an older libcap, which is already installed
- current crun 0.18 depends on a newer libcap, which would have to be upgraded first
- dnf prefers the already installed versions over newer ones, even for transitive dependencies (which I find is a bug..)
- thus, instead of installing the current crun 0.18 and updating libcap, it installed the old crun to get along with the already installed old libcap
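One way to sanity-check that theory is to compare what each crun build requires with what the stock image already provides. A rough sketch (the NEVRs are taken from the logs above; the exact requirement strings and versions depend on the current Fedora 33 repositories, so treat the output as illustrative):

```sh
# Compare the image's libcap with the requirements of both crun builds
podman run --rm fedora:33 bash -c '
  rpm -q libcap                                                # libcap shipped in the image
  dnf -q repoquery --requires crun-0.15-5.fc33 | grep -i cap   # old build from the fedora repo
  dnf -q repoquery --requires crun-0.18-5.fc33 | grep -i cap   # current build from updates
'
```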
Having the latest packages installed definitely makes sense. Although when testing using provision --how local one would probably want to skip this by default. Regarding the config I see two options:
- Add an --update/no-update option to the prepare step install plugin
- Support a dedicated prepare section which would be inserted by default
The first one is very easy to implement; the second one would give us some more flexibility, especially regarding interaction with the CI. This is a bit related to the preparation stories in #479 and also to the multiple step config support in #68.
With the latter one, do you mean something like(?):
prepare:
    how: update
The separate config could/would look something like this:
prepare:
  - name: update
    how: install
    update: true
    order: 10
The default low order would mean it would be executed as one of the first prepare plugins.
For the dnf update (or alike) implementation itself, I'd be fine with --skip-broken (IOW, best effort). I'm not sure about --nobest, which completely excludes versions with broken dependencies; that may have some undesired behaviour, like installing/updating to a lower package version which has runtime issues/bugs (therefore it is not used in Fedora by default, nor in Fedora Koji). Sticking with the tested version (everything that is shipped should be tested to work) may be the best option in that case. Note: I personally rarely had to use --nobest, mostly on broken systems with already broken dependencies.
There's also dnf upgrade-minimal; the man page says: "Updates each package to the latest available version that provides a bugfix, enhancement or a fix for a security issue (security)." So this would probably skip the non-security updates. We should probably do this, even if we decide we don't want to upgrade everything every time.
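To make the options being discussed concrete, here is roughly how the different update flavors would be invoked (a sketch of plain dnf calls, not what tmt would necessarily run):

```sh
dnf -y upgrade                      # full update of the guest
dnf -y upgrade --skip-broken        # best effort: skip packages with dependency problems
dnf -y upgrade-minimal              # only up to the first version fixing a bug, enhancement or security issue
dnf -y upgrade-minimal --security   # restrict that further to security advisories
```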
This just became a whole lot more urgent. Tests now fail in the early testbed setup phase while installing the require: packages from the test plan. As this pulls in such an old podman...
containers-common noarch 4:1-10.fc33 updates 58 k
podman x86_64 2:2.1.1-10.fc33 fedora 11 M
... the rpm installation fails:
Error: Transaction test error:
file /usr/share/man/man5/containers-mounts.conf.5.gz conflicts between attempted installs of podman-2:2.1.1-10.fc33.x86_64 and containers-common-4:1-10.fc33.noarch
This can be reproduced easily in a container (not without irony):
podman run -it --rm fedora:33 sh -ec 'sed -i s/nodocs// /etc/dnf/dnf.conf; dnf install -y podman-2:2.1.1-10.fc33'
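For comparison, the update-first approach discussed above can be checked in the same container setup. This is only a sketch; the assumption is that with an updated package set dnf resolves to the current podman from the updates repository, so the file conflict with containers-common does not occur:

```sh
podman run -it --rm fedora:33 sh -ec '
  sed -i s/nodocs// /etc/dnf/dnf.conf   # keep man pages so the conflict could still show up
  dnf update -y
  dnf install -y podman
  rpm -q podman containers-common crun  # report which builds were actually selected
'
```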
I reported this in https://bugzilla.redhat.com/show_bug.cgi?id=1900000
But right now I don't see how to work around this -- the test script (which currently manually updates podman and conmon for us) runs too late in this game.
Never mind, I re-read the whole discussion here, and pointing out prepare: was an excellent hint! I now have a fix which circumvents the dnf issue in a more robust way.
Bah, except that this extra prepare: step breaks on the Testing Farm.
Update: It does not -- that was just coincidence. Testing Farm is generally broken right now.
In general I'd say that we would save ourselves a lot of possible issues (like the super weird one I ran into yesterday) by running tests on an updated system, especially in the CI. So I would suggest this:
- The testing farm always updates the guest before installing the rpms under test
- In tmt we provide a simple way to get an updated guest (optional)
For local experimenting and test debugging I find it ok to run quickly against a slightly older distro image, but if one does not care about the additional time and wants to be sure that the guest is fresh, there should be a comfortable way to get that.
> - The testing farm always updates the guest before installing the rpms under test
@thrix, is this already covered on the Testing Farm side?
> - In tmt we provide a simple way to get an updated guest (optional)
I guess we still want this. The question is whether it should go to the install prepare plugin or into the new almighty install outlined in #2226.
> - The testing farm always updates the guest before installing the rpms under test
>
> @thrix, is this already covered on the Testing Farm side?
We do this now for CentOS Stream and Fedora Rawhide. Anyway, as of today the Fedora images will be updated daily:
https://gitlab.com/testing-farm/infrastructure/-/merge_requests/848
So I do believe in Testing Farm we should be fine.
Triage meeting summary: Yes, we want to have a standardized way to get an updated system, as this is a common use case. Sounds like prepare --how feature --update-system enabled. A possible workaround is to use a custom Ansible playbook to update the system. Linux System Roles could also make sense, but would limit the usage to rpm-like distros.
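For illustration, the interface sketched in that summary might look something like this on the command line (the prepare options shown are hypothetical and only mirror the wording above; they do not exist in tmt yet):

```sh
# Hypothetical: enable a full guest update as part of the prepare step
tmt run --all provision --how virtual --image fedora-33 \
        prepare --how feature --update-system enabled
```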