coreos-assembler
kola/switch-kernel: rpm-ostree fails to switch from Default to RT Kernel
Bug Report
Environment
What operating system is being used to run coreos-assembler?
Fedora 30
What operating system is being assembled?
RHCOS
Is coreos-assembler running in Podman or Docker?
Podman
If Podman, is coreos-assembler running privileged or unprivileged?
Privileged
Expected Behavior
The rpm-ostree command successfully switches the kernel from the default to the RT kernel:
rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
Actual Behavior
+ rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install ./kernel-rt/kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install ./kernel-rt/kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install ./kernel-rt/kernel-rt-modules-m
Checking out tree ffd5b3c... done
Enabled rpm-md repositories: rhel8-baseos rhel8-appstream rhel8-rt
rpm-md repo 'rhel8-baseos' (cached); generated: 2020-02-27T15:31:54Z
rpm-md repo 'rhel8-appstream' (cached); generated: 2020-03-13T13:31:29Z
rpm-md repo 'rhel8-rt' (cached); generated: 2020-02-25T05:36:45Z
Importing rpm-md... done
Resolving dependencies... done
Applying 4 overrides and 4 overlays
Processing packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
error: Multiple subdirectories found in: usr/lib/modules
Reproduction Steps
cosa kola switch-kernel -b rhcos --ignition-version v2 --kernel-rt ./kernel-rt- ...
Other Information
I investigated a bit and found https://bugzilla.redhat.com/show_bug.cgi?id=1767215, which seems related. I tried manually running the above rpm-ostree command inside RHCOS and saw the same behavior. The error message originates at https://github.com/coreos/rpm-ostree/blob/2ee48c51fede72f1f0394c070c0f35946f3e1839/src/libpriv/rpmostree-kernel.c#L141, which only triggers when the directory /usr/lib/modules contains more than one subdirectory. But again, there is only one:
[core@master-2 ~]$ ll /usr/lib/modules
total 4
drwxr-xr-x. 7 root root 4096 Jan 1 1970 4.18.0-147.el8.x86_64
This error did not occur when https://github.com/coreos/coreos-assembler/pull/1218 got merged. Am I missing anything..?
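The condition described above (more than one subdirectory under usr/lib/modules) can be checked by hand with a small helper. This is only a sketch: `count_module_dirs` and `list_module_dirs` are hypothetical names, not part of rpm-ostree or coreos-assembler. Since the error was printed after "Writing rpmdb... done", the check appears to run against the newly assembled tree rather than the booted deployment, so the helper takes a root path as an argument instead of assuming /.

```shell
# Count the immediate subdirectories of <root>/usr/lib/modules.
# Hypothetical helper for illustration; <root> defaults to /.
count_module_dirs() {
    local root="${1:-/}"
    # -mindepth/-maxdepth restrict the search to direct children only.
    find "${root%/}/usr/lib/modules" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Print the kernel version directory names, one per line, sorted.
list_module_dirs() {
    local root="${1:-/}"
    find "${root%/}/usr/lib/modules" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | sort
}
```

A count above 1 for a given tree would match the condition that produces the "Multiple subdirectories found" error; running `list_module_dirs` on the same path shows which kernel versions are involved.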
@jlebon @cgwalters This looks like an rpm-ostree issue at the core...the only thing that jumped out in a search over there was https://github.com/coreos/rpm-ostree/issues/1933
Yup, agreed this is likely an rpm-ostree problem. Will look into this.
Hmm, actually I can't reproduce this locally on a fresh RHCOS build, either by running rpm-ostree override remove directly or via cosa kola switch-kernel.
What RHCOS are you testing this on?
Did fresh builds on two different machines, and ran cosa kola switch-kernel inside the cosa container..
Will try again tomorrow morning to see if it works
So I've updated src/config and the error message went away.
Though the rpm-ostree commands are now running without issue, cosa kola switch-kernel will sometimes fail at the second stage (switching RT back to Default) with error message:
Error: failed switch kernel test: failed switching from RT to Default Kernel: failed to run uname -v | grep -qv 'PREEMPT RT': Process exited with status 1
The same failure is observed in the Jenkins pipeline (https://jenkins-rhcos-art.cloud.privileged.psi.redhat.com/job/rhcos-art-rhcos-4.5/76/console).
Since the related error is now gone, should we close this issue?
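For reference, the failing check is `uname -v | grep -qv 'PREEMPT RT'`. With `-v`, grep succeeds when a line does not match, so on the single line that `uname -v` prints, exit status 1 means the string still contains 'PREEMPT RT', that is, the machine was still running the RT kernel at that point. The sketch below wraps the same logic in two functions that take the version string as an argument; the function names and the sample strings in the usage note are made up for illustration.

```shell
# Succeeds (exit 0) when the given `uname -v` string names an RT kernel.
is_rt_version() {
    printf '%s\n' "$1" | grep -q 'PREEMPT RT'
}

# Same test as the kola check: succeeds when the string does NOT contain
# 'PREEMPT RT', and exits with status 1 when it does.
is_default_version() {
    printf '%s\n' "$1" | grep -qv 'PREEMPT RT'
}
```

On a live machine this would be called as `is_default_version "$(uname -v)"`. A hypothetical string such as "#1 SMP PREEMPT RT Mon Jan 1 00:00:00 UTC 2020" fails that check, while "#1 SMP Mon Jan 1 00:00:00 UTC 2020" passes it.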
Hmm yeah that's a different issue. No issues reusing this ticket if you'd prefer. Maybe try to run the same commands manually yourself until you hit the error? The kola SSH wrappers might be swallowing stderr.
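One way to do that without going through the kola SSH wrappers is a small retry loop that stops at the first failure and keeps the stderr of the failing attempt. This is a rough sketch; `run_until_failure` is a hypothetical helper, and the command passed to it is whatever step is being reproduced.

```shell
# Run a command up to <max> times; stop at the first failure and keep its stderr.
# Usage: run_until_failure <max> <stderr-log> <command> [args...]
run_until_failure() {
    local max="$1" log="$2" i=1 status
    shift 2
    while [ "$i" -le "$max" ]; do
        # The log is overwritten on every attempt, so after a failure it
        # holds only the stderr of the failing run.
        if "$@" 2>"$log"; then
            i=$((i + 1))
        else
            status=$?
            echo "attempt $i failed with status $status; stderr saved to $log"
            return "$status"
        fi
    done
    echo "no failure in $max attempts"
    return 0
}
```

For example, `run_until_failure 20 /tmp/check.err sh -c "uname -v | grep -qv 'PREEMPT RT'"` would repeat the kernel check up to 20 times and return the first non-zero status it sees.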
This is a pretty old issue. Two things:
- We should delete kola switch-kernel and make this a regular kola test instead (ideally external).
- Another way to switch kernels now is via layering, though that test is currently also broken: https://github.com/openshift/os/issues/1383. Ideally, we need to fix that too since it's only going to be more relevant going forward. That said, it still makes sense to test the client-side rpm-ostree override replace flow since that's still what the MCO does today.
The challenge with (1) is that this requires some support on the kola side because we need access to the kernel-rt RPMs. Those RPMs are now shipped as part of the extensions container. We could have a kola test tag like extensions-container which tells kola to copy the extensions container into the VM. One tricky bit there is that the extensions container is generated later in the pipeline, so it won't be available on the initial kola run we do. We'd have to add it near the kola testiso run instead, which happens after all artifacts are generated.
Another way to switch kernels now is via layering, though that test is currently also broken: https://github.com/openshift/os/issues/1383. Ideally, we need to fix that too since it's only going to be more relevant going forward. That said, it still makes sense to test the client-side rpm-ostree override replace flow since that's still what the MCO does today.
Sorry, this is incorrect. https://github.com/openshift/os/issues/1383 doesn't use the layering flow, but also does it client-side.
The layering test lives in FCOS: https://github.com/coreos/fedora-coreos-config/blob/832c42ba3f406f88647621300aeecde30e9d14ef/tests/kola/rpm-ostree/kernel-replace. So then ideally, we generalize that test so it can work on both FCOS and SCOS/RHCOS.
Let's close this one. The command was removed in https://github.com/coreos/coreos-assembler/pull/3825 in favour of external tests.
Relatedly, @c4rt0 is working on generalizing the existing layering test that we have in f-c-c: https://github.com/coreos/fedora-coreos-config/pull/3048