osbuild-composer

Enable 8.8 and 9.2 test runners

Open atodorov opened this issue 2 years ago • 5 comments

This pull request includes:

  • [ ] adequate testing for the new functionality or fixed issue
  • [ ] adequate documentation informing people about the change such as
    • [ ] submit a PR for the guides repository if this PR changed any behavior described there: https://www.osbuild.org/guides/

atodorov avatar Nov 14 '22 12:11 atodorov

FTR 9.2 nightly pipeline: https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/pipelines/698831011

Upgrade test failed

atodorov avatar Nov 18 '22 15:11 atodorov

For the regular pipeline: ostree failures should be resolved in #3114.

I will look into the failing upgrade test.

atodorov avatar Nov 21 '22 09:11 atodorov

> For the regular pipeline: ostree failures should be resolved in #3114.

I'm afraid #3114 only fixes the RAM issue. The RHEL 9 issue is:

ERROR    loader attribute 'readonly' cannot be specified when firmware autoselection is enabled

which is probably related to a new virt-install version cc @henrywang

runcom avatar Nov 21 '22 09:11 runcom

is this ready? we need it for #3130

runcom avatar Dec 05 '22 09:12 runcom

> is this ready? we need it for #3130

No. There is a problem with the 9.1 and 8.7 GA runners, see https://coreos.slack.com/archives/C0235DZB0DT/p1669643319777379

atodorov avatar Dec 05 '22 10:12 atodorov

In ostree-simplified-installer.sh, line 895, there is a condition for rhel-9.1. I guess this should be updated to 9.2?

@henrywang can you advise?

runcom avatar Dec 19 '22 11:12 runcom

> In ostree-simplified-installer.sh, line 895, there is a condition for rhel-9.1. I guess this should be updated to 9.2?
>
> @henrywang can you advise?

It's about the ignition test that I did for #3161. It should be updated to rhel-9.2.

7flying avatar Dec 19 '22 11:12 7flying

Warning: This PR introduces changes in at least one manifest (when comparing PR HEAD dff5103 with the main merge-base 13fdf04). Please review the changes. The changes can be found in the artifacts of the Manifest-diff job [0] as manifests.diff.

[0] https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/jobs/3496180624/artifacts/browse

@thozza, @lavocatt, @ondrejbudai I see some removed packages but IDK if that's expected or an issue. Can you take a look?

atodorov avatar Dec 20 '22 09:12 atodorov

> Warning: This PR introduces changes in at least one manifest (when comparing PR HEAD dff5103 with the main merge-base 13fdf04). Please review the changes. The changes can be found in the artifacts of the Manifest-diff job [0] as manifests.diff.
>
> [0] https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/jobs/3496180624/artifacts/browse
>
> @thozza, @lavocatt, @ondrejbudai I see some removed packages but IDK if that's expected or an issue. Can you take a look?

After reviewing the changes in this PR, I believe there should be no differences in image manifests, so the diff looks suspicious. I tried running the gen-manifests tool on main and at first got an almost identical diff, but for the vmdk image on x86_64 and rhel-8.7. Rerunning the tool produced no diff. So it seems that there is some issue with depsolving in the tool itself (maybe a race condition when running too many workers). I'm not sure what's happening there and would defer to @achilleas-k.

My suspicion is that re-running the manifest-diff job may actually produce no diff 🤔

So this seems like a general issue, not specific to this PR.
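One way to confirm that kind of nondeterminism would be to generate the manifests twice from the same commit and compare them. The following is only a minimal sketch: the manifest names and the dict layout are hypothetical, since the actual output format of gen-manifests is not shown in this thread.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    """Hash a manifest so that key order does not affect the result."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unstable_manifests(run_a: dict, run_b: dict) -> list:
    """Return the names of manifests that differ between two runs.

    run_a and run_b map a (hypothetical) manifest name to its parsed JSON
    content. A manifest missing from one run also counts as unstable.
    """
    names = sorted(set(run_a) | set(run_b))
    return [
        name
        for name in names
        if name not in run_a
        or name not in run_b
        or manifest_digest(run_a[name]) != manifest_digest(run_b[name])
    ]
```

If two runs on an unchanged tree report any unstable manifests, the diff comes from the tooling rather than from the PR under review.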

thozza avatar Dec 20 '22 11:12 thozza

I also noticed that some CI jobs are failing on 9.2 images due to image-info not being able to inspect images. @lavocatt is working on it for the manifest-db repo, so we may need to make the same changes as part of this PR to fix the issue. Otherwise we can't merge it, because it would make CI always fail (on 9.2 and c9s).

thozza avatar Dec 20 '22 11:12 thozza

Seeing a lot of errors in the log:

ERROR: Parser error at line:471 col:26
    not well-formed (invalid token)

https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/jobs/3496180624

I haven't seen this one before, but I have seen dnf print errors that don't raise exceptions and that we therefore don't catch through dnf-json.

Maybe we could catch stderr from dnf and look for errors to fail the depsolve job in such cases. We've talked to the dnf team a couple of times about similar things but never made any concrete decisions on it. Might be a good idea to catch these somehow, otherwise we could theoretically get this in prod, build an image off an incomplete manifest, and not realise it.
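As a rough illustration of the idea, and not of how dnf-json works today, a wrapper could capture stderr from the depsolve process and fail on known error patterns even when the exit code is 0. The pattern list, the `DepsolveStderrError` name and the `run_checked` helper are all hypothetical. The real set of patterns would need to be agreed with the dnf team.

```python
import re
import subprocess

# Hypothetical patterns, modelled on the parser error seen in the log above.
ERROR_PATTERNS = [
    re.compile(r"^ERROR:", re.MULTILINE),
    re.compile(r"not well-formed \(invalid token\)"),
]


class DepsolveStderrError(RuntimeError):
    """Raised when the command fails or reports an error on stderr."""


def run_checked(cmd):
    """Run cmd and return its stdout, failing on errors printed to stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise DepsolveStderrError(
            f"exit code {proc.returncode}: {proc.stderr.strip()}"
        )
    for pattern in ERROR_PATTERNS:
        if pattern.search(proc.stderr):
            # The process claimed success but still printed an error.
            raise DepsolveStderrError(
                f"error on stderr despite exit code 0: {proc.stderr.strip()}"
            )
    return proc.stdout
```

Matching on stderr text is fragile because messages can change between dnf versions, so this would be a stopgap until such errors are surfaced as exceptions.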

achilleas-k avatar Dec 20 '22 11:12 achilleas-k

> Maybe we could catch stderr from dnf and look for errors to fail the depsolve job in such cases. We've talked to the dnf team a couple of times about similar things but never made any concrete decisions on it. Might be a good idea to catch these somehow, otherwise we could theoretically get this in prod, build an image off an incomplete manifest, and not realise it.

Sounds reasonable to me... Let's discuss that early next year. As you wrote, if such a thing happened in production, we would produce incomplete images without knowing about it 🤔

thozza avatar Dec 20 '22 11:12 thozza

8.8 pipeline is PASS: https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/pipelines/737840282

9.2 pipeline is FAIL for OSTree raw image test: https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/pipelines/737840432

Script '01_update_platforms_check.sh' FAILURE (exit code '2')

I am also seeing failures with various OSTree tests outside of the nightly pipeline (see statuses on this PR), in https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/pipelines/737827291:

/usr/libexec/tests/osbuild-composer/ostree.sh: line 289: UPGRADE_PATH: unbound variable

ERROR    internal error: qemu unexpectedly closed the monitor: 2023-01-04T12:16:28.063152Z qemu-kvm: cannot set up guest memory 'pc.ram': Cannot allocate memory

CC @henrywang ^^^

atodorov avatar Jan 04 '23 13:01 atodorov

  1. The connection issue is caused by https://github.com/fedora-iot/fido-device-onboard-rs/issues/374. It's already fixed, so a re-run should work now. We discussed this issue in the Slack thread https://coreos.slack.com/archives/C022TDCV3FH/p1672752884390489
  2. Can we run ostree.sh for the ostree nightly test? That will use less memory. The downstream Edge nightly test does not have this issue:
  • RHEL + CentOS Stream: https://github.com/virt-s1/rhel-edge/projects/1
  • Fedora 37/rawhide: https://github.com/virt-s1/rhel-edge/projects/2

Thanks!

henrywang avatar Jan 09 '23 07:01 henrywang

> 1. The connection issue is caused by issue fedora-iot/fido-device-onboard-rs#374 ("aio error after updating serde_yaml to 0.9 not caught by CI"). It's fixed already. Just re-run will work now. We have a discussion about this issue on slack thread https://coreos.slack.com/archives/C022TDCV3FH/p1672752884390489

I rebased this PR to the latest main branch and retested today. The results are:

9.2 nightly pipeline - FAIL

OStree raw test, https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/jobs/3571891736, fails with:

🗳 Upgrade ostree image/commit
ssh: connect to host 192.168.100.51 port 22: No route to host

Pipeline started from this PR - FAIL, see https://gitlab.com/redhat/services/products/image-builder/ci/osbuild-composer/-/pipelines/741497704

  • Rebase OStree BIOS and Rebase OStree UEFI fail on 8.8;
  • OStree simplified installer fails on 8.8 and 9.2
  • New OStree failed on 8.8, 9.2 job still in progress
  • OStree failed on 8.8, 9.2 job still in progress

> 2. Can we run `ostree.sh` for ostree nightly test?

@henrywang do you mean executing ostree.sh instead of ostree-raw-image.sh for the nightly CI pipelines, in other words the ones qualifying internal RHEL builds?

In any case, it failed above with `ERROR internal error: process exited while connecting to monitor: 2023-01-09T10:44:32.025550Z qemu-kvm: cannot set up guest memory 'pc.ram': Cannot allocate memory`, so it doesn't look very reliable even if we swap which test script is being executed.

atodorov avatar Jan 09 '23 11:01 atodorov

@atodorov ostree.sh covers the edge-commit image type, ostree-ng.sh covers the edge-container and edge-installer image types, ostree-raw-image.sh covers the edge-raw-image image type, and ostree-simplified-installer.sh covers the edge-simplified-installer image type. The reason I suggest ostree.sh is that it builds a tarball and uses less CPU and memory than the tests for the container image, ISO, and raw image.
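For reference, the coverage described above can be written down as a lookup table. The script and image type names are taken from this comment. The `OSTREE_TEST_COVERAGE` name and the `scripts_covering` helper are only illustrative and do not exist in the repository.

```python
# Which test script exercises which edge image types, as listed above.
OSTREE_TEST_COVERAGE = {
    "ostree.sh": ["edge-commit"],
    "ostree-ng.sh": ["edge-container", "edge-installer"],
    "ostree-raw-image.sh": ["edge-raw-image"],
    "ostree-simplified-installer.sh": ["edge-simplified-installer"],
}


def scripts_covering(image_type):
    """Return the test scripts that cover the given image type."""
    return sorted(
        script
        for script, image_types in OSTREE_TEST_COVERAGE.items()
        if image_type in image_types
    )
```

Swapping ostree-raw-image.sh for ostree.sh in a pipeline would therefore trade edge-raw-image coverage for edge-commit coverage rather than test the same thing more cheaply.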

The `ssh: connect to host 192.168.100.51 port 22: No route to host` issue is probably related to the resource issue as well. At that point the VM is likely not running. It might be paused or stopped.

@atodorov @jrusz Do you know why the CI VMs cannot have more resources on PSI OpenStack? If there's a limit per OpenStack project, can we request two or more projects? Thanks.

henrywang avatar Jan 09 '23 12:01 henrywang

> @atodorov @jrusz Do you know why the CI VMs cannot have more resources on PSI OpenStack? If there's a limit per OpenStack project, can we request two or more projects? Thanks.

The OpenStack cluster is terribly over-subscribed and is maxed out from what I know. IIRC we had our resource quota increased early on but afterwards were denied a second quota increase.

The issue is also partly related to how current resource usage is calculated. For example, you can request more RAM, but that automatically increases the number of vCPUs used, which automatically decreases the number of active VMs you can have at any given time. That low number in itself will cause test jobs to queue for a long time and then be killed due to inactivity, resulting in a snowball effect.
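To make the arithmetic concrete, here is a small sketch with made-up numbers. The quota values and flavor sizes are hypothetical and are not the real PSI OpenStack figures. The only assumption carried over from above is that RAM and vCPUs come bundled in a flavor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """A VM size where RAM and vCPUs are coupled (hypothetical values)."""
    vcpus: int
    ram_gb: int


def max_concurrent_vms(quota_vcpus, quota_ram_gb, flavor):
    """Number of VMs of this flavor that fit in the project quota."""
    return min(quota_vcpus // flavor.vcpus, quota_ram_gb // flavor.ram_gb)


# Hypothetical flavors: doubling the RAM also doubles the vCPUs.
small = Flavor(vcpus=2, ram_gb=4)
large = Flavor(vcpus=4, ram_gb=8)
```

With a hypothetical quota of 40 vCPUs and 160 GB of RAM, `max_concurrent_vms(40, 160, small)` gives 20 while `max_concurrent_vms(40, 160, large)` gives 10: the vCPU quota becomes the limit long before the RAM quota does, and the number of parallel test jobs is halved.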

You can try requesting a second project or more, but I'm skeptical that it would be approved. Bear in mind, however, that our CI provisioning doesn't know how to work with 2 accounts in the same environment. We could probably work around that though.

@henrywang is it possible for Virt QE's CI environment to download RPMs built in osbuild-composer PRs, run tests against them (the same suite that you use for RHEL nightly testing is fine) and report statuses back to the PR?

Your environment appears to be better suited for testing which requires nested virtualization and if we can send and consume notifications and statuses between this GitHub repository and Virt QE's CI environment we can give it a try.

atodorov avatar Jan 09 '23 13:01 atodorov

Another option might be to onboard these tests to Testing Farm. We already have an MVP for Testing Farm planned for this quarter (see https://issues.redhat.com/browse/COMPOSER-1874, RH internal only, sorry). If it proves to be working well, we may want to start thinking of moving parts of the CI pipeline there.

ondrejbudai avatar Jan 09 '23 13:01 ondrejbudai

GitHub Actions somehow got stuck on this, I had to force-push (same commit, just a new SHA).

ondrejbudai avatar Jan 09 '23 19:01 ondrejbudai

There is just one downside, the ostree tests now take much longer, the simplified-installer even 2 hours... I'll try to get back to having GCP runners as an option after I'm done with current tasks and we could offload the testing there and use bigger machines.

jrusz avatar Jan 10 '23 08:01 jrusz