bootc tests cleanup/improvements

Our testing story is of course scattered. It's time to improve that. Right now we have important tests being run via github actions bridging to testing farm, which itself is mainly an executor for tmt.

Then there's some github actions usage, plus some tests maintained in the rust sources.

(We also have rust unit tests, but not much can be unit tested in an interesting way here)

The most important thing here is:

Create a streamlined workflow for "build bootc and run key tests locally"

And we have a few avenues to pursue for this. I'd like to make this viable with tmt. We have the hack/Containerfile which builds a bootc-based container image. I'd like to streamline a flow where tmt can do something like run podman-bootc automatically and make a disk image, and then run tests against it with the "classic" model of puppeting over ssh.

May 16 '24 21:05 cgwalters

It's probably self-evident but I will throw it out anyway. I think it would be great too if the tests can be run as easily locally as in the GH runners/Testing Farm, so that the barrier for new contributors is low. So being able to e.g. run in qemu too, ideally via a single command (once the right test dependencies are installed).

May 17 '24 07:05 mvo5

@henrywang says that packit is the main flow here that builds RPMs and has code to inject them into the test environment. But surely other projects have a flow for this?

> We have the hack/Containerfile which builds a bootc-based container image. I'd like to streamline a flow where tmt can do something like run podman-bootc automatically and make a disk image, and then run tests against it with the "classic" model of puppeting over ssh.

Related to this, today the existing tmt code basically implements its own provisioning; it can talk to AWS or libvirt. But tmt's provisioner can do that too! As can podman-bootc.

So in theory I could run tmt with provision --how local, but it feels quite expensive; among other things it'll rebuild an RPM from scratch each time and also redownload a c9s qcow2. Whereas with tmt's own provisioner at least the qcow2 is cached.

The containerfile we use for development has incremental builds set up which is really useful. I wouldn't be opposed to giving that up in the short term to gain better tmt integration though.

Another related thing here is I'd like to try to move some of the implementation of testing to one where the guest is driving it, not being driven by the host. This is how we do things in coreos-assembler/kola - the host just polls a systemd unit. Notably, there's good support for having the test do reboots because we implement the debian autopkgtest https://manpages.debian.org/bookworm/autopkgtest/autopkgtest.1.en.html API to allow the testbed to reboot. This is super useful for bootc.
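
To illustrate the reboot handling, here is a minimal sketch of a guest-driven test following the autopkgtest reboot convention: the test calls `/tmp/autopkgtest-reboot <mark>` and, after the reboot, is re-run with `AUTOPKGTEST_REBOOT_MARK` set to that mark. The phase names and the `next_step`/`run_test` helpers are hypothetical, and the real work of each phase is left out.

```python
import subprocess

# Reboot helper provided by the testbed under the autopkgtest convention.
REBOOT_HELPER = "/tmp/autopkgtest-reboot"


def next_step(mark):
    """Map the current reboot mark to (phase to run, mark to reboot with).

    Phase names are hypothetical. An empty mark means first boot; a reboot
    mark of None means the test is done and no further reboot is needed.
    """
    if not mark:
        return ("stage-upgrade", "after-upgrade")
    if mark == "after-upgrade":
        return ("verify-upgrade", None)
    raise ValueError(f"unexpected reboot mark: {mark!r}")


def request_reboot(mark):
    """Ask the testbed to reboot; the test is re-run with the mark set."""
    subprocess.run([REBOOT_HELPER, mark], check=True)


def run_test(environ, reboot=request_reboot):
    """Run the phase matching the current boot and reboot if required."""
    mark = environ.get("AUTOPKGTEST_REBOOT_MARK", "")
    phase, reboot_mark = next_step(mark)
    # ... the real work for `phase` would happen here ...
    if reboot_mark is not None:
        reboot(reboot_mark)
    return phase
```

Inside the guest the entry point would be something like `run_test(os.environ)`; the `reboot` parameter only exists so the flow can be exercised without a real testbed.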

May 31 '24 00:05 cgwalters

For example, the command tmt --root . -c arch=x86_64 run --all --verbose -e TEST_OS=fedora-40 -e ARCH=x86_64 -e QUAY_USERNAME=abc -e QUAY_PASSWORD="foobar" -e QUAY_SECRET="quayiouserpassbase64" provision --how local plans --name install-upgrade/to-disk will run all tests on the local machine; no additional VM deployment or provisioning is needed.

In this case, building the RPM, running bootc install to-disk, and running the disk image with virt-install all happen on the local machine. All packages required by the tests will be installed on the local machine by the tmt plan.

May 31 '24 02:05 henrywang

Packit can be one of the solutions for bootc testing. There are some benefits we can get from the Packit solution:

It does not need a GitHub runner, which saves runner capacity
It can build the bootc RPM package with a Copr build, and only a spec file is needed
No Testing Farm API token needs to be configured in Packit
Some of the bootc tests can be moved to tmt to avoid the GitHub runner.

Downsides:

Packit does not support secrets. I think we can drop the bootc install to-existing-root test on an AWS instance and use a libvirt VM instead, which avoids configuring AWS secrets. As for the quay.io secrets required for pushing: localhost can be used for the upgrade test, and a local folder (/mnt for example) can be used for the switch test.
No RHEL RPM build and no RHEL environment.

May 31 '24 03:05 henrywang

> For example, the command tmt --root . -c arch=x86_64 run --all --verbose -e TEST_OS=fedora-40 -e ARCH=x86_64 -e QUAY_USERNAME=abc -e QUAY_PASSWORD="foobar" -e QUAY_SECRET="quayiouserpassbase64" provision --how local plans --name install-upgrade/to-disk will run all tests on the local machine; no additional VM deployment or provisioning is needed.

The QUAY_USERNAME here strongly shows that's not local; we're relying on an external infrastructure. I think we actually do need to fix that at some point even in general because isn't it racy today to have two concurrent test runs, both pushing staging images to quay?

This is a super complex topic of course...how to handle registries in a generic way especially that works locally is messy. One really nice thing about Prow (as used by Kube) is that each CI job runs by default in its own Kubernetes namespace and gets the free ability to push to the internal registry in a nicely scoped/safe way. We don't have that by default with GHA or Testing Farm.

We should probably make the registry usage in the tmt tests parameterized, so one can use ghcr.io or whatever too, and probably default to auto-synthesizing tags. Also btw when pushing to quay.io one can use tag expiration to get reliable GC of images.
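
As a rough sketch of what "parameterized registry plus auto-synthesized tags" could mean: the image reference is assembled from the environment, and each run gets its own random tag so two concurrent runs never push to the same one. The variable names `TEST_REGISTRY` and `TEST_REPOSITORY`, their defaults, and the `image_ref` helper are all made up for illustration; nothing like this exists in the tests today as far as this thread shows.

```python
import uuid


def image_ref(environ, run_id=None):
    """Build a per-run image reference of the form REGISTRY/REPO:TAG.

    TEST_REGISTRY and TEST_REPOSITORY are hypothetical variable names.
    When no run_id is given, a random one is generated so that concurrent
    test runs do not overwrite each other's tags.
    """
    registry = environ.get("TEST_REGISTRY", "localhost:5000")
    repository = environ.get("TEST_REPOSITORY", "bootc-test")
    if run_id is None:
        run_id = uuid.uuid4().hex[:12]
    return f"{registry}/{repository}:test-{run_id}"
```

With this shape, pointing a run at another registry is just a matter of setting one variable, and a fully local run falls back to a registry on localhost.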

Anyways, it's quite important of course to have coverage with "real" registry pushes, but I think we can do a lot of testing in a fully hermetic/local way with VMs that don't even have networking, or at least not Internet routing (we do this in coreos-assembler/kola; see e.g. https://github.com/coreos/coreos-assembler/blob/main/docs/kola/external-tests.md#kolajson which includes the needs-internet tag, which was taken from Debian autopkgtest). tmt definitely needs this too.
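
A minimal sketch of how a needs-internet style tag could gate test selection: tests that declare the tag are skipped when the run is hermetic. The mapping of test names to tag sets is a hypothetical in-memory format chosen for the example, not kola's or tmt's actual metadata layout.

```python
def select_tests(tests, allow_internet):
    """Return the sorted names of tests runnable in this environment.

    `tests` maps a test name to its set of tags (hypothetical format).
    Tests tagged "needs-internet" are dropped unless allow_internet is set.
    """
    return sorted(
        name
        for name, tags in tests.items()
        if allow_internet or "needs-internet" not in tags
    )
```

A hermetic run would call `select_tests(tests, allow_internet=False)` and only the registry-push style tests would be left for the networked job.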

In the short term...man, I really want to reuse TMT (or more generally, something someone else has written and maintained) but...argh.

I'm still looking at this problem domain; in the short term I did https://github.com/containers/bootc/pull/576 to just clean up that whole framework, but it doesn't cover the "local hermetic VM testing" that I'm getting at here.

May 31 '24 21:05 cgwalters

> I'm still looking at this problem domain; in the short term I did https://github.com/containers/bootc/pull/576 to just clean up that whole framework, but it doesn't cover the "local hermetic VM testing" that I'm getting at here.

I'm working on a test update to add a localhost scenario (no push to a registry required), controlled by an argument.

Jun 01 '24 04:06 henrywang

OK https://github.com/containers/bootc/pull/590 pushes things forward here a bit, getting to the point where we have a container image that can be turned into a disk image, that works directly with a simple tmt test.

However, the workflow of "build a disk image to pass to tmt" needs automating. I still largely don't understand how the workflow of "build code locally to pass to tmt" is expected to work in general. The way e.g. bib builds the container as part of the tests feels...wrong.

Jun 09 '24 20:06 cgwalters

Chatting with @martinpitt we determined it would probably make sense to have packit (as the "glue") support something like building a container image with the code into a temporary registry (it could of course use the COPR or whatever it provisions and install into the container).
A good toplevel goal would be generic support sufficient to e2e verify https://github.com/containers/bootc/issues/571 in cockpit's CI, but also other projects (e.g. freeipa) could do very similar things.

Jun 20 '24 12:06 cgwalters

Note, I'm @martinpitt; @pitti exists, but is someone else :grin:

Another approach which we discussed would be to design this "human/local first" (always good): build an OCI container locally with the modified code, turn that into a qcow, and boot that with qemu (or virt-install if you prefer). This is what you'd want to run tests locally without unnecessary faff like AWS credentials or large uploads/downloads.
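
To make the three steps concrete, here is a sketch that only assembles the command lines, without running anything. It assumes bootc-image-builder is the tool that turns the container into a qcow2 (this comment does not name one), that the image is built as root so it lands in the system container storage, and that `/dev/kvm` is available; the `local_vm_commands` helper, the memory size and the SSH port are arbitrary choices for the example.

```python
import os


def local_vm_commands(image, workdir, ssh_port=2222):
    """Return argv lists for: build container, convert to qcow2, boot in qemu.

    Assumes bootc-image-builder for the conversion step; `workdir` should be
    an absolute path so it can be used as a podman volume source.
    """
    output = os.path.join(workdir, "output")
    disk = os.path.join(output, "qcow2", "disk.qcow2")
    build = [
        "sudo", "podman", "build",
        "-t", image, "-f", "hack/Containerfile", ".",
    ]
    to_qcow2 = [
        "sudo", "podman", "run", "--rm", "--privileged",
        "--security-opt", "label=type:unconfined_t",
        "-v", f"{output}:/output",
        "-v", "/var/lib/containers/storage:/var/lib/containers/storage",
        "quay.io/centos-bootc/bootc-image-builder:latest",
        "--type", "qcow2", "--local", image,
    ]
    boot = [
        "qemu-system-x86_64", "-accel", "kvm", "-m", "4096",
        "-snapshot", "-nographic",
        "-drive", f"if=virtio,format=qcow2,file={disk}",
        # Forward a host port to the guest's sshd so tests can ssh in.
        "-nic", f"user,hostfwd=tcp::{ssh_port}-:22",
    ]
    return [build, to_qcow2, boot]
```

Each list could be handed to `subprocess.run(..., check=True)` in order; no credentials or uploads are involved at any step.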

However, that's hard to deploy to GitHub workflows and standard TF machines, but: @thrix mentioned last week that they offer /dev/kvm support in TF machines now, and even real-iron (I don't have details).

Another option is Cirrus CI, they also offer /dev/kvm. Our starter-kit project uses this, mostly as a demo -- but it does boot our standard cockpit bots VM images and runs stuff in them just fine. TF is obviously preferable both in terms of "use our standard tools" and also rate limiting, but maybe it's useful for something.
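
Since whether `/dev/kvm` exists differs between these CI environments, a test launcher could probe for it and fall back to emulation. A small sketch, with `kvm_usable` and `accel_args` as hypothetical helper names:

```python
import os


def kvm_usable(path="/dev/kvm"):
    """Return True if the KVM device exists and is readable and writable."""
    return os.access(path, os.R_OK | os.W_OK)


def accel_args(path="/dev/kvm"):
    """Pick qemu accelerator arguments: KVM if usable, else TCG emulation."""
    if kvm_usable(path):
        return ["-accel", "kvm"]
    return ["-accel", "tcg"]
```

The TCG fallback is much slower, so a CI job may prefer to fail early instead when `kvm_usable()` is false.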

Jun 20 '24 12:06 martinpitt

https://github.com/teemtee/tmt/pull/3037 makes running tmt locally for me actually bearable. I was really surprised to run out of disk space on this workstation with 2T of local storage. Also auditing this stack, I did https://pagure.io/testcloud/pull-request/174

Jun 21 '24 12:06 cgwalters

Following the chat between @cgwalters and @martinpitt (Hi, long time no see 😊), we can move the integration test (shell + ansible) part to packit, with some changes:

Rename this test to end-to-end test; e2e is a more reasonable name in this case.
Run all e2e tests locally by dropping the AWS and quay.io dependencies and using a VM and a local registry instead. That avoids requiring secrets.
The e2e tests cover bootc install to-existing-root, bootc install to-disk, and the bootc upgrade/switch commands, along with tests of their arguments.

What do you think @cgwalters? Thanks.

Jun 21 '24 14:06 henrywang

That sounds pretty good to me!

Jun 21 '24 18:06 cgwalters