Add support for oci-dir output (fixes #1865)
A first draft for rudimentary OCI image outputs
PS: I intend to squash this once more questions are answered/resolved
Should I add something like podman to the CI to test the OCI output? Generating a runtime spec for systemd-nspawn --oci-bundle should also be possible, but that would be a bit more complex.
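For reference, a rough sketch of how the nspawn variant could look (the image path, tag, and exact invocation are assumptions, not tested):

```sh
# Hypothetical flow: unpack the OCI layout into a runtime bundle with umoci,
# then run it via systemd-nspawn's OCI support.
umoci unpack --image mkosi.output/image:latest bundle
sudo systemd-nspawn --oci-bundle=bundle
```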
I guess. If we are to add this, it should be covered by a test.
Thinking about the tests: unpacking the OCI bundle and creating the runtime spec is rather annoying, and umoci, which would be great for this, is not packaged in most places. So while using sd-nspawn --oci-bundle would be neat, unpacking the OCI image and creating the runtime spec on our own would add places where we could hide our own bugs (apart from being annoying).
As a result I think using podman to test this is likely the simplest way for now. However, installing podman in the tools tree also has its problems, as the tools tree is inaccessible from within the pytest tests (the proper config parsing etc. is only done within mkosi itself). So using the tools tree version would only be possible if we added podman support to the boot/shell verbs. I am not sure if this is desired, as it would also likely behave differently regarding things like RuntimeTree, network namespaces, etc.
Alternatively we could install podman outside of the tools tree.
Which variant do you prefer?
I have added a basic test using podman.
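The test is roughly of this shape (the image path and tag here are illustrative, not the exact test code):

```sh
# Pull the OCI layout produced by mkosi via the oci: transport; --quiet makes
# podman print only the resulting image ID. Then run a trivial command in it.
image_id=$(podman pull --quiet oci:mkosi.output/image:latest)
podman run --rm "$image_id" /usr/bin/true
```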
No clue why CentOS fails; I did not find the motivation to look through all the logs yet.
It's an endless loop caused by journald not starting properly (which is also why the 1h timeout kicked in). The repeating log section is attached. I found a few systemd issues which seem to reference what I would guess is the root cause (PR_SET_MM_ARG_START failed...). As this does not happen on the other distros, I assume it has since been fixed in upstream systemd, which makes me lean towards skipping the OCI testing for CentOS.
Log extract
Starting Journal Service...
systemd-tmpfiles-setup.service: starting held back, waiting for: systemd-journald.service
systemd-journald.service: Failed to add invocation ID to keyring, ignoring: Operation not permitted
Operating on architecture: x86
Operating on architecture: x32
Operating on architecture: x86-64
Operating on architecture: x86
Operating on architecture: x32
Operating on architecture: x86-64
Operating on architecture: x86
Operating on architecture: x32
Operating on architecture: x86-64
Restricting namespace to: n/a.
Operating on architecture: x86
Blocking cgroup.
Blocking ipc.
Blocking net.
Blocking mnt.
Blocking pid.
Blocking user.
Blocking uts.
Blocking time.
Operating on architecture: x32
Blocking cgroup.
Blocking ipc.
Blocking net.
Blocking mnt.
Blocking pid.
Blocking user.
Blocking uts.
Blocking time.
Operating on architecture: x86-64
Blocking cgroup.
Blocking ipc.
Blocking net.
Blocking mnt.
Blocking pid.
Blocking user.
Blocking uts.
Blocking time.
Operating on architecture: x86
Operating on architecture: x32
Operating on architecture: x86-64
Operating on architecture: x86-64
[** ] (2 of 2) A start job is running for…odes in /dev (6min 41s / no limit)
[* ] (2 of 2) A start job is running for…odes in /dev (6min 42s / no limit)
[** ] (2 of 2) A start job is running for…odes in /dev (6min 42s / no limit)
[*** ] (1 of 2) A start job is running for Journal Service (3s / no limit)
[ *** ] (1 of 2) A start job is running for Journal Service (4s / no limit)
[ *** ] (1 of 2) A start job is running for Journal Service (4s / no limit)
[ ***] (2 of 2) A start job is running for…odes in /dev (6min 44s / no limit)
[ **] (2 of 2) A start job is running for…odes in /dev (6min 45s / no limit)
[ *] (2 of 2) A start job is running for…odes in /dev (6min 45s / no limit)
[ **] (1 of 2) A start job is running for Journal Service (6s / no limit)
[ ***] (1 of 2) A start job is running for Journal Service (7s / no limit)
[ *** ] (1 of 2) A start job is running for Journal Service (7s / no limit)
[ *** ] (2 of 2) A start job is running for…odes in /dev (6min 47s / no limit)
[*** ] (2 of 2) A start job is running for…odes in /dev (6min 48s / no limit)
[** ] (2 of 2) A start job is running for…odes in /dev (6min 48s / no limit)
[* ] (1 of 2) A start job is running for Journal Service (9s / no limit)
[** ] (1 of 2) A start job is running for Journal Service (10s / no limit)
Received SIGCHLD from PID 64 (systemd-journal).
Child 64 (systemd-journal) died (code=exited, status=1/FAILURE)
systemd-journald.service: Child 64 belongs to systemd-journald.service.
systemd-journald.service: Main process exited, code=exited, status=1/FAILURE
systemd-journald.service: Failed with result 'exit-code'.
systemd-journald.service: Service will restart (restart setting)
systemd-journald.service: Changed start -> failed
systemd-journald.service: Job 293 systemd-journald.service/start finished, result=failed
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
systemd-journald.socket: Changed running -> listening
systemd-journald.service: Unit entered failed state.
systemd-journald.service: Consumed 15ms CPU time.
systemd-journald.service: Changed failed -> auto-restart
systemd-journald.service: Control group is empty.
systemd-journald.service: Service has no hold-off time (RestartSec=0), scheduling restart.
systemd-journald.service: Trying to enqueue job systemd-journald.service/restart/replace
systemd-journald.service: Installed new job systemd-journald.service/restart as 298
systemd-journald.service: Enqueued job systemd-journald.service/restart as 298
systemd-journald.service: Scheduled restart job, restart counter is at 42.
systemd-journald.socket: Incoming traffic
systemd-journald.socket: Changed listening -> running
sysinit.target: starting held back, waiting for: systemd-resolved.service
systemd-journal-flush.service: starting held back, waiting for: systemd-journald.service
systemd-journald.service: Changed auto-restart -> dead
systemd-journald.service: Job 298 systemd-journald.service/restart finished, result=done
[ OK ] Stopped Journal Service.
systemd-journald.service: Converting job systemd-journald.service/restart -> systemd-journald.service/start
systemd-journald.service: Consumed 15ms CPU time.
sysinit.target: starting held back, waiting for: systemd-resolved.service
systemd-journal-flush.service: starting held back, waiting for: systemd-journald.service
systemd-journald.service: Will spawn child (service_enter_start): /usr/lib/systemd/systemd-journald
systemd-journald.service: Failed to set 'trusted.invocation_id' xattr on control group /system.slice/systemd-journald.service, ignoring: Operation not permitted
systemd-journald.service: Failed to remove 'trusted.delegate' xattr flag on control group /system.slice/systemd-journald.service, ignoring: Operation not permitted
systemd-journald.service: Passing 3 fds to service
systemd-journald.service: About to execute /usr/lib/systemd/systemd-journald
systemd-journald.service: Forked /usr/lib/systemd/systemd-journald as 65
PR_SET_MM_ARG_START failed: Operation not permitted
systemd-journald.service: Changed dead -> start
We're trying to get umoci packaged in Fedora so then it should be OK to depend on it.
That would still not solve the problem for the other distros.
@septatrix Looking at https://pkgs.org/download/umoci, it seems that only centos/fedora are missing and the rest is covered already. Are there any particular distributions you have in mind?
Oh, in that case I guess not. I was looking at their README, but I guess that's quite outdated: https://github.com/opencontainers/umoci?tab=readme-ov-file#install
Is there anything else which has to be addressed? Should I try to rebase/squash the commits into more reasonable collections (or maybe even a single, large one)?
The failing test was flaky at the time of my last push. I'll rebase and push, which should resolve that.
We're trying to get umoci packaged in Fedora so then it should be OK to depend on it.
Link?
We're trying to get umoci packaged in Fedora so then it should be OK to depend on it.
Link?
No link, Michel Lind has been working on it.
I don't think we want to merge this until we figure out whether we can test this with systemd-nspawn instead of podman. I'd much rather get more coverage on systemd-nspawn's OCI support than having to figure out various podman failures when these tests inevitably start failing.
I don't think we want to merge this until we figure out whether we can test this with systemd-nspawn instead of podman.
Unless we want to support umoci -> systemd-nspawn for running OCI images under a (new) verb inside mkosi itself, we do not have to install it inside the tools tree. Currently podman is also only installed on the CI runner. As that is using Ubuntu, we could already try out umoci there.
I'd much rather get more coverage on systemd-nspawn's OCI support than having to figure out various podman failures when these tests inevitably start failing.
The problem is that, unlike podman with its systemd detection, umoci only generates a very basic runtime bundle and does not configure mounts/tmpdirs (e.g. cgroups, /run, etc.) or set environment variables (like $container_uuid) which systemd needs to function. And even all the stuff podman sets up for systemd does not seem to be enough (which is why I skipped some tests, and I assume that is also what is happening in the current Ubuntu tests, see https://github.com/systemd/mkosi/pull/2351#discussion_r1487044987).
We could also decide that we do not care about testing systemd inside the OCI image and instead only care about it being constructed correctly (or simply run /bin/true inside it). In that case something like skopeo inspect would be sufficient to check that we created a valid OCI image.
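For example, something along these lines would already catch a malformed layout (the path and tag are assumptions):

```sh
# skopeo parses the manifest and config without running anything, so a broken
# OCI directory layout fails here.
skopeo inspect oci:mkosi.output/image:latest
```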
I don't think we want to merge this until we figure out whether we can test this with systemd-nspawn instead of podman. I'd much rather get more coverage on systemd-nspawn's OCI support than having to figure out various podman failures when these tests inevitably start failing.
Another issue with using umoci is that it only generates a minimal OCI runtime bundle and has no means to specify e.g. mounts. That means we would have to set up all the stuff required by systemd ourselves, likely by modifying the JSON spec generated by umoci. This would introduce a new step where we could hide existing errors or add our own ones.
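To illustrate, such a patching step might look like this (not code from this PR; the env var and mounts shown are an assumed subset of what systemd needs):

```sh
# Splice an env var and mounts that systemd expects into the umoci-generated spec.
jq '.process.env += ["container=oci"]
    | .mounts += [
        {"destination": "/run", "type": "tmpfs", "source": "tmpfs",
         "options": ["nosuid", "nodev", "mode=755"]},
        {"destination": "/sys/fs/cgroup", "type": "cgroup", "source": "cgroup",
         "options": ["nosuid", "noexec", "nodev", "ro"]}
      ]' bundle/config.json > bundle/config.json.new \
  && mv bundle/config.json.new bundle/config.json
```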
Can we drop the stuff for running OCI images with podman? I really don't think we should add that. I'm OK with the OCI output format though.
Yes, I can drop that and also clean up the commit history, but I might only get around to it at the end of the month.
I rebased the branch and squashed some commits. I hope the splits are done in a way that makes review easier.
I also dropped the podman stuff. This means that only building is tested, not running, but until we find a better solution (e.g. umoci, sd-nspawn with a manually created OCI bundle, lxc import with an OCI template) this should be fine. We can still revisit this later (e.g. once umoci is packaged everywhere and can be included in the tools tree).
@septatrix This would be a separate PR, but given the "layers" stuff in OCI images, it'd be cool if we could integrate this properly with Overlay=yes and BaseTrees= somehow.
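A hypothetical sketch of what that could look like in mkosi.conf (the base tree path is made up, and whether Format=oci composes with these options this way is exactly the open question):

```ini
# Hypothetical layered setup: emit only the overlay on top of a prebuilt base tree.
[Output]
Format=oci

[Content]
BaseTrees=../base/mkosi.output/base
Overlay=yes
```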