Lima driver for Apple Container virtual machines
Description
Currently Lima uses the Virtualization.framework (VZ) to create a VM from a regular cloud image.
But it is also possible to use the Containerization Swift library to create a VM from a container image.
This makes it more similar to the WSL2 driver for Windows, with kernel and init provided outside the image*.
The easiest way to integrate from a Go program is to use the container CLI, github.com/apple/container.
The main feature is that the MicroVM boots up in a second (with the right image), instead of in a minute.
The default kernel currently comes from the Kata Containers project; it could be made available to others too.
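For reference, integrating via the CLI boils down to a few commands. A sketch, reusing the image name from the examples further down; `container system start` is assumed to be the command that launches the background service the CLI talks to:

```sh
container system start                     # start the background service
container build -t debian-lima .           # build a container image from a Dockerfile
container run -d --name lima debian-lima   # boot the image as a lightweight VM
```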
Current issues
- Systemd is not yet supported (not compatible with vminitd), only OpenRC
- Some advanced features are missing from the default kernel, such as IPv6 or VXLAN
- An Apple ARM CPU is required; it does not work on Intel CPUs
- Networking between instances does not work on macOS 15
* for WSL2, the kernel is shared between all system containers; in AC, each one gets its own kernel.
Lima could use a custom image for AC, similar to the custom image that is currently used with WSL2.
- #3622
- https://github.com/apple/container/discussions/106
Currently the WSL2 example (finch) uses a Fedora image:
https://github.com/runfinch/finch-core/blob/main/rootfs/Dockerfile
For AC, the easiest is to use a Debian image (with OpenRC):
https://gist.github.com/afbjorklund/99ec0683c82f03e58c22b0d2753b9f50
Note: the @cloud-server-environment group install is 286 RPM packages!
The most important packages are openssh-server and cloud-init (if used).
The default cloud-init implementation also brings in a Python installation.
Later versions of Fedora also throw in a whole podman installation too...
We could also use Alpine, to make it more similar to the alpine-lima cloud image:
https://github.com/lima-vm/alpine-lima
Using standard Ubuntu (or Fedora) would have to wait for systemd support
- https://github.com/apple/container/issues/92
Currently Lima only supports tarball images; it could be extended with container images...
WSL2 requires a tar-formatted rootfs archive instead of a VM image.
We can create a tarball with docker export, and we can create an image with docker import.
- https://github.com/apple/container/issues/426
- https://learn.microsoft.com/en-us/windows/wsl/use-custom-distro
Normally we would not use a docker daemon, but some OCI library.
Supporting both registry and archive is doable as well, of course,
i.e. similar to docker pull and docker load, from matching URLs.
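As a sketch of the tarball round trip (image and file names are illustrative):

```sh
# rootfs tarball from a container image (cf. docker export)
docker export "$(docker create debian-lima)" -o basedisk.tar

# ...and back: create a container image from a rootfs tarball (cf. docker import)
docker import basedisk.tar debian-lima:imported
```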
> The main feature is that the MicroVM boots up in a second (with the right image), instead of in a minute.

I see a much smaller difference:
- running `true` using podman.lima: 0.1 seconds (VM already running)
- running `true` in podman (applehv): 0.3 seconds (VM already running)
- running `true` in apple container: 0.7 seconds
- starting a plain Lima VM with the vz driver: 13 seconds
In Lima we spend about 3 seconds converting the Ubuntu image to raw format on every start. We can eliminate this by converting to raw format during download.
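For example, the one-time conversion could be done with qemu-img at download time (file names are illustrative):

```sh
# Convert the downloaded qcow2 cloud image to raw once, at download time,
# instead of on every instance start
qemu-img convert -O raw ubuntu-24.04-server-cloudimg-arm64.img basedisk
```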
I think using Apple Container can be interesting, but we have to deal with the non-standard init system and other limitations, and current Lima performance as a docker/podman replacement is already better than Apple Container (or podman).
It can be more useful to work on better integration:
- making the Lima VM start automatically when needed and stop automatically when not used for a long time. Maybe use snapshots to restore the VM quickly to a running state instead of booting.
- using a minimized kernel that has only what we need to run containerd, for quicker boot
- using a minimized distro like Alpine, which sounds easier to maintain than Apple Container
Right, I think 15 seconds is more representative for the second boot, when we have installed everything etc.
I was just exaggerating a bit for the first cloud-init run, which is a bit on the slow side with packages and all.
The Apple VM might take a second to start, but it still takes 15 seconds before the Docker daemon is "healthy". So there are many different things at play here. But the ballpark is still: 10x faster to run, 100x slower to boot*.
* but you only boot once per session
The basic image works fine now; I only added the openssh-server and cloud-init packages:
```dockerfile
FROM debian

# openrc (init)
RUN apt-get update && apt-get install -y --no-install-recommends openrc && rm -rf /var/lib/apt/lists/*
RUN rm -f /etc/init.d/cgroups /etc/init.d/hwclock.sh
RUN echo 'rc_need="!sysfs !cgroups !net !mountkernfs !localmount"' >>/etc/rc.conf
ENTRYPOINT ["/sbin/openrc-init"]

# openssh server
RUN apt-get update && apt-get install -y openssh-server && rm -rf /var/lib/apt/lists/*

# cloud-init
RUN apt-get update && apt-get install -y cloud-init && rm -rf /var/lib/apt/lists/*
RUN echo "datasource_list: [ NoCloud ]" >/etc/cloud/cloud.cfg.d/90_dpkg.cfg
RUN mkdir -p /etc/cloud/cloud.cfg.d && cat >/etc/cloud/cloud.cfg.d/cidata.cfg <<EOF
datasource:
  NoCloud:
    seedfrom: file:///mnt/cidata/
EOF
```
Haven't looked into how the WSL image is doing the cloud-config*, but this is "working":

```sh
container run -d -v $PWD/cidata:/mnt/cidata debian-lima
```

(with user-data in the cidata directory)
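For reference, a minimal NoCloud seed directory could look something like this (the user name and key are placeholders; NoCloud also expects a meta-data file to be present, even if empty):

```sh
mkdir -p cidata
touch cidata/meta-data    # required by NoCloud, may be empty
cat > cidata/user-data <<'EOF'
#cloud-config
users:
  - name: lima
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh-authorized-keys:
      - ssh-ed25519 AAAA... user@example
EOF
container run -d --name lima -v $PWD/cidata:/mnt/cidata debian-lima
```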
Then I just used the ext driver and the guest-install command, to connect it to Lima...
We should probably have a lima-guestagent install-openrc to match the systemd one?
* EDIT: With WSL, there is a default 9p mount of /:/mnt, so it can just read a cidata dir.
The "cidata" directory is located in the instance directory, next to cloud-config.yaml:
```go
if args.VMType == limayaml.WSL2 {
	layout = append(layout, iso9660util.Entry{
		Path:   "ssh_authorized_keys",
		Reader: strings.NewReader(strings.Join(args.SSHPubKeys, "\n")),
	})
	return writeCIDataDir(filepath.Join(instDir, filenames.CIDataISODir), layout)
}
return iso9660util.Write(filepath.Join(instDir, filenames.CIDataISO), "cidata", layout)
```
There is nothing stopping those other initiatives, and it would be nice to have a custom distribution that used some of the same features (optimized kernel, custom image) for the same type of speed benefits (previously it was mostly about the size).
And I do think that we should have VZ as the default driver on macOS and WSL2 as the default driver on Windows, but it is still interesting to have an AC driver on macOS and a Hyper-V driver on Windows, if you want to run it "the other way around".
> I think using Apple Container can be interesting, but we have to deal with the non-standard init system and other limitations, and current Lima performance as a docker/podman replacement is already better than Apple Container (or podman).
Note that we would only use Apple Container to run the Fedora "container" VM; we would not use it to run the podman containers (those would run in podman, just as they do with Fedora installed from a cloud image in template://podman).
As mentioned above, the reason that I used Debian and (rootful) Docker was that it was a struggle to run the other distributions and rootless containers without systemd. So I borrowed some OpenRC scripts from Alpine*, for Debian.
* this is not ideal, because of the older version of OpenRC that is in Debian stable (e.g. log_proxy) and because Alpine makes some distro decisions that might differ from Debian. But it still worked "good enough" for a proof of concept.
Once Apple Container supports using /lib/systemd/systemd as the entry point (and managing the cgroups), we can go back to running a regular distribution again. It would not really change anything in the driver, only in the image used.
There are some other AC quirks to work out as well, but nothing major...
```
sudo: unable to resolve host 8b8cc9b0-4757-4e7f-aebf-53edf24ef62d: Name or service not known
```
=> set up the /etc/hosts file

```
WARN[0000] fixSystemTimeSkew: error: stat /dev/rtc: no such file or directory
```
=> `sudo ln -s rtc0 /dev/rtc`

```
INFO[0000] [hostagent] Waiting for the essential requirement 2 of 3: "sshfs binary to be installed"
INFO[0000] [hostagent] Waiting for the essential requirement 3 of 3: "fuse to \"allow_other\" as user"
```
=> same as in the boot scripts

```
INFO[0000] [hostagent] fuse: failed to open /dev/fuse: Permission denied
```
=> `sudo chmod 666 /dev/fuse`
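Taken together, those workarounds could live in a small provisioning snippet, something like this (the exact /etc/hosts entry is an assumption):

```sh
# Workarounds for current AC quirks, as described above
echo "127.0.1.1 $(hostname)" | sudo tee -a /etc/hosts  # make sudo able to resolve the host name
sudo ln -s rtc0 /dev/rtc                               # provide the /dev/rtc that Lima expects
sudo chmod 666 /dev/fuse                               # let the user open /dev/fuse
```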
The missing /etc/hosts should be fixed in the next release (next month), or so:
- https://github.com/apple/container/issues/314
- https://github.com/apple/container/issues/446
- https://github.com/apple/containerization/issues/251
This is still pre-release (beta) software, and you do run into intermittent faults:

```
Error: internalError: "failed to start container" (cause: "internalError: "failed to start process (cause: "internal error (13): create managed process: Error Domain=NSCocoaErrorDomain Code=259 "The file isn’t in the correct format."")"")
Error: invalidState: "ExitMonitor already setup for process 910968c2-ed98-48a2-8bca-0828783a83a1
```

So there is a lot of restarting the system service and recreating the container.
Note: it is possible to set up DNS, to avoid having to use IPs for the container VMs (similar to .local for mDNS):

```sh
sudo container system dns create container
container system dns default set container
```

Now any container with a name can be referenced under that domain; for instance, the "lima" IP is now at "lima.container".
For the above proof-of-concept I just used SSH DNS and reverse-sshfs, to connect host->guest and guest<-host.
```sh
container build -t debian-lima .
container run -d --name lima debian-lima
```
```yaml
vmType: ext
arch: aarch64
cpus: 4
memory: 1GiB
disk: 512GiB
mounts:
- location: "~"
- location: "/tmp/lima"
  writable: true
# The built-in containerd installer does not support OpenRC currently.
containerd:
  system: true
  user: false
ssh:
  address: lima.container
```
```console
$ container ls
ID    IMAGE               OS     ARCH   STATE    ADDR
lima  debian-lima:latest  linux  arm64  running  192.168.105.9
$ limactl ls apple
NAME    STATUS   SSH                VMTYPE  ARCH     CPUS  MEMORY  DISK    DIR
apple   Running  lima.container:22  ext     aarch64  4     1GiB    512GiB  ~/.lima/apple
```
The actual AC driver could set up an sshLocalPort (-p 60022:22) and virtiofs mounts (-v /tmp/lima:/tmp/lima).
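In other words, the driver might end up invoking something like this (the flag syntax is taken from the parenthetical above, and is an assumption about the eventual AC driver):

```sh
# Forward the ssh port and bind a virtiofs mount (auto-mounted by vminitd)
container run -d --name lima -p 60022:22 -v /tmp/lima:/tmp/lima debian-lima
```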
Currently the WSL driver is not using cloud-init, so AC would not have to do it either, but could use a similar script:
https://github.com/lima-vm/lima/blob/master/pkg/cidata/cidata.TEMPLATE.d/boot/02-wsl2-setup.sh
```sh
LIMA_CIDATA_USER={{ .User }}
LIMA_CIDATA_UID={{ .UID }}
LIMA_CIDATA_COMMENT={{ .Comment }}
LIMA_CIDATA_HOME={{ .Home }}
LIMA_CIDATA_SHELL={{ .Shell }}
```
The driver still uses the $LIMA_CIDATA_MNT directory for the files, but adds a separate ssh_authorized_keys file.
This is mostly to avoid having to parse YAML in the bash script; the file has the same lines/keys as the YAML below (a consumption sketch follows after it).
```yaml
users:
- name: "{{.User}}"
  uid: "{{.UID}}"
  {{- if .Comment }}
  gecos: {{ printf "%q" .Comment }}
  {{- end }}
  homedir: "{{.Home}}"
  shell: {{.Shell}}
  sudo: ALL=(ALL) NOPASSWD:ALL
  lock_passwd: true
  ssh-authorized-keys:
  {{- range $val := .SSHPubKeys }}
  - {{ printf "%q" $val }}
  {{- end }}
```
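A boot script could then consume the plain-text file without any YAML parsing, along these lines (a sketch, using the variable names from the environment file above):

```sh
# Install the public keys for the user, straight from the ssh_authorized_keys file
mkdir -p "${LIMA_CIDATA_HOME}/.ssh"
chmod 700 "${LIMA_CIDATA_HOME}/.ssh"
cat "${LIMA_CIDATA_MNT}/ssh_authorized_keys" >>"${LIMA_CIDATA_HOME}/.ssh/authorized_keys"
chown -R "${LIMA_CIDATA_UID}" "${LIMA_CIDATA_HOME}/.ssh"
```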
The rest of the config is probably not needed by either (WSL or AC), so we can probably skip that part anyway...
Mounts in WSL are handled separately, and the virtiofs mounts in AC are mounted automatically by vminitd.
```yaml
#cloud-config
# vim:syntax=yaml
growpart:
  mode: auto
  devices: ['/']
{{- if .TimeZone }}
timezone: {{.TimeZone}}
{{- end }}
```
Since they are using container images/tarballs rather than cloud images, there is no need to grow the root filesystem partition.
Currently just missing some minor features like the user shell or time zone, but those could be fixed if needed...
EDIT:
- #3805
Getting closer to an external Lima driver (using the new gRPC framework) that can talk to the container CLI.
Most of the code is the same as in the WSL2 driver, but there are still some bugs in AC that prevent it from working:
- https://github.com/apple/container/issues/468
- https://github.com/apple/container/issues/92
The AC driver can now import a rootfs tarball, by running a quick container build (any duplicates will be cached):
```dockerfile
FROM scratch
ADD basedisk /
```
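The import step itself could then be as simple as this (the tag name is illustrative):

```sh
# Turn the downloaded basedisk tarball into a bootable container image
container build -t lima-basedisk .
```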
Then we can download those images just as is done for the WSL2 driver, even though they are built from a Dockerfile, using the usual pipeline (sketched below):
https://github.com/lima-vm/lima/blob/master/templates/experimental/wsl2.yaml
- docker build
- docker create
- docker export
- docker rm
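A sketch of that pipeline, including the cleanup mentioned next (image and file names are illustrative):

```sh
docker build -t lima-rootfs .
id=$(docker create lima-rootfs)
docker export "$id" -o basedisk.tar
docker rm "$id"
# With Docker Engine, strip the runtime entries left in the export
gtar --delete -f basedisk.tar .dockerenv dev proc sys
```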
When using Docker Engine (not Podman Engine), one has to clean up some runtime files that are left in the export.
Eventually it could use some other tool, like crane, but providing rootfs images is left as an exercise for the reader.
Since Apple Container only works on macOS (and WSL2 only on Windows), I made a Docker Container driver for Linux.
This way I can test the container code also on the developer machine, and look for opportunities to share some code...
Even though it "works" (with Kata Containers), it is not expected that this Lima driver will actually be used for running.
But for testing provisioning scripts and such, it could actually be useful to run them as containers and not as VMs?
Note: Kata Containers does not work with Docker v28 (due to networking changes); it requires Docker v27 for now.
It is not possible to use reverse-sshfs with a (runc) container, but we can use something like -v /home:/mnt/home.
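With the Docker driver, that mount would just be passed on the command line, e.g. (flags are illustrative):

```sh
# A bind mount instead of reverse-sshfs, for the Docker Container driver
docker run -d --name lima -v /home:/mnt/home debian-lima
```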
- https://github.com/lima-vm/lima/discussions/3829
- https://github.com/lima-vm/lima/pull/3840
Here is the current image base; it can probably be cleaned up further (and adapted for Docker):
```dockerfile
FROM debian

# openrc (init)
RUN apt-get update && apt-get install -y --no-install-recommends openrc && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /etc/conf.d
RUN rm -f /etc/init.d/cgroups /etc/init.d/hwclock.sh
RUN echo 'rc_need="!sysfs !cgroups !net !mountkernfs !localmount"' >>/etc/rc.conf
ENTRYPOINT ["/sbin/openrc-init"]

# ssh and sudo
RUN apt-get update && apt-get install -y --no-install-recommends openssh-server && rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y --no-install-recommends sudo && rm -rf /var/lib/apt/lists/*

# lima
RUN apt-get update && apt-get install -y --no-install-recommends iptables && rm -rf /var/lib/apt/lists/*
RUN update-alternatives --set iptables /usr/sbin/iptables-legacy
RUN apt-get update && apt-get install -y --no-install-recommends sshfs && rm -rf /var/lib/apt/lists/*
RUN echo "user_allow_other" >>/etc/fuse.conf
RUN rm -f /usr/bin/systemctl /lib/systemd/systemd
```
Cleaning up after docker export: `gtar --delete .dockerenv --delete dev --delete proc --delete sys`
A simple way to create a tarball is using go-containerregistry: `crane export IMAGE|- TARBALL|- [flags]`
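For example (the image and file names are illustrative):

```sh
# Export a flattened rootfs tarball straight from a registry image
crane export debian:stable basedisk.tar
```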
- #3991