switch from 9p to virtiofs
QEMU 5 is in Fedora 33, so once we rebase coreos-assembler onto that we can port our usage of 9p to virtiofs, which will solve a bunch of problems:
- access to symlinks
- works with RHEL8 (I think, it's planned at least)
We should investigate reusing (ideally) or stealing (less ideal) code from https://github.com/kata-containers/kata-containers/blob/d22c7cf00b30a6791288dbf911627a78872f78ff/src/runtime/virtcontainers/virtiofsd.go
I experimented with this briefly last week (though it wasn't because of this issue). Here's the hack I was using:
diff --git a/src/cmdlib.sh b/src/cmdlib.sh
index 4ad8645ba..1a5eac2ad 100755
--- a/src/cmdlib.sh
+++ b/src/cmdlib.sh
@@ -634,12 +634,19 @@ EOF
kola_args=(kola qemuexec -m "${COSA_SUPERMIN_MEMORY:-${memory_default}}" --auto-cpus -U --workdir none \
--console-to-file "${runvm_console}")
+ sudo /usr/libexec/virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source="${workdir}" &
+ sleep 4
+ sudo chmod 777 /var/run/vm001-vhost-fs.sock
base_qemu_args=(-drive 'if=none,id=root,format=raw,snapshot=on,file='"${vmbuilddir}"'/root,index=1' \
+ -m 4G,maxmem=4G \
-device 'virtio-blk,drive=root'
-kernel "${vmbuilddir}/kernel" -initrd "${vmbuilddir}/initrd" \
-no-reboot -nodefaults \
-device virtio-serial \
- -virtfs 'local,id=workdir,path='"${workdir}"',security_model=none,mount_tag=workdir' \
+ -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
+ -device vhost-user-fs-pci,chardev=char0,tag=workdir \
+ -object memory-backend-memfd,id=mem,size=4G,share=on \
+ -numa node,memdev=mem \
-append "root=/dev/vda console=${DEFAULT_TERMINAL} selinux=1 enforcing=0 autorelabel=1" \
)
diff --git a/src/supermin-init-prelude.sh b/src/supermin-init-prelude.sh
index d8f7b3106..888404f99 100644
--- a/src/supermin-init-prelude.sh
+++ b/src/supermin-init-prelude.sh
@@ -15,7 +15,7 @@ mount -t tmpfs tmpfs /dev/shm
LANG=C /sbin/load_policy -i
# load kernel module for 9pnet_virtio for 9pfs mount
-/sbin/modprobe 9pnet_virtio
+/sbin/modprobe virtiofs
# need fuse module for rofiles-fuse/bwrap during post scripts run
/sbin/modprobe fuse
@@ -33,10 +33,8 @@ fi
umask 002
# set up workdir
-# For 9p mounts set msize to 100MiB
-# https://github.com/coreos/coreos-assembler/issues/2171
mkdir -p "${workdir:?}"
-mount -t 9p -o rw,trans=virtio,version=9p2000.L,msize=10485760 workdir "${workdir}"
+mount -t virtiofs workdir "${workdir}"
# These two invocations pair with virtfs setups for qemu in cmdlib.sh. Keep them in sync.
if [ -L "${workdir}"/src/config ]; then
mkdir -p "$(readlink "${workdir}"/src/config)"
This was mostly lifted from the man page.
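One thing in the hack above that could be tightened is the fixed `sleep 4` before the `chmod`: it only guesses when virtiofsd has created its socket. A minimal sketch of polling for the socket instead, assuming a hypothetical helper named `wait_for_socket` (it is not part of cmdlib.sh, and the 10-second default timeout is arbitrary):

```shell
# Hypothetical helper, not part of cmdlib.sh: poll until a Unix socket
# exists at path $1, giving up after $2 seconds (default 10, an assumption).
wait_for_socket() {
    local path="$1"
    local timeout="${2:-10}"
    local waited=0
    while [ ! -S "${path}" ]; do
        if [ "${waited}" -ge "${timeout}" ]; then
            echo "timed out waiting for socket ${path}" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Intended use in place of `sleep 4` in the hack above:
#   wait_for_socket /var/run/vm001-vhost-fs.sock || exit 1
```

This only checks that the socket file exists, not that virtiofsd is ready to serve requests, so it narrows the race rather than removing it.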
I think the biggest problem I see right now (at least from reading the man page and experience) is that it requires root (sudo). Not needing root is the only benefit of staying on 9p that I can see.
I didn't take the experiment further to see if I could get rid of the cache.qcow2 altogether, but I suspect we can.
> I think the biggest problem I see right now (at least from reading the man page and experience) is that it requires root (sudo).
That may be true for virtiofsd today, but I don't think it's true for virtiofs in general.
> That may be true for virtiofsd today, but I don't think it's true for virtiofs in general.
Is there any other implementation that doesn't have that limitation, or maybe an RFE for virtiofsd to support running as non-root?
https://lore.kernel.org/all/[email protected]/T/
Wow. Thanks for starting that discussion and linking to it here. It sounds like there is a short-term path and a longer-term path (a bit more work, but more complete).
I'm excited for when it lands.
cc https://lore.kernel.org/qemu-devel/[email protected]/T/#u
After asking around I was advised that my initial email should have gone to the virtiofs list. We had a big round of discussion and the result seems to be a commitment from that team to add openat2 support that should work for us.
https://lore.kernel.org/qemu-devel/[email protected]/T/#mf51ede15eb476e37f2aa5352df727fdcd5f702c6
Nice. Thanks for carrying this discussion forward! I'm guessing the existing unprivileged Rust virtiofsd that they mention in the mailing list thread doesn't work for us (yet) because of user namespaces in our build env?
This is in progress over in https://github.com/coreos/coreos-assembler/pull/3428; it is just blocked on an upstream PR making it into a release.
Looks like a new release of virtiofsd is in testing: https://bodhi.fedoraproject.org/updates/FEDORA-2023-5167ce8181