Use virtiofsd for sharing file system data with host
Switch over to using virtiofsd for sharing file system data with the host. virtiofs is a file system designed for the needs of virtual machines and environments. That is in contrast to 9P fs (which we currently use for sharing data with the host), which is first and foremost a network file system. 9P is problematic, if for no other reason than that it lacks proper support for the "open-unlink-fstat idiom", in which files are unlinked and later referenced via file descriptor (see #83). virtiofs does not have this problem.
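To illustrate, here is a minimal Rust sketch of that idiom (the path is made up); the fstat after the unlink is where 9P falls over, while virtiofs behaves like a local file system:

```rust
use std::fs::{self, File};
use std::io::Write as _;

fn main() -> std::io::Result<()> {
    let path = "/tmp/vmtest-scratch";
    let mut file = File::create(path)?;
    // Unlink the file while keeping the descriptor open...
    fs::remove_file(path)?;
    // ...then keep using it through the descriptor.
    file.write_all(b"still writable")?;
    // File::metadata() maps to fstat(2) on the now-unlinked file.
    assert_eq!(file.metadata()?.len(), 14);
    Ok(())
}
```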
This change replaces usage of 9P with that of virtiofs. In order to work, virtiofs needs a user space server. The current state-of-the-art implementation (virtiofsd) is written in Rust, and so we interface directly with the library. Most of this code is extracted straight from virtiofsd, as it's a lot of boilerplate. An alternative approach is to install the binary via distribution packages or from crates.io, but availability (and discovery) can be a bit of a challenge. Note that this now means that both libcap-ng and libseccomp need to be installed.
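For context, the wiring on the qemu side looks roughly like the following sketch. The flags are the documented qemu ones for vhost-user-fs, but the helper and its concrete values (socket path, memory size) are made up for illustration and not necessarily what this change does verbatim:

```rust
/// Hypothetical helper assembling qemu's side of a virtiofs share:
/// virtiofsd serves the vhost-user protocol on `socket`, and qemu
/// attaches to that socket and exposes the share to the guest as `tag`.
fn qemu_virtiofs_args(socket: &str, tag: &str) -> Vec<String> {
    vec![
        "-chardev".into(),
        format!("socket,id=char0,path={socket}"),
        "-device".into(),
        format!("vhost-user-fs-pci,chardev=char0,tag={tag}"),
        // vhost-user requires guest RAM to be backed by shared memory.
        "-object".into(),
        "memory-backend-memfd,id=mem,size=512M,share=on".into(),
        "-numa".into(),
        "node,memdev=mem".into(),
    ]
}
```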
I benchmarked both the current master and this version with a bare-bones custom kernel:

Benchmark 1: target/release/vmtest -k bzImage-9p 'echo test'
  Time (mean ± σ):     1.316 s ±  0.087 s    [User: 0.462 s, System: 1.104 s]
  Range (min … max):   1.232 s …  1.463 s    10 runs

Benchmark 1: target/release/vmtest -k bzImage-virtiofsd 'echo test'
  Time (mean ± σ):     1.244 s ±  0.011 s    [User: 0.307 s, System: 0.358 s]
  Range (min … max):   1.227 s …  1.260 s    10 runs
So it seems there is a slight speed-up, on average (and significantly less system time being used). This is great, but I suspect a more pronounced speed advantage will be visible when working with large files, where virtiofs is said to significantly outperform 9P (typically >2x from what I understand, but I have not done any benchmarks of that nature).
A few other notes:
- we rely solely on guest-level read-only mounts to enforce read-only state (see the sketch after this list). The virtiofsd-recommended way is to use read-only bind mounts [0], but doing so would require root.
- we are not using DAX, because it is still incomplete and apparently requires building Qemu (?) from source. In any event, it should not change anything functionally and would solely be a performance improvement.
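To make the first point concrete, the guest-side mount amounts to roughly the following (a sketch; the helper, tag, and mount point are made up, and this is not necessarily how the change invokes it):

```rust
use std::process::Command;

/// Mount a virtiofs share read-only inside the guest. The `ro` mount
/// option is what enforces the read-only state, since we forgo the
/// host-side read-only bind mount (which would require root).
fn mount_virtiofs_readonly(tag: &str, mountpoint: &str) -> std::io::Result<()> {
    let status = Command::new("mount")
        .args(["-t", "virtiofs", "-o", "ro", tag, mountpoint])
        .status()?;
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("mount failed: {status}"),
        ));
    }
    Ok(())
}
```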
I have adjusted the configs, but because I don't have Docker handy I can't really create those kernels. CI seems incapable of producing the artifacts without doing a full-blown release dance. No idea what the empty kernels are about, really. I suspect the test failures we see are because they lack support?
[0] https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md?ref_type=heads#faq
Can't undraft right now, but this change can probably be looked at. Please advise re: the empty kernels & test failures (which I suspect are related to them, as well as the images containing incompatible kernels).
Found a few more things we could remove -> decreased patch size.
Lemme mess around with it a bit and see if we can get away with just tweaking the CI
Perhaps try setting LIBSECCOMP_LIB_TYPE and LIBCAPNG_LINK_TYPE to static.
Edit: Oops, sorry, I missed that you found those already. Haha.
So the mechanics seem to work, but unfortunately the final binary does not link. I got as far as: https://github.com/danobi/vmtest/tree/release_container
looks like libcap-ng uses functionality that requires dynamic linking:
= note: /usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libcapng-516db75093700a65.rlib(cap-ng.o): in function `capng_change_id':
(.text+0x1e07): warning: Using 'initgroups' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libstd-1b60b176e45a0318.rlib(std-1b60b176e45a0318.std.d670f529ca0d395b-cgu.0.rcgu.o): in function `<std::sys_common::net::LookupHost as core::convert::TryFrom<(&str,u16)>>::try_from::{{closure}}':
/rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys_common/net.rs:214:(.text._ZN104_$LT$std..sys_common..net..LookupHost$u20$as$u20$core..convert..TryFrom$LT$$LP$$RF$str$C$u16$RP$$GT$$GT$8try_from28_$u7b$$u7b$closure$u7d$$u7d$17h3385bde15d7cfc2fE+0x59): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libcapng-516db75093700a65.rlib(cap-ng.o): in function `capng_change_id':
(.text+0x1b9c): warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libstd-1b60b176e45a0318.rlib(std-1b60b176e45a0318.std.d670f529ca0d395b-cgu.0.rcgu.o): in function `std::sys::pal::unix::os::home_dir::fallback':
/rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys/pal/unix/os.rs:743:(.text._ZN3std3env8home_dir17h6f6faf4d6103e0a7E+0xff): warning: Using 'getpwuid_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libtoml-56721beccd082414.rlib(toml-56721beccd082414.toml.3e32ff7c492ee995-cgu.00.rcgu.o): undefined reference to symbol '__tls_get_addr@@GLIBC_2.3'
/usr/bin/ld: /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
/usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libcapng-516db75093700a65.rlib(cap-ng.o): note: the message above does not take linker garbage collection into account
/usr/bin/ld: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libstd-1b60b176e45a0318.rlib(std-1b60b176e45a0318.std.d670f529ca0d395b-cgu.0.rcgu.o): note: the message above does not take linker garbage collection into account
/usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libcapng-516db75093700a65.rlib(cap-ng.o): note: the message above does not take linker garbage collection into account
/usr/bin/ld: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libstd-1b60b176e45a0318.rlib(std-1b60b176e45a0318.std.d670f529ca0d395b-cgu.0.rcgu.o): note: the message above does not take linker garbage collection into account
which makes me very sad.
The above was with musl (which I had hoped would work where crt-static didn't), but it's the same issue with crt-static.
The static release binaries are very useful, and we depend on them from the kernel and bpftrace side.
so I guess our options are to:
- add a feature to virtiofsd to disable capng usage
- go with the shell-out approach
wdyt?
> undefined reference to symbol '__tls_get_addr@@GLIBC_2.3'
The Ubuntu-managed static archives likely depend on glibc, because that's their standard libc. I don't think just swapping out the Rust toolchain will suffice here. We may need to use a proper Docker container with Alpine Linux inside or something along those lines. Then it should work.
Let me take a stab at it.
> We may need to use a proper Docker container with Alpine Linux inside or something along those lines.
Yeah, so just to add to that: there seems to be an Alpine package for capng. If it really required glibc-specific stuff, I don't think that would be possible, because as per my understanding Alpine uses musl. I think the warnings you pasted may just be caused by glibc not liking to be linked statically (there are plenty of articles on that). So conceptually it should be possible to make this work, but I am far from a Docker/cross-compile expert, so it may take some time (I'll also be out the next two weeks and may not be able to finish by tomorrow). But it's possible I am missing something.
Ah yeah I think you're right. Lemme give it a shot with alpine later. I can pick this PR up if you're out before it's finished
I think I got it working with an Alpine container:
$ ldd /tmp/vmtest
> statically linked
$ file /tmp/vmtest
/tmp/vmtest: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, BuildID[sha1]=30ce978ea11eb0c71ea763123fac5852e251f334, not stripped
These are the steps that seem to do it for me:
docker run --network host --tty --interactive rust:alpine
In container:
apk add libcap-ng-static libseccomp-static git clang
git clone https://github.com/d-e-s-o/vmtest.git
cd vmtest
git checkout topic/virtiofs
CC=clang RUSTFLAGS="-C target-feature=+crt-static" LIBSECCOMP_LIB_TYPE=static LIBSECCOMP_LIB_PATH=/usr/lib/ LIBCAPNG_LINK_TYPE=static LIBCAPNG_LIB_PATH=/usr/lib/ cargo build --release --target=x86_64-unknown-linux-musl
> Ah yeah I think you're right. Lemme give it a shot with alpine later. I can pick this PR up if you're out before it's finished
Feel free to try it out with the recipe I posted. Also, assuming it works for you as well: if you want to land it, that would be great. You are probably the most familiar with the infrastructure you have.
Yeah, just one thing to do: make sure cross-compiling still works.
I think I got alpine to install foreign packages after some trial and error, but I'm stuck again on some linker errors. I tried a few things (73470a26318495db0d94417be0c830781503fb71, dc31faaddfaab03116faa0c6716b44cf157bfffb). Not much luck. Maybe we need the cross-compiling expert @chantra to help.
Just a random thought re: https://github.com/danobi/vmtest/commit/73470a26318495db0d94417be0c830781503fb71
Have you tried setting CC to clang? When I played around with Alpine I had issues with crt stuff and they went away after setting that variable. Looked slightly different from what I recall, though.
I think I tried last night but I tried again just now and it didn't help. Got this:
= note: /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
/usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
/usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
/usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
/usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
/usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
/usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
/usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status
Looking closer at the error, it seems the x86_64 toolchain is being used on aarch64 deps?
#define EM_AARCH64 183 /* ARM AARCH64 */
> Looking closer at the error, it seems the x86_64 toolchain is being used on aarch64 deps?
Docker doesn't do instruction set virtualization, right? I think we may need Docker in a VM of the respective architecture to make it work. Basically, if you are running Docker in an aarch64 VM you'd get an aarch64 Alpine Linux image. Though, really, I am still hoping we don't have to do that...
There are actions such as https://github.com/taiki-e/setup-cross-toolchain-action that could help with that. But there I think we may not have sufficient control over the image ourselves to make static linking work. Not sure.
The issue with anything that's GH-action-based (as opposed to something you can run locally) is that these cross-compiling adventures start taking weeks at a time to make changes. Cuz it's always a massive troubleshooting session and going thru a PR loop takes ages.
Yeah. I think I at least understand the issue you are hitting, though. By replacing the contents of /etc/arch (or whatever the file is called), you are just telling apk that the architecture is <arch>. But it's not. We can't just change the architecture this way. It will just mean that from now on packages are installed for this other architecture, yes, but existing ones won't be replaced or anything like that. Now there may be a universe in which we can make this hack work, where we keep just enough of the base system around while replacing everything else for the target arch and can then fully build vmtest, but that may only work if everything is statically linked and there are no overlapping dependencies.
In any event, I think the proper way would be to install a proper cross-compilation toolchain. Then we'd use this toolchain when building, and could also install the cross-compiled libcap-ng and whatnot and link against them.
I am not familiar with apk, so I don't know off hand how to go about that. But will play around some more. Perhaps they didn't invent new syntax & mechanism for everything and instead just copied something existing that I can piece together...
I was reading last night that alpine doesn't quite support cross-compiling as well as ubuntu does, which is how I found the /etc/apk/arch hack. In the meantime I shall try the emulation route.
Emulation works quite well for x86_64 and aarch64: 76795cce1802a6debcebe3e0eaada997d4c5bd33
But rustc doesn't support s390x-unknown-linux-musl at all so I think musl/alpine is out.
Seems like the linking approach is out unless you have another idea.
Well actually, I see alpine has a package for rust on s390x: https://pkgs.alpinelinux.org/package/v3.20/main/s390x/cargo
Lemme see if that works...
With 323bff0711a321df0, I'm stuck on:
1.609 error[E0463]: can't find crate for `core`
1.609 |
1.609 = note: the `x86_64-unknown-linux-musl` target may not be installed
1.609 = help: consider downloading the target with `rustup target add x86_64-unknown-linux-musl`
1.609
1.615 For more information about this error, try `rustc --explain E0463`.
Obviously, x86_64 can be fixed by installing the target using rustup. But for s390x they are not published. And it seems like alpine does not provide it either.
I'm gonna stop here. If there's no workaround, then we gotta shell out
Dunno about tier 3 targets, but I will say that s390x-unknown-linux-musl is listed as supported here: https://github.com/taiki-e/rust-cross-toolchain#linux-musl
There is a Dockerfile that perhaps contains some clues/could be useful.
> If there's no workaround, then we gotta shell out
I don't really understand what that buys us, to be honest. We'd have a statically linked binary, but it now depends on a probably dynamically linked other binary blob to be present somewhere. Is the reasoning that this other blob would be distribution managed and so it's fine? It's certainly not self-contained, which to me is the main selling point in favor of linking statically.
From my perspective, the perhaps last straws to draw would be:
- checking if we can live without those two libraries from the virtiofsd side and making them optional
- going back to glibc and Ubuntu cross-compile toolchains; I may be missing something, but in general one can link glibc stuff statically. Reading again through the messages, we see warnings that may or may not throw us off (but they could be artifacts that get resolved by, dunno, link-time optimization passes or whatever). My interpretation of the error, though, is that it's caused by a musl (Rust) / glibc (C) mismatch. If we unify on glibc, I wonder whether dynamic dependencies would get added or not.
> I don't really understand what that buys us, to be honest. We'd have a statically linked binary, but it now depends on a probably dynamically linked other binary blob to be present somewhere. Is the reasoning that this other blob would be distribution managed and so it's fine? It's certainly not self-contained, which to me is the main selling point in favor of linking statically.
It's cuz we already require qemu and qemu-ga to be installed as binaries. And qemu (the package) requires virtiofsd to be installed as a binary too. This is true for all distros I looked at. So the virtiofsd binary will probably always be available, except on older distro releases.
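For reference, the shell-out approach would look something like this (a sketch; --socket-path and --shared-dir are documented virtiofsd flags, but the binary path varies by distro and is an assumption here):

```rust
use std::process::{Child, Command};

/// Spawn a distro-provided virtiofsd binary instead of linking the
/// library. The path is distro-dependent (Fedora ships it under
/// /usr/libexec/virtiofsd, for example); in practice we'd have to
/// discover it.
fn spawn_virtiofsd(socket: &str, shared_dir: &str) -> std::io::Result<Child> {
    Command::new("/usr/libexec/virtiofsd")
        .args(["--socket-path", socket, "--shared-dir", shared_dir])
        .spawn()
}
```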
> checking if we can live without those two libraries from the virtiofsd side and making them optional
Yeah, that's one option. Could do a nosecurity feature from their side or something.
I just dunno how much time and effort we wanna invest in getting the glibc thing to work. This stuff ends up being super fragile in the long run. bpftrace had this crazy complex static glibc setup that actually did work pretty well but was impossible to maintain over the years. I killed it all in favor of appimages last year. Could do the same here but felt like overkill given the escape hatches we have
> checking if we can live without those two libraries from the virtiofsd side and making them optional
> Yeah, that's one option. Could do a nosecurity feature from their side or something.
Briefly checked, and it seems that seccomp would be entirely unused even by what we have currently. It's really a dependency of their main program and not the library per se (though it is publicly exposed, the module could just be made optional). capng seems a bit more involved to get rid of. I can look at it once back.
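For illustration, making that module optional could look roughly like this on the virtiofsd side (entirely hypothetical; the feature name and function are made up):

```rust
// Gate the sandboxing setup behind a cargo feature so that library
// consumers can build without libseccomp.
#[cfg(feature = "seccomp")]
fn install_seccomp_filter() {
    // ...call into libseccomp here...
}

#[cfg(not(feature = "seccomp"))]
fn install_seccomp_filter() {
    // No-op when the `seccomp` feature is disabled.
}
```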
Sweet, sounds good. Let's park it here for now, then
Got it working without these libraries. Updated accordingly.
ldd vmtest/target/release/vmtest
linux-vdso.so.1 (0x00007ffe6dfc0000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/13/libgcc_s.so.1 (0x00007f09a2d46000)
libm.so.6 => /usr/lib64/libm.so.6 (0x00007f09a2c9e000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00007f09a2ae2000)
/lib64/ld-linux-x86-64.so.2 (0x00007f09a31f4000)
Upstream pull requests:
- https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/248
- https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/249