vmtest icon indicating copy to clipboard operation
vmtest copied to clipboard

Use virtiofsd for sharing file system data with host

Open d-e-s-o opened this issue 1 year ago • 36 comments

Switch over to using virtiofsd for sharing file system data with the host. virtiofs is a file system designed for the needs of virtual machines and environments. That is in contrast to 9P fs, which we currently use for sharing data with the host, which is first and foremost a network file system. 9P is problematic if for no other reason that it lacks proper support for usage of the "open-unlink-fstat idiom", in which files are unlinked and later referenced via file descriptor (see #83). virtiofs does not have this problem.

This change replaces usage of 9P with that of virtiofs. In order to work, virtiofs needs a user space server. The current state-of-the-art implementation (virtiofsd) is implemented in Rust and so we interface directly with the library. Most of this code is extracted straight from virtiofsd, as it's a lot of boilerplate. An alternative approach is to install the binary via distribution packages or from crates.io, but availability (and discovery) can be a bit of a challenge. Note that this now means that both libcap-ng as well as libseccomp need to be installed.

I benchmarked both the current master as well as this version with a bare-bones custom kernel: Benchmark 1: target/release/vmtest -k bzImage-9p 'echo test' Time (mean ± σ): 1.316 s ± 0.087 s [User: 0.462 s, System: 1.104 s] Range (min … max): 1.232 s … 1.463 s 10 runs

Benchmark 1: target/release/vmtest -k bzImage-virtiofsd 'echo test' Time (mean ± σ): 1.244 s ± 0.011 s [User: 0.307 s, System: 0.358 s] Range (min … max): 1.227 s … 1.260 s 10 runs

So it seems there is a slight speed up, on average (and significantly less system time being used). This is great, but I suspect a more pronounced speed advantage will be visible when working with large files, in which virtiofs is said to significantly outperform 9P (typically >2x from what I understand, but I have not done any benchmarks of that nature).

A few other notes:

  • we solely rely on guest level read-only mounts to enforce read-only state. The virtiofsd recommended way is to use read-only bind mounts [0], but doing so would require root.
  • we are not using DAX, because it still is still incomplete and apparently requires building Qemu (?) from source. In any event, it should not change anything functionally and be solely a performance improvement.

I have adjusted the configs, but because I don't have Docker handy I can't really create those kernel. CI seems incapable of producing the artifacts without doing a fully-blown release dance. No idea what empty is about, really. I suspect the test failures we see are because it lacks support?

[0] https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md?ref_type=heads#faq

d-e-s-o avatar Aug 28 '24 23:08 d-e-s-o

Can't undraft right now, but this change can probably be looked at. Please advise re: empty kernels & test failures (which I suspect are related to them as well as the images containing incompat kernels).

danielocfb avatar Aug 29 '24 18:08 danielocfb

Found a few more things we could remove -> decreased patch size.

d-e-s-o avatar Aug 30 '24 13:08 d-e-s-o

Lemme mess around with it a bit and see if we can get away with just tweaking the CI

Perhaps try setting LIBSECCOMP_LIB_TYPE and LIBCAPNG_LINK_TYPEto static.

Edit: Oops, sorry, I missed that you found those already. Haha.

danielocfb avatar Sep 05 '24 20:09 danielocfb

So the mechanics seem to work, but unfortunately the final binary does not link. I got as far as: https://github.com/danobi/vmtest/tree/release_container

looks like libcap-ng uses functionality that requires dynamic linking:

  = note: /usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libcapng-516db75093700a65.rlib(cap-ng.o): in function `capng_change_id':
          (.text+0x1e07): warning: Using 'initgroups' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
          /usr/bin/ld: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libstd-1b60b176e45a0318.rlib(std-1b60b176e45a0318.std.d670f529ca0d395b-cgu.0.rcgu.o): in function `<std::sys_common::net::LookupHost as core::convert::TryFrom<(&str,u16)>>::try_from::{{closure}}':
          /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys_common/net.rs:214:(.text._ZN104_$LT$std..sys_common..net..LookupHost$u20$as$u20$core..convert..TryFrom$LT$$LP$$RF$str$C$u16$RP$$GT$$GT$8try_from28_$u7b$$u7b$closure$u7d$$u7d$17h3385bde15d7cfc2fE+0x59): warning: Using 'getaddrinfo' in statically linked applications requi
res at runtime the shared libraries from the glibc version used for linking
          /usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libcapng-516db75093700a65.rlib(cap-ng.o): in function `capng_change_id':
          (.text+0x1b9c): warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
          /usr/bin/ld: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libstd-1b60b176e45a0318.rlib(std-1b60b176e45a0318.std.d670f529ca0d395b-cgu.0.rcgu.o): in function `std::sys::pal::unix::os::home_dir::fallback':
          /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys/pal/unix/os.rs:743:(.text._ZN3std3env8home_dir17h6f6faf4d6103e0a7E+0xff): warning: Using 'getpwuid_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
          /usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libtoml-56721beccd082414.rlib(toml-56721beccd082414.toml.3e32ff7c492ee995-cgu.00.rcgu.o): undefined reference to symbol '__tls_get_addr@@GLIBC_2.3'
          /usr/bin/ld: /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2: error adding symbols: DSO missing from command line
          collect2: error: ld returned 1 exit status
          /usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libcapng-516db75093700a65.rlib(cap-ng.o): note: the message above does not take linker garbage collection into account
          /usr/bin/ld: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libstd-1b60b176e45a0318.rlib(std-1b60b176e45a0318.std.d670f529ca0d395b-cgu.0.rcgu.o): note: the message above does not take linker garbage collection into account
          /usr/bin/ld: /build/target/x86_64-unknown-linux-musl/release/deps/libcapng-516db75093700a65.rlib(cap-ng.o): note: the message above does not take linker garbage collection into account
          /usr/bin/ld: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libstd-1b60b176e45a0318.rlib(std-1b60b176e45a0318.std.d670f529ca0d395b-cgu.0.rcgu.o): note: the message above does not take linker garbage collection into account

which makes me very sad.

the above was with musl (which I hoped would work over crt-static) but same issue with crt-static

danobi avatar Sep 05 '24 21:09 danobi

the static release binaries are very useful and we're depending on it from kernel and bpftrace side.

so I guess our options are to:

  • add a feature to virtiofsd disable capng usage
  • go with shell-out approach

wdyt?

danobi avatar Sep 05 '24 21:09 danobi

undefined reference to symbol '__tls_get_addr@@GLIBC_2.3'

The Ubuntu managed static archives likely depend on glibc, because that's their standard libc. I don't think just swapping out the Rust toolchain will suffice here. We may need to use a proper Docker container with Alpine Linux inside or something along those lines. Then it should work.

danielocfb avatar Sep 05 '24 21:09 danielocfb

Let me take a stab at it.

danielocfb avatar Sep 05 '24 21:09 danielocfb

We may need to use a proper Docker container with Alpine Linux inside or something along those lines.

Yeah, so just to add to that: there seems to exist an Alpine package for capng. So if it really was requiring glibc specific stuff I think that would not be possible, because as per my understanding Alpine uses musl. I think the warnings you pasted may just be caused by glibc not liking to be linked statically (there are plenty of articles on that). So I think that conceptually it should be possible to make it work, but I am far from a Docker/cross compile crack. So it may take some time (I'll also be out the next two weeks and may not be able to finish by tomorrow). But it's possible I am missing something.

danielocfb avatar Sep 05 '24 22:09 danielocfb

Ah yeah I think you're right. Lemme give it a shot with alpine later. I can pick this PR up if you're out before it's finished

danobi avatar Sep 05 '24 22:09 danobi

I think I got it working with an Alpine container:

$ ldd /tmp/vmtest
>        statically linked
$ file /tmp/vmtest
/tmp/vmtest: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, BuildID[sha1]=30ce978ea11eb0c71ea763123fac5852e251f334, not stripped

danielocfb avatar Sep 05 '24 22:09 danielocfb

These are the steps that seem to do it for me:

docker run --network host --tty --interactive rust:alpine

In container:

apk add libcap-ng-static libseccomp-static git clang
git clone https://github.com/d-e-s-o/vmtest.git
cd vmtest
git checkout topic/virtiofs
CC=clang RUSTFLAGS="-C target-feature=+crt-static" LIBSECCOMP_LIB_TYPE=static LIBSECCOMP_LIB_PATH=/usr/lib/ LIBCAPNG_LINK_TYPE=static LIBCAPNG_LIB_PATH=/usr/lib/ cargo build --release --target=x86_64-unknown-linux-musl

danielocfb avatar Sep 05 '24 23:09 danielocfb

Ah yeah I think you're right. Lemme give it a shot with alpine later. I can pick this PR up if you're out before it's finished

Feel free to try it out with the recipe I posted. Also, assuming it works for you as well: if you want to land that that would be great. You are probably the most familiar with the infrastructure you have.

danielocfb avatar Sep 05 '24 23:09 danielocfb

Yeah, just the one thing to do is make sure cross compiling still works

danobi avatar Sep 06 '24 01:09 danobi

I think I got alpine to install foreign packages after some trial and error. but stuck again on some linker errors. tried a few things (73470a26318495db0d94417be0c830781503fb71, dc31faaddfaab03116faa0c6716b44cf157bfffb). Not much luck. Maybe we need the cross compiling expert @chantra to help

danobi avatar Sep 06 '24 05:09 danobi

Just a random thought re: https://github.com/danobi/vmtest/commit/73470a26318495db0d94417be0c830781503fb71

Have you tried setting CC to clang? When I played around with Alpine I had issues with crt stuff and they went away after setting that variable. Looked slightly different from what I recall, though.

d-e-s-o avatar Sep 06 '24 12:09 d-e-s-o

I think I tried last night but I tried again just now and it didn't help. Got this:

  = note: /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
          /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
          /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
          /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
          /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
          /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
          /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: Relocations in generic ELF (EM: 183)
          /usr/lib/gcc/x86_64-alpine-linux-musl/13.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/local/rustup/toolchains/1.81.0-x86_64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o: error adding symbols: file in wrong format
          collect2: error: ld returned 1 exit status

Looking closer at the error, it seems x86_64 toolchain is being used on aarch64 deps?

#define EM_AARCH64	183	/* ARM AARCH64 */

danobi avatar Sep 06 '24 17:09 danobi

Looking closer at the error, it seems x86_64 toolchain is being used on aarch64 deps?

Docker doesn't do instruction set virtualization, right? I think we may need Docker in a VM of the respective architecture to make it work. Basically, if you are running Docker in a aarch64 VM you'd get an aarch64 Alpine Linux image. Though, really, I am still hoping we don't have to do that...

d-e-s-o avatar Sep 06 '24 18:09 d-e-s-o

There are actions such as https://github.com/taiki-e/setup-cross-toolchain-action that could help with that. But there I think we may not have sufficient control over the image ourselves to make static linking work. Not sure.

d-e-s-o avatar Sep 06 '24 18:09 d-e-s-o

The issue with anything that's GH action based (opposed to something you can run locally) is these cross compiling adventures start taking weeks at a time to make changes. Cuz it's always a massive troubleshooting session and going thru a PR loop takes ages

danobi avatar Sep 06 '24 19:09 danobi

Yeah. I think I at least understand the issue you are hitting, though. With the replacement of the contents of /etc/arch (or whatever the file name), you are just telling apk that the architecture is <arch>. But it's not. We can't just change the architecture this way. It will just mean that from now on packages are installed for this other architecture, yes, but existing ones won't be replaced or anything like that. Now there may be a universe in which we can make this hack work, where we just keep enough of the base system around while replacing everything else for the target arch and then we can fully build vmtest, but that may only work if everything is statically linked and there are no overlapping dependencies.

In any event, I think the proper way would be to install a proper cross-compilation toolchain. And then we want to use this toolchain when building and can also install and link against the cross-compiled libcap-ng and whatnot and link against them.

I am not familiar with apk, so I don't know off hand how to go about that. But will play around some more. Perhaps they didn't invent new syntax & mechanism for everything and instead just copied something existing that I can piece together...

d-e-s-o avatar Sep 06 '24 19:09 d-e-s-o

I was reading last night that alpine doesn't quite support cross compile as well as ubuntu does. Which is how I found the /etc/apk/arch hack. In meantime I shall try emulation route

danobi avatar Sep 06 '24 19:09 danobi

Emulations works quite well for x86_64 and aarch64: 76795cce1802a6debcebe3e0eaada997d4c5bd33

But rustc doesn't support s390x-unknown-linux-musl at all so I think musl/alpine is out.

Seems like linking approach is out unless you have another idea

danobi avatar Sep 06 '24 21:09 danobi

Well actually, I see alpine has a package for rust on s390x: https://pkgs.alpinelinux.org/package/v3.20/main/s390x/cargo

Lemme see if that works...

danobi avatar Sep 06 '24 21:09 danobi

With 323bff0711a321df0 , I'm stuck on

1.609 error[E0463]: can't find crate for `core`
1.609   |
1.609   = note: the `x86_64-unknown-linux-musl` target may not be installed
1.609   = help: consider downloading the target with `rustup target add x86_64-unknown-linux-musl`
1.609
1.615 For more information about this error, try `rustc --explain E0463`.

Obviously x86_64 can be fixed by installing the target using rustup. But s390x they are not published. And seems like alpine does not provide it either.

I'm gonna stop here. If there's no workaround, then we gotta shell out

danobi avatar Sep 06 '24 22:09 danobi

Dunno about tier 3 targets, but I will say that s390x-unknown-linux-musl is listed as supported here: https://github.com/taiki-e/rust-cross-toolchain#linux-musl

There is a Dockerfile that perhaps contains some clues/could be useful.

If there's no workaround, then we gotta shell out

I don't really understand what that buys us, to be honest. We'd have a statically linked binary, but it now depends on a probably dynamically linked other binary blob to be present somewhere. Is the reasoning that this other blob would be distribution managed and so it's fine? It's certainly not self-contained, which to me is the main selling point in favor of linking statically.

From my perspective, the perhaps last straws to draw would be

  • checking if we can live without those two libraries from the virtiofsd side and making them optional
  • going back to glibc and Ubuntu cross-compile toolchains; I may be missing something, but in general one can link glibc stuff statically; reading again through the messages, we see warnings that may or may not throw us off (but they could be artifacts that get resolved by, dunno, link time optimization passes or whatever), but my interpretation of the error is that it's because of a musl (Rust) / glibc (C) mismatch. If we unify on glibc, I wonder if there would be dynamic dependencies added or not.

d-e-s-o avatar Sep 06 '24 23:09 d-e-s-o

I don't really understand what that buys us, to be honest. We'd have a statically linked binary, but it now depends on a probably dynamically linked other binary blob to be present somewhere. Is the reasoning that this other blob would be distribution managed and so it's fine? It's certainly not self-contained, which to me is the main selling point in favor of linking statically.

It's cuz we already require qemu and qemu-ga installed as binaries. And qemu (the package) requires virtiofsd to be installed as a binary too. This is true for all distros I looked at. So virtiofsd binary will probably always be available, except on older distro releases

danobi avatar Sep 07 '24 00:09 danobi

checking if we can live without those two libraries from the virtiofsd side and making them optional

Yeah, that's one option. Could do a nosecurity feature from their side or something.

I just dunno how much time and effort we wanna invest in getting the glibc thing to work. This stuff ends up being super fragile in the long run. bpftrace had this crazy complex static glibc setup that actually did work pretty well but was impossible to maintain over the years. I killed it all in favor of appimages last year. Could do the same here but felt like overkill given the escape hatches we have

danobi avatar Sep 07 '24 00:09 danobi

checking if we can live without those two libraries from the virtiofsd side and making them optional

Yeah, that's one option. Could do a nosecurity feature from their side or something.

Briefly checked and it seems that seccomp would be entirely unused even by what we have currently. It's really a dependency of their main program and not the library per-se (though it is publicly exposed, but the module could just be optional). capng seems a bit more involved to get rid of. I can look at it once back.

d-e-s-o avatar Sep 07 '24 00:09 d-e-s-o

Sweet, sounds good. Let's park it here for now, then

danobi avatar Sep 07 '24 03:09 danobi

Got it working without these libraries. Updated accordingly.

ldd vmtest/target/release/vmtest
        linux-vdso.so.1 (0x00007ffe6dfc0000)
        libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/13/libgcc_s.so.1 (0x00007f09a2d46000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00007f09a2c9e000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00007f09a2ae2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f09a31f4000)

Upstream pull requests:

  • https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/248
  • https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/249

d-e-s-o avatar Sep 25 '24 21:09 d-e-s-o