
How to get control over filesystem access with `wasmtime_wasi::WasiCtxBuilder`

Open stevenj opened this issue 1 year ago • 10 comments

Our implementation requires filesystems to be fully controlled, and files come from a VFS, not from any mounted filesystem. That means we need to control the low-level aspects of filesystem access for WasiP2.

I don't see how that can be achieved, because the only option I see is giving Wasi access to a pre-opened directory, which is 100% off the table for our application. How can we use the WasiCtxBuilder and still abstract the necessary filesystem interfaces to the runtime?

stevenj avatar Jul 16 '24 13:07 stevenj

For a fully virtual filesystem you won't be using WasiCtxBuilder. You'll want to instead implement the Host traits directly generated in the wasmtime-wasi crate such as this. You'd then call add_to_linker or similar.

alexcrichton avatar Jul 16 '24 14:07 alexcrichton

It looks like there was once a preopened_virt() function that maybe did this? It seems to have been removed though. :-(

It's a bit of a shame you can't use WasiCtxBuilder - I don't want to have to reimplement all the other APIs it provides too. Perhaps WasiCtx should be broken up into components (stdio, networking, filesystem etc?). Kind of feels like the trait system could solve this.

Timmmm avatar Mar 21 '25 10:03 Timmmm

We used to have a more dynamic trait system to make this pluggable when wasmtime-wasi was backed by wasi-common, and when we rewrote wasmtime-wasi to detach it from the legacy wasi-common implementation, we got rid of it because it created significant maintenance burden and complexity for very little benefit over using the linker to do the same.

You can still use the linker to define the filesystem interfaces, and use the rest of wasmtime-wasi's linkers to define the rest of wasi based on WasiCtx. They will interoperate fine as long as the wasmtime::component::bindgen call that generates your filesystem traits has the same with settings for the wasi:io package: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasi/src/bindings.rs#L412-L418

pchickey avatar Mar 21 '25 18:03 pchickey

Ah ok thanks. Sounds a little bit beyond me tbh but I'll have a go. So I won't get multiple definition linker errors or something like that?

Timmmm avatar Mar 21 '25 20:03 Timmmm

Symbols will be duplicated if you call the wasmtime_wasi::add_to_linker_* functions directly, because those put all symbols into the linker. If you drop down to using the individual wasmtime_wasi::bindings::<package>::<interface>::add_to_linker_get_host (see https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasi/src/lib.rs#L356-L385 for how these are collected into the whole set for add_to_linker_async), you can add symbols to the linker on a per-interface basis.
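To make the duplicate-symbol concern concrete, here is a toy model in plain Rust with no wasmtime dependency. `ToyLinker`, `add_everything`, and `add_all_except` are hypothetical names invented for this sketch. They only mimic the idea of registering host implementations one interface at a time and are not wasmtime's API.

```rust
use std::collections::HashMap;

/// Toy stand-in for a component linker (NOT wasmtime's `Linker`):
/// maps an interface name to the name of whoever provides it.
struct ToyLinker {
    defs: HashMap<String, &'static str>,
}

impl ToyLinker {
    fn new() -> Self {
        Self { defs: HashMap::new() }
    }

    /// Define one interface. Defining the same name twice is an error,
    /// which models the "duplicate symbols" problem discussed above.
    fn define(&mut self, interface: &str, provider: &'static str) -> Result<(), String> {
        if self.defs.contains_key(interface) {
            return Err(format!("interface `{interface}` defined twice"));
        }
        self.defs.insert(interface.to_string(), provider);
        Ok(())
    }

    /// Look up who provides an interface, if anyone.
    fn provider(&self, interface: &str) -> Option<&'static str> {
        self.defs.get(interface).copied()
    }
}

/// A few interface names that the "upstream" implementation provides.
const UPSTREAM: &[&str] = &[
    "wasi:clocks/wall-clock",
    "wasi:filesystem/types",
    "wasi:filesystem/preopens",
    "wasi:random/random",
];

/// Models an all-in-one helper: registers every upstream interface.
fn add_everything(linker: &mut ToyLinker) -> Result<(), String> {
    for name in UPSTREAM {
        linker.define(name, "upstream")?;
    }
    Ok(())
}

/// Models per-interface registration: registers every upstream interface
/// except the ones the embedder wants to provide itself.
fn add_all_except(linker: &mut ToyLinker, skip: &[&str]) -> Result<(), String> {
    for name in UPSTREAM.iter().filter(|name| !skip.contains(*name)) {
        linker.define(name, "upstream")?;
    }
    Ok(())
}
```

In this model, calling `add_everything` and then defining `wasi:filesystem/preopens` again fails, while `add_all_except(&mut linker, &["wasi:filesystem/preopens"])` followed by a custom `define` succeeds.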

pchickey avatar Mar 24 '25 16:03 pchickey

Hmm I had a go at this but it gets kind of complicated because of this code, note this comment:

            // Configure all other resources to be concrete types defined in
            // this crate

So I got as far as this:


pub fn add_modified_wasi_apis_to_linker_async<T: WasiView>(linker: &mut Linker<T>) -> anyhow::Result<()> {

    let options = &wasmtime_wasi::p2::bindings::LinkOptions::default();

    let l = linker;
    wasmtime_wasi_io::add_to_linker_async(l)?;

    let closure = type_annotate::<T, _>(|t| WasiImpl(IoImpl(t)));

    wasmtime_wasi::p2::bindings::clocks::wall_clock::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::clocks::monotonic_clock::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::filesystem::types::add_to_linker_get_host(l, closure)?;
    // We use our own instead.
    // wasmtime_wasi::p2::bindings::filesystem::preopens::add_to_linker_get_host(l, closure)?;
    crate::vfs::add_to_linker_get_host(l, closure)?;

    wasmtime_wasi::p2::bindings::random::random::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::random::insecure::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::random::insecure_seed::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::exit::add_to_linker_get_host(l, &options.into(), closure)?;
    wasmtime_wasi::p2::bindings::cli::environment::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::stdin::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::stdout::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::stderr::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::terminal_input::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::terminal_output::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::terminal_stdin::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::terminal_stdout::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::cli::terminal_stderr::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::sockets::tcp::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::sockets::tcp_create_socket::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::sockets::udp::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::sockets::udp_create_socket::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::sockets::instance_network::add_to_linker_get_host(l, closure)?;
    wasmtime_wasi::p2::bindings::sockets::network::add_to_linker_get_host(l, &options.into(), closure)?;
    wasmtime_wasi::p2::bindings::sockets::ip_name_lookup::add_to_linker_get_host(l, closure)?;

    Ok(())
}

which is fine, and then

use wasmtime::component::{Linker, Resource};
use anyhow::Result;
use wasmtime_wasi::{p2::{bindings::filesystem::preopens, WasiCtx}, ResourceTable};

struct MyState {
    table: ResourceTable,
    ctx: WasiCtx,
}

impl preopens::Host for MyState {
    #[doc = " Return the set of preopened directories, and their paths."]
    fn get_directories(&mut self) -> wasmtime::Result<Vec<(Resource<preopens::Descriptor>, String)>> {
        let something = preopens::Descriptor::Dir(todo!());
        Ok(vec![
            (Resource::new_own(todo!()), "/".to_string()),
        ])
    }
}

pub fn add_to_linker_get_host<T, G: for<'a> preopens::GetHost<&'a mut T, T, Host: preopens::Host>>(
    linker: &mut Linker<T>,
    host_getter: G,
) -> Result<()> {
    todo!()
}

I haven't figured out what Resource, Descriptor, Host, GetHost, etc. are yet (really awkward to find the definition of these things due to the heavy use of proc macros)... but anyway the point is that preopens::Host seems to force the use of preopens::Descriptor, which is hard-coded to open files from disk, and that's the behaviour I want to change.

If I want custom FS behaviour do I need to completely replicate the entire mod async_io { wasmtime::component::bindgen!({ call but change one line? That seems... awkward. :-/

Timmmm avatar Jun 08 '25 15:06 Timmmm

I got a bit further... but ultimately you end up having to implement your own

pub fn add_to_linker_get_host<
    T,
    G: for<'a> preopens::GetHost<&'a mut T, T, Host: preopens::Host>,
>(
    linker: &mut Linker<T>,
    host_getter: G,
) -> Result<()> {
    todo!()
}

In this repo that is full of generated code that ultimately comes from these hard-coded types:

mod async_io {
    wasmtime::component::bindgen!({
        path: "src/p2/wit",
        world: "wasi:cli/command",
...
        with: {
            // Configure all other resources to be concrete types defined in
            // this crate
...
            "wasi:filesystem/types/directory-entry-stream": crate::p2::filesystem::ReaddirIterator,
            "wasi:filesystem/types/descriptor": crate::p2::filesystem::Descriptor,
...

What I really want to do is just use a different type there, but it doesn't seem possible. I think probably the easiest solution is just to fork the entire repo, which is probably what I'll end up doing. That's a bit sad though. :-(

Timmmm avatar Jun 15 '25 12:06 Timmmm

In your own crate, can you run bindgen for the wasi:cli/command world and use with to point to all of the parts of wasmtime-wasi and wasmtime-wasi-io that you do want to reuse, and then redefine just the filesystem bits in your crate?

pchickey avatar Jun 16 '25 16:06 pchickey

Ooo yeah maybe, although that does mean I need to copy the exact .wit files into my crate right? I can't refer to the .wit files in another crate?

Timmmm avatar Jun 16 '25 17:06 Timmmm

You may need to copy them if you are in a different repo. In the same repo, you can use the path: argument to bindgen! to provide a folder to look inside.

pchickey avatar Jun 16 '25 17:06 pchickey

Ok I finally got this working (as far as having a noddy ls WASI program run and list some fake files). Demo repo here: https://github.com/Timmmm/wasmtime_fs_demo

It's pretty unsatisfying in a number of ways though:

  1. I had to import the entire wasi-filesystem repo. So I have to manually keep it in sync with the wasi-filesystem .wit files that wasmtime-wasi uses.
  2. I had to copy and paste the entire add_to_linker_async() functions and then comment out the wasi-filesystem stuff. There's no easy way to only add some of it.
  3. I had to copy and paste the bindgen call and then comment everything except wasi-filesystem out, and change the ReaddirIterator/Descriptor types to point to my own versions. This contains a lot of stuff that appears to be undocumented (or at least unobvious and difficult to locate - unfortunately go-to-definition is useless here). E.g. what does only_imports do, or tracing: false?
  4. To work around the orphan rule, wasmtime uses two levels of nested newtypes with several associated traits that are really hard to follow (IoView, WasiView, HasData, IoImpl, WasiImpl, HasWasi... who can follow this?). I think technically if I wanted to provide a generic filesystem virtualisation library I'd need a third level... Fortunately I don't want that, so I just used a concrete type. But still, there must be a better way?
  5. The use of a proc macro to generate the code makes everything harder because rust-analyzer can't show you the code (apparently they're working on it). It took me an unreasonable amount of time just to create the empty trait impls. (Rust-analyzer helps a lot but it doesn't get everything right.)
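For readers unfamiliar with the orphan-rule workaround mentioned in point 4, here is a minimal sketch using only the standard library. `Inner`, `Outer`, and `render` are hypothetical names chosen for illustration and are not wasmtime's `IoImpl`/`WasiImpl`. The sketch only shows why a local newtype makes a blanket impl of a foreign trait legal, and what two levels of nesting look like.

```rust
use std::fmt;

// A blanket impl such as `impl<T: AsRef<[u8]>> fmt::Display for T` is
// rejected by the compiler, because `Display` is a foreign trait and `T`
// is not covered by a local type. Wrapping `T` in a local newtype fixes that.

/// First-level local wrapper around any byte-like value.
struct Inner<T>(T);

/// Second-level local wrapper, mirroring the two-level nesting described above.
struct Outer<T>(Inner<T>);

impl<T: AsRef<[u8]>> fmt::Display for Inner<T> {
    /// Writes the wrapped bytes as lowercase hex digits.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in self.0.as_ref() {
            write!(f, "{byte:02x}")?;
        }
        Ok(())
    }
}

impl<T: AsRef<[u8]>> fmt::Display for Outer<T> {
    /// Delegates to the inner wrapper and adds a `0x` prefix.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "0x{}", self.0)
    }
}

/// Wraps the value in both layers at the call site and formats it.
fn render(bytes: Vec<u8>) -> String {
    Outer(Inner(bytes)).to_string()
}
```

The cost is visible even in this small example: every call site has to construct the wrappers, and each extra layer adds another type a reader has to track.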

So I still think this could be improved. I guess it at least works for now though. Hopefully this helps someone.

I'm going to keep working on the wasmtime_fs_demo and try to connect it to gitoxide so it exposes a git commit as a filesystem to WASI.

Timmmm avatar Aug 10 '25 11:08 Timmmm

Ok I updated my demo repo to expose a Git commit as a read-only filesystem using Gix. Not production quality, but a simple tree-style app that tries to infer the file type of every file works!

I found it is quite hard to actually implement some of the WASI functions because there isn't actually a specification for them. You often get "this is similar to <some POSIX function>" which is hardly a specification. Good example: read()'s spec is:

    /// Read from a descriptor, without using and updating the descriptor's offset.
    ///
    /// This function returns a list of bytes containing the data that was
    /// read, along with a bool which, when true, indicates that the end of the
    /// file was reached. The returned list will contain up to `length` bytes; it
    /// may return fewer than requested, if the end of the file is reached or
    /// if the I/O operation is interrupted.
    ///
    /// In the future, this may change to return a `stream<u8, error-code>`.
    ///
    /// Note: This is similar to `pread` in POSIX.
  1. As far as I know descriptors don't have an offset. Nothing else in the API indicates that they have one.
  2. What happens if you read() with an offset past the end of the file? What about just up to the end of the file?
  3. What if you try to read 0 bytes?
  4. What if you try to read 0 bytes but outside the file?

Etc. I'm guessing the answer is going to be "whatever POSIX does", but it only says it's similar to pread so who knows.
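To show how many decisions the spec text leaves open, here is one possible reading of `read(length, offset)` for an in-memory file, written in plain Rust. `read_at` is a hypothetical helper. The choices it makes for the edge cases listed above are assumptions modelled loosely on POSIX `pread`, not behaviour confirmed by the WASI spec or by wasmtime.

```rust
/// One possible interpretation of `read(length, offset)` on an in-memory file.
/// Returns the bytes read plus a flag that is true when the end of the file
/// was reached.
fn read_at(file: &[u8], length: u64, offset: u64) -> (Vec<u8>, bool) {
    let file_len = file.len() as u64;

    // ASSUMPTION: an offset at or past the end is not an error; it simply
    // yields no data, similar to POSIX `pread` returning 0.
    let start = offset.min(file_len);

    // Clamp the end of the range to the file size; `saturating_add` avoids
    // overflow for very large `length` values.
    let end = start.saturating_add(length).min(file_len);

    let data = file[start as usize..end as usize].to_vec();

    // ASSUMPTION: report end-of-file whenever the read stops exactly at the
    // end of the file, including zero-length reads at or past the end.
    let eof = end == file_len;

    (data, eof)
}
```

Under these assumptions a zero-length read inside the file returns an empty list with the flag unset, while a zero-length read at or beyond the end returns an empty list with the flag set. A different but equally defensible reading would set the flag only when no bytes at all could be read, which is exactly the ambiguity being pointed out.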

Hopefully that will be fixed before WASI 1.0. (Sorry that was a bit off-topic.)

Anyway, it would be good if there was a nicer way to integrate a custom filesystem backend than what I've done in that repo.

Timmmm avatar Aug 16 '25 13:08 Timmmm

In lieu of copying the bindings and doing bindgen! yourself, could you reuse the traits/types from wasmtime_wasi::p2::bindings::filesystem?

alexcrichton avatar Aug 18 '25 15:08 alexcrichton

You often get "this is similar to <some POSIX function>" which is hardly a specification

I'd recommend raising this in the WASI proposals themselves, Wasmtime is not the source of truth for the proposals.

alexcrichton avatar Aug 18 '25 15:08 alexcrichton

could you reuse the traits/types from wasmtime_wasi::p2::bindings::filesystem

Unfortunately not because I need to change these two concrete with: types.

Timmmm avatar Aug 18 '25 15:08 Timmmm

Technically you don't need to override with:, but it's a bit convoluted. IMO it's less convoluted than copy/pasting all the bindings, however, so I'll explain it here. With the upstream Resource<T> types, the T points to some type in the wasmtime-wasi crate, but you can effectively cast a Resource<T> to any other type, such as Resource<U>, because it's just a "helper" type and not actually tied to the stored type. This would involve calling the Resource::new_* constructors. You could then wrap that up in your own helper functions/methods/extension traits to look up your own descriptor type U inside a table with a Resource<T> (or something like that).
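The idea can be sketched without wasmtime at all. In the toy model below, `Handle<T>` plays the role of a typed resource handle and `Table` plays the role of a resource table. All names here (`Handle`, `Table`, `UpstreamDescriptor`, `MyDescriptor`, `push_root`, `lookup`) are hypothetical and exist only for illustration. The point is that the type parameter on the handle is just a tag, so re-tagging it does not touch the stored value.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::marker::PhantomData;

/// Toy handle: an index plus a phantom type tag. The tag is not stored data.
struct Handle<T> {
    rep: u32,
    _tag: PhantomData<fn() -> T>,
}

impl<T> Handle<T> {
    fn new(rep: u32) -> Self {
        Self { rep, _tag: PhantomData }
    }

    fn rep(&self) -> u32 {
        self.rep
    }

    /// Re-tag the handle with another type; the table entry is untouched.
    fn cast<U>(self) -> Handle<U> {
        Handle::new(self.rep)
    }
}

/// Toy table storing type-erased values keyed by `rep`.
#[derive(Default)]
struct Table {
    next: u32,
    entries: HashMap<u32, Box<dyn Any>>,
}

impl Table {
    fn push<T: Any>(&mut self, value: T) -> Handle<T> {
        let rep = self.next;
        self.next += 1;
        self.entries.insert(rep, Box::new(value));
        Handle::new(rep)
    }

    /// Returns `None` if the slot is empty or holds a different type.
    fn get<T: Any>(&self, handle: &Handle<T>) -> Option<&T> {
        self.entries.get(&handle.rep())?.downcast_ref::<T>()
    }
}

/// Stand-in for the upstream descriptor type named in generated signatures.
struct UpstreamDescriptor;

/// Our own descriptor type, which is what actually lives in the table.
#[derive(Debug, PartialEq)]
struct MyDescriptor {
    path: String,
}

/// Store our type, but hand out a handle tagged with the upstream type.
fn push_root(table: &mut Table) -> Handle<UpstreamDescriptor> {
    table.push(MyDescriptor { path: "/".to_string() }).cast()
}

/// Given an upstream-tagged handle, look up our own type behind it.
fn lookup<'a>(table: &'a Table, handle: &Handle<UpstreamDescriptor>) -> Option<&'a MyDescriptor> {
    table.get(&Handle::<MyDescriptor>::new(handle.rep()))
}
```

The trade-off is visible in the model: looking up the handle with its declared tag (`UpstreamDescriptor`) finds nothing, because the table really holds a `MyDescriptor`. The compiler can no longer catch that mistake, so it only shows up at run time.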

alexcrichton avatar Aug 18 '25 15:08 alexcrichton

So for example where I currently have this:

impl wasi_fs::wasi::filesystem::preopens::Host for WasiState {
    fn get_directories(
        &mut self,
    ) -> anyhow::Result<
        Vec<(
            Resource<Descriptor>, // My own `Descriptor` type.
            String,
        )>,
    > {
        Ok(vec![(
            self.resource_table.push(Descriptor{
                kind: EntryKind::Tree,
                id: self.gitfs.root,
            }).with_context(|| format!("failed to push root preopen"))?,
            "/".to_string(),
        )])
    }
}

I would do something like this?

impl wasi_fs::wasi::filesystem::preopens::Host for WasiState {
    fn get_directories(
        &mut self,
    ) -> anyhow::Result<
        Vec<(
            Resource<wasmtime_wasi::filesystem::Descriptor>,
            String,
        )>,
    > {
        let my_resource : Resource<Descriptor> = self.resource_table.push(Descriptor{
                kind: EntryKind::Tree,
                id: self.gitfs.root,
            }).with_context(|| format!("failed to push root preopen"))?;
        let converted_resource : Resource<wasmtime_wasi::filesystem::Descriptor> =
           if my_resource.owned() { Resource::new_own(my_resource.rep()) } else { Resource::new_borrow(my_resource.rep()) };
        Ok(vec![(
            converted_resource,
            "/".to_string(),
        )])
    }
}

Timmmm avatar Aug 18 '25 15:08 Timmmm

Indeed yeah, that's functionally what reusing the wasmtime_wasi-based traits would look like. I'd recommend using an extension trait like:

trait ResourceTableExt {
    fn push_my_descriptor(&mut self, my_descriptor: Descriptor) -> Result<Resource<wasmtime_wasi::filesystem::Descriptor>>;
}

impl ResourceTableExt for wasmtime::component::ResourceTable { /* ... */ }

as that would reduce the noise of conversions and such. You could do similarly for accessors too. Whether or not this is more palatable than generating bindings yourself is mostly a subjective point, however.
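As a rough, wasmtime-free illustration of the extension-trait pattern, the sketch below adds a push helper and a matching accessor to a table type that is treated as if it came from another crate. `ForeignTable`, `UpstreamHandle`, `MyDescriptor`, and `TableExt` are hypothetical stand-ins, not real wasmtime types.

```rust
use std::any::Any;

/// Toy table that we pretend is defined in another crate, so we cannot add
/// inherent methods to it.
#[derive(Default)]
pub struct ForeignTable {
    slots: Vec<Box<dyn Any>>,
}

impl ForeignTable {
    /// Stores a value and returns its index.
    pub fn push<T: Any>(&mut self, value: T) -> u32 {
        self.slots.push(Box::new(value));
        (self.slots.len() - 1) as u32
    }

    /// Returns `None` for an unknown index or a type mismatch.
    pub fn get<T: Any>(&self, rep: u32) -> Option<&T> {
        self.slots.get(rep as usize)?.downcast_ref::<T>()
    }
}

/// Stand-in for a handle typed with the upstream descriptor name.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UpstreamHandle(pub u32);

/// Our own descriptor type, stored behind upstream-typed handles.
#[derive(Debug, PartialEq)]
pub struct MyDescriptor {
    pub id: u64,
}

/// Extension trait: keeps the handle conversions in one place.
pub trait TableExt {
    fn push_my_descriptor(&mut self, descriptor: MyDescriptor) -> UpstreamHandle;
    fn get_my_descriptor(&self, handle: UpstreamHandle) -> Option<&MyDescriptor>;
}

impl TableExt for ForeignTable {
    fn push_my_descriptor(&mut self, descriptor: MyDescriptor) -> UpstreamHandle {
        UpstreamHandle(self.push(descriptor))
    }

    fn get_my_descriptor(&self, handle: UpstreamHandle) -> Option<&MyDescriptor> {
        self.get::<MyDescriptor>(handle.0)
    }
}
```

With the trait in scope, host code can call `table.push_my_descriptor(...)` and `table.get_my_descriptor(handle)` and never spell out the conversion at each call site.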

alexcrichton avatar Aug 18 '25 17:08 alexcrichton

Hmm, unfortunately that doesn't quite work. The bindings that wasmtime generates use wasmtime_wasi::p2::filesystem::ReaddirIterator for read_dir(), but the wasmtime_wasi::p2::filesystem module is private (pub(crate)), so I can't actually implement the Host trait.

Can we make that type public so that this approach would be possible?

Timmmm avatar Aug 26 '25 10:08 Timmmm

Oh definitely yeah, making that public is quite reasonable!

alexcrichton avatar Aug 26 '25 17:08 alexcrichton

Cool I will make a PR.

Timmmm avatar Aug 26 '25 17:08 Timmmm

I tried updating but unfortunately these types are now private so I can't implement add_to_linker_except_filesystem()...

use wasmtime_wasi::cli::WasiCli;
use wasmtime_wasi::clocks::WasiClocks;
use wasmtime_wasi::filesystem::WasiFilesystem;
use wasmtime_wasi::random::WasiRandom;
use wasmtime_wasi::sockets::WasiSockets;

Can we make them public too?

Also, this commit has such a great commit message. Can you teach my colleagues to do that? :-D And thanks for addressing my point above about IoView, WasiView, HasData, IoImpl, WasiImpl, HasWasi... - this is definitely an improvement!

Timmmm avatar Sep 01 '25 08:09 Timmmm

Yeah, I've gone back and forth on whether those should be public, but I've copied them outside of this repo before as well. Given that, I think it's reasonable to make them public, so feel free to make a PR!

alexcrichton avatar Sep 02 '25 16:09 alexcrichton

Woohoo I updated my test repo to use the resource type casting approach and it works! Definitely uglier and much less type safe, but overall I'd say it's the better approach.

Timmmm avatar Sep 02 '25 21:09 Timmmm