wasmtime icon indicating copy to clipboard operation
wasmtime copied to clipboard

Implement loading modules from piped files

Open cr0sh opened this issue 3 years ago • 6 comments

This is not discussed in the issue tracker, but on this Zulip thread.

Previously load_module opened a file on path to check if the module is precompiled, then closes before constructing the Module from it. This is fine on normal files, but on piped files(like /dev/stdin), is problematic as the header read is not visible on the next open.

So, this set of commits inserts a new procedure, before pre-opening to read the header, to detect if the file pointed by the path is a pipe(is_fifo of FileTypeExt), then directly loads the module from the path without checking if the modules is precompiled.

This change is unix-only, and intentionally does not support loading precompiled modules from pipes, because it should not be used that way. An additional warning message is printed in such situations to prevent users from being confused.

The Zulip thread contains an additional concern by me, about implicitly allowing accsesing /dev/tty on - inputs, which is not considered here.

*Edited to describe the current procedure to detect piped files.

cr0sh avatar Oct 09 '22 11:10 cr0sh

It seems like I cannot assign a reviewer for the PR(the template indicates me to do so), so not assigning.

cr0sh avatar Oct 09 '22 11:10 cr0sh

Piping tests now run on unix platforms only(as the - substitiution applies only on unix too). Double buffering issue is resolved. What do you think about implicitly allowing access to /dev/tty on wasmtime - as a special case?

cr0sh avatar Oct 09 '22 14:10 cr0sh

I simplified the strategy to detect piped files(directly invokes is_fifo on Unix) and fixed tests incorrectly set up.

cr0sh avatar Oct 23 '22 14:10 cr0sh

Here's the strategy I'd suggest:

Add a new method to wasmtime::Module that accepts both precompiled binaries and wasm/wat files, and uses mmap. I'd call it from_trusted_file but I'm open to suggestions. It should probably be unsafe, like Module::deserialize_file, for the same reasons.

In the CLI, if allow_precompiled is true, then call Module::from_trusted_file; otherwise, call Module::from_file just like it does today. Don't inspect the bytes of the file at all in the CLI, but put that logic in the wasmtime crate instead. I think that addresses @sunfishcode's comment on keeping the CLI as a simple demonstration of how to use the wasmtime public API, and improves on the current state where there's this magic ELF-header check in the CLI.

In your motivating use case (curl | wasmtime), users shouldn't be passing --allow-precompiled, so the CLI would call Module::from_file, resulting in a single fs::read that consumes the piped input exactly once, which solves the original bug you encountered.

This new method should read the file using MmapVec::from_file. Only the OS knows exactly which file types it's capable of mapping into memory, so this shouldn't check whether the file is a FIFO or anything along those lines; just let the syscall fail if someone uses --allow-precompiled with a pipe.

Once you have an MmapVec, you can check whether it begins with b"\x7fELF". If so, pass it to SerializedModule::from_mmap; otherwise, pass it to Module::new. Neither function copies the bytes it's given, addressing @bjorn3's concern.

I think the only downside to this plan is that the error message from the CLI isn't as friendly if someone tries to run a precompiled artifact without passing --allow-precompiled. We could make Module::build_artifacts return a more helpful error if the bytes it receives start with an ELF header, instead of just reporting that parsing failed.

One advantage to this approach is that wasmtime --allow-precompiled becomes potentially faster for .wat and .wasm files, because it'll use mmap for those cases too.

jameysharp avatar Oct 25 '22 21:10 jameysharp

@jameysharp Thanks for your detailed guidance! I didn't think fixing this would not be hard this way and admittedly I couldn't do some sort of refactoring the project(and even adding a new public API for it) never saw before, tempting myself to bring somewhat hackish way. I'll try to reimplement this way.

I'd call it from_trusted_file but I'm open to suggestions.

This looks good.

I think the only downside to this plan is that the error message from the CLI isn't as friendly if someone tries to run a precompiled artifact without passing --allow-precompiled. We could make Module::build_artifacts return a more helpful error if the bytes it receives start with an ELF header, instead of just reporting that parsing failed.

I'm not sure that I can bring an excellent solution for this; just will do my best.

To recap I leave a tabular summary for your suggestion:

no --allow-precompiled
(Module::from_file)
--allow-precompiled
(Module::from_trusted_file)
precompiled regular: opens OK and fails on some inner procedure but should be rejected via an additional check? TODO: better error handling
fifo: same
regular: opens OK w/ mmap
fifo: mmap fails (intentionally left broken)
.wasm/.wat regular: opens OK
fifo: opens OK (major change on this PR)
regular: opens OK w/mmap
fifo: same

cr0sh avatar Oct 26 '22 14:10 cr0sh

should be rejected via an additional check

Right, the key here is that a file that begins with the 4-byte ELF magic is not valid for .wat or .wasm format and will be rejected by those parsers. None of the characters a .wat file can start with (which I think are whitespace, (, or ;) include \x7f. And the binary .wasm format has its own different 4-byte magic (\0asm).

So from_file already rejects precompiled modules, but it does it by reporting a generic parse failure, which is not super friendly. We can be nicer to users by specifically checking for this case. But if you have any trouble with that, I don't think that has to be in this PR.

The existing check that the CLI does for the ELF magic serves two purposes: one is to provide a helpful error message, and the other is to decide whether to use deserialize_file or from_file. So one way to look at the approach I've outlined is that it does the same things, but in a different order.

I couldn't do some sort of refactoring the project(and even adding a new public API for it) never saw before, tempting myself to bring somewhat hackish way.

Absolutely! I had to stare at the existing code for a long time, and think about a bunch of options that don't work, to come up with this suggestion. So of course I don't expect somebody new to the project to think of it.

I think the project will be better off with this additional API. In my opinion, it would be an improvement even if it didn't help with reading from pipes: the CLI shouldn't need to know about ELF magic. But I also think having support for reading from pipes is a great idea, so I like that this fixes multiple issues at the same time.

jameysharp avatar Oct 26 '22 16:10 jameysharp

Rebased and implemented the @jameysharp 's guidance. (I'll move to a new pull request as the topic has changed, so please leave this PR not reviewed)

cr0sh avatar Nov 29 '22 16:11 cr0sh