Extend standalone support
- Implement _emscripten_throw_longjmp that aborts the program
- Add nice messages to __cxa_throw and __cxa_allocate_exception
- Implement getentropy using WASI
- Implement WASI FS for syscalls lstat64, stat64, newfstatat, getdents64, fstat64 and openat
This is far from complete if you look at the WASI FS spec and the syscalls, but at least it gets basic operations like directory listing, file/directory opening and file/directory stat working. It was enough for me to get pdfium working in standalone mode.
I didn't really know what I should do with AT_FDCWD, or what the runtime should do when AT_FDCWD is passed to it. AT_FDCWD is also negative, while I think fd in WASI is unsigned?
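One possible reading of the AT_FDCWD question: WASI has no ambient current directory, so a lookup relative to AT_FDCWD has to be redirected to a preopened directory descriptor before it reaches the runtime. The sketch below is only an illustration under stated assumptions: it uses the Linux/musl value of AT_FDCWD (-100), it assumes the WASI fd type is an unsigned 32-bit integer (as the comment above suspects), and `sketch_resolve_dirfd` plus the `sketch_` types are hypothetical names that exist in neither Emscripten nor wasi-libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Linux/musl value of AT_FDCWD, defined locally for illustration. */
#define SKETCH_AT_FDCWD (-100)

/* Assumption: the WASI fd is an unsigned 32-bit handle. */
typedef uint32_t sketch_wasi_fd_t;

/*
 * Translate a POSIX-style dirfd into a WASI fd.
 * AT_FDCWD is replaced by the fd of the preopened directory that stands in
 * for the current working directory; any other negative value is rejected.
 * Returns 0 on success and -1 if dirfd cannot be represented.
 */
int sketch_resolve_dirfd(int dirfd, sketch_wasi_fd_t cwd_preopen,
                         sketch_wasi_fd_t *out) {
  assert(out != NULL);
  if (dirfd == SKETCH_AT_FDCWD) {
    *out = cwd_preopen;
    return 0;
  }
  if (dirfd < 0) {
    return -1; /* a blind cast would turn this into a huge unsigned fd */
  }
  *out = (sketch_wasi_fd_t)dirfd;
  return 0;
}
```

With a preopen at fd 3, `sketch_resolve_dirfd(SKETCH_AT_FDCWD, 3, &fd)` stores 3 in `fd`, while an ordinary descriptor such as 7 passes through unchanged.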
Most of the syscall code was taken from or based on wasi-libc.
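As an illustration of what "getentropy using WASI" can look like, here is a minimal sketch, not the code in this PR. It assumes the host exposes a `random_get`-style call that fills a buffer and returns 0 on success; that call is passed in as a function pointer so the example stays self-contained. `sketch_getentropy`, `sketch_random_get_fn` and `sketch_fake_random_get` are hypothetical names, and the fake generator is a deterministic stand-in for demonstration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the host's WASI random_get import:
 * fills buf with len bytes and returns 0 on success or a WASI errno. */
typedef uint16_t (*sketch_random_get_fn)(uint8_t *buf, size_t len);

/* Deterministic fake used only for demonstration; NOT a source of entropy. */
uint16_t sketch_fake_random_get(uint8_t *buf, size_t len) {
  for (size_t i = 0; i < len; i++) {
    buf[i] = (uint8_t)i;
  }
  return 0;
}

/* getentropy-style wrapper: at most 256 bytes per call,
 * returns 0 on success, or -1 with errno set on failure. */
int sketch_getentropy(sketch_random_get_fn random_get, void *buffer,
                      size_t len) {
  assert(random_get != NULL);
  if (len > 256) {
    errno = EIO; /* conventional getentropy limit */
    return -1;
  }
  uint16_t err = random_get((uint8_t *)buffer, len);
  if (err != 0) {
    errno = EIO; /* simplification: a real port would translate the WASI errno */
    return -1;
  }
  return 0;
}
```

In a real build the function pointer would be replaced by a direct call to the WASI import; the indirection here is purely so the sketch can be exercised without a runtime.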
Can we add some new tests based on these?
I have never developed in C before, I don't really know where to start, can you give me some hints where you want these tests added and what they should test?
On behalf of the wasm people I know, I would like to thank @jerbob92 and @sbc100 for progressing this, as it not only unlocks things like pdfium, but also generic wasm utilities that use emscripten (like wabt). For example, being able to run wat2wasm using any wasi runtime instead of whatever was released and packaged.
If anyone not currently involved has skills to contribute, please do, as this is very likely well into "hobby time" for @jerbob92, and it is at its core a pretty substantial infrastructure change and a lot of work to test. Other wasm people will thank you, but I'll thank you in advance!
If you just want a wasi version of the wabt tools I think the simplest path would be to use wasi-sdk to build it. Of course that doesn't mean we shouldn't land this change too.
@sbc100 ps your last comment didn't format right. might want to polish it
good idea, on the wabt thing and indeed a separate topic if they are open to decoupling from emscripten.
IIUC wabt is not coupled to emscripten, we just happen to build it with emscripten for the web demo.
(I guess maybe what you are saying is that it would be nice if the web version of wabt, which is run by emscripten, happened to also be WASI compliant.. in that case I agree that would be very cool).
yeah sorry it was that we were looking for a wabt compiled to wasm and found the one that was compiled with emscripten for the web. I raised this issue for out of browser wasi binary https://github.com/WebAssembly/wabt/issues/2101
Thanks @codefromthecrypt! So in hindsight, I don't think my C skills are good enough to get this PR into a merge-able state: implementing AT_FDCWD, using the preload dirs rather than FD 3, and implementing tests.
@jerbob92 no worries, I think you got things very far. I'll keep recruiting to whatever end on this.
@sbc100 I have made some more progress on this PR now:
- Implemented WASI pre-opens
- Implement mkdirat
I do have some questions though:
- How to get started on adding tests for this?
- The standalone.c file is getting very big right now, how do we want to split this up? Perhaps something like standalone_wasi_preopens.c, standalone_wasi_fs.c, standalone_wasi_random.c to keep things clean?
- Is there an easy way to check which syscalls still need to be implemented in standalone to make it full WASI compliant?
- Code is mostly copied from wasi-libc right now with some changes, is this something that we want?
- wasi-libc has a whole system to maintain a CWD, is this also something that we want to implement?
- The standalone.c file is getting very big right now, how do we want to split this up? Perhaps something like standalone_wasi_preopens.c, standalone_wasi_fs.c, standalone_wasi_random.c to keep things clean?
I don't feel strongly about this. Perhaps use the same split that wasi-libc uses? Also I don't think we need standalone in each filename since they will all be in the standalone directory.
- Is there an easy way to check which syscalls still need to be implemented in standalone to make it full WASI compliant?
The idea is that we will be running the full wasi testsuite. I started adding some of it in #12704 and I have some more plans.
- Code is mostly copied from wasi-libc right now with some changes, is this something that we want?
I don't think it's a problem, but please document the origin and try to document any changes from upstream.
- wasi-libc has a whole system to maintain a CWD, is this also something that we want to implement?
Sure. We could even consider adding wasi-libc as a submodule and including certain files directly?
I don't feel strongly about this. Perhaps use the same split that wasi-libc uses? Also I don't think we need standalone in each filename since they will all be in the standalone directory.
Sounds good!
The idea is that we will be running the full wasi testsuite. I started adding some of it in https://github.com/emscripten-core/emscripten/pull/12704 and I have some more plans.
Nice, that will make it a lot easier!
I don't think it's a problem, but please document the origin and try to document any changes from upstream. Sure. We could even consider adding wasi-libc as a submodule and including certain files directly?
If that's a possibility, that would be great. It would make things a lot easier, because right now I'm just cherry-picking wasi-libc code to get the syscalls/WASI calls I require working, and if we want to pass the whole wasi-testsuite we will have to copy a lot from wasi-libc. The only thing I noticed while copying code is that their WASI function signatures are sometimes a bit different, which prevents us from using the C code directly (AFAIK), so we might want to look into making that possible. For example:
wasi-libc __wasi_path_filestat_get = __wasi_errno_t __wasi_path_filestat_get(__wasi_fd_t fd, __wasi_lookupflags_t flags, const char *path, __wasi_filestat_t *retptr0)
Emscripten __wasi_path_filestat_get = __wasi_errno_t __wasi_path_filestat_get(__wasi_fd_t fd, __wasi_lookupflags_t flags, const char *path, size_t path_len, __wasi_filestat_t *buf)
So Emscripten requires the path length and wasi-libc doesn't. But perhaps we can work around this by completely using wasi-libc for the wasi part.
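The two signatures above differ only in who supplies the path length, so one way to bridge them is a thin adapter that computes the length with `strlen` and forwards the call. The following is a sketch under assumptions, not the actual Emscripten or wasi-libc source: every `sketch_` type and function is a simplified, hypothetical stand-in, and the length-taking function has a stub body so the example can run on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real WASI types. */
typedef uint16_t sketch_errno_t;
typedef uint32_t sketch_fd_t;
typedef uint32_t sketch_lookupflags_t;
typedef struct {
  uint64_t size;
} sketch_filestat_t;

/* Emscripten-style shape: the caller passes an explicit path length.
 * Stub body for illustration: it reports the path length as the file size. */
sketch_errno_t sketch_path_filestat_get_len(sketch_fd_t fd,
                                            sketch_lookupflags_t flags,
                                            const char *path, size_t path_len,
                                            sketch_filestat_t *buf) {
  (void)fd;
  (void)flags;
  (void)path;
  buf->size = path_len;
  return 0;
}

/* wasi-libc-style shape: NUL-terminated path, length derived with strlen. */
sketch_errno_t sketch_path_filestat_get(sketch_fd_t fd,
                                        sketch_lookupflags_t flags,
                                        const char *path,
                                        sketch_filestat_t *retptr0) {
  assert(path != NULL && retptr0 != NULL);
  return sketch_path_filestat_get_len(fd, flags, path, strlen(path), retptr0);
}
```

Whether such adapters belong in Emscripten, or whether the wasi-libc sources should be used wholesale as suggested above, is a design choice this sketch does not settle.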
A stupid nit - if it was based on wasi libc, shouldn't there be copyright attribution somewhere? (Or is that not necessary?)
Not stupid at all, we should certainly consider how to handle this. I'm not sure what the right answer is. Or how we should deal with keeping the codebases in sync. One easy option would be to add wasi-libc as a submodule instead of duplicating the code here.
We should keep the copyright and license notices from wasi-libc files we use. The license is compatible with ours (MIT + other options) but it's still always important to keep those notices AFAIK. If wasi-libc doesn't have a notice in each file, we can just copy their main LICENSE file and add mentions in the relevant code that refers to it.
If possible it might be simpler as @sbc100 said to get all of wasi-libc as a submodule (which would include their LICENSE file of course).
Is there still the idea to get this merged at some point and maintain two ABIs?
It hasn't been a big priority recently since folks who want standalone support have generally been choosing to use wasi-sdk. Can you describe your use case for wanting better standalone support in emscripten over using wasi-sdk? It might help motivate the direction here.
@jerbob92 will probably respond, but for me - wasi-sdk is hard to use, because emscripten already has ready ports of stuff like freetype, zlib, libpng, etc.
with @jerbob92 we were building ghostscript and pdftops to WASI, it's "easy" with emscripten (easy in big quotes) but I don't know how I would even start with wasi-sdk with all the freetype libs etc.
Yeah it's basically what @karelbilek said. Most existing applications/libraries I can just compile with Emscripten, which I can't with wasi-sdk, at least not with a lot of effort.
Oh I didn't realize that. I would expect most stuff to just compile with wasi-sdk.
Can you describe your use case for wanting better standalone support in emscripten over using wasi-sdk?
I think the primary use case would maybe be to have some hybrid target where you can target WASI where possible and otherwise fall back to emscripten implementations to also support APIs which aren't supported by WASI. I don't know if that's entirely out of scope and if it would be an "either or" but at least that's what I had in mind.
I would expect most stuff to just compile with wasi-sdk.
Yea that is unfortunately not the case 😭
Emscripten just includes a lot of batteries and it'd be really nice to use WASI for example for fs syscalls because it's more efficient than going through the Node.js FS layer and then for other APIs use Emscripten implementations.
Actually one of the reasons we have not pushed hard on the use of WASI FS APIs is that we found they couldn't match the emscripten custom FS APIs on performance and code size in all cases.
I am totally in favor of trying to use WASI APIs and improve standalone mode, but it won't be in the name of more efficiency, if anything it would likely decrease efficiency.
they couldn't match the emscripten custom FS APIs on performance and code size in all cases
Oh wow, that is very interesting. I would have expected that the WASI FS APIs were more performant than going through Node's JS layer for FS calls.
Do you have an idea why that is? Where does the performance come from with Emscripten's custom FS APIs? Isn't that FS API just going through Node's fs layer?
I'm not sure what you mean here by WASI FS APIs. Are you talking about a particular implementation? In emscripten the WASI APIs are implemented in JS just like all the other APIs we have: https://github.com/emscripten-core/emscripten/blob/main/src/library_wasi.js.
Perhaps you are referring to some kind of native WASI API that exists on native wasm VMs? Or are you talking about some kind of node-provided implementation of WASI? uvwasi maybe?
Do you have an idea why that is? Where does the performance come from with Emscripten's custom FS APIs? Isn't that FS API just going through Node's fs layer?
IIRC it was that the ergonomics of the various APIs such as path_open had overheads that the emscripten syscall layer didn't have or could work around. The emscripten syscall layer is modeled after the linux syscall layer used in musl. @kripken can you remember the specific performance issue that stopped us moving forward with the switch to WASI internally by default?
Just to explain a bit what we are doing (well mostly what @jerbob92 is doing :D ) - we build C code base with emscripten with this patch, then pass that to wazero (go WASI implementation). It mostly works.
https://github.com/tetratelabs/wazero
That makes sense. Do you know if wazero also implements some of the emscripten syscalls? Or is it only WASI syscalls?
@sbc100
See https://v8.dev/blog/emscripten-standalone-wasm#necessary-api-differences regarding the overhead and other issues that prevented us from moving emscripten to use 100% WASI APIs. But perhaps WASI has improved since then?