
syscall ABI

Open jfbastien opened this issue 7 years ago • 7 comments
IIUC we currently don't have a stable syscall ABI. I think we should try to standardize something.

Having a stable ABI we agree on means that embedders don't have to roll their own. There's plenty of experience to gain from what Emscripten did, and I would love to have its JavaScript syscall layer as a free-standing thing.

Here's a quick sketch:

  • Each syscall is its own function (unlike e.g. Linux where each syscall signature is a function, taking as first parameter the syscall number).
  • Go through the existing syscalls and adopt ones we deem useful.
  • Embedder can do X when a syscall cannot work on that platform (X TBD, should we allow trapping if e.g. sockets aren't available?).
  • The module for all syscalls is the same. Say syscall.
  • I think we might want to version the module name (i.e. syscall_v0): adding new syscalls doesn't need a new version, but changing any tool-convention ABI behavior would require bumping the version. Unless we think behavior will never change, in which case no versioning.
  • The field for each syscall is just the syscall number macro's name (e.g. exit, fork, read, write, open, ...)
  • IIRC we've talked about adding a custom clang attribute to denote module / field of an export / import.
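
As a rough illustration of the sketch above, here is what the embedder side might look like with one import module and one field per syscall. The module name `syscall_v0`, the `buildImports` helper, the list of known syscalls, and the choice to return `-ENOSYS` for unsupported syscalls (rather than trap) are all assumptions made for this example, not an agreed ABI.

```typescript
// Hypothetical embedder-side sketch: one import module, one field per syscall.
// "syscall_v0", buildImports, and the -ENOSYS stub policy are illustrative assumptions.

type Syscall = (...args: number[]) => number;
type SyscallModule = Record<string, Syscall>;

const ENOSYS = 38; // Linux errno value, used here only as a placeholder

// Fields are named after the syscall, not its number.
const KNOWN_SYSCALLS = ["exit", "read", "write", "open", "fork"] as const;

function buildImports(
  supported: Partial<SyscallModule>
): { syscall_v0: SyscallModule } {
  const mod: SyscallModule = {};
  for (const name of KNOWN_SYSCALLS) {
    // One possible "X": a syscall the platform cannot provide reports -ENOSYS.
    // Throwing here instead would model the trapping option.
    mod[name] = supported[name] ?? (() => -ENOSYS);
  }
  return { syscall_v0: mod };
}

// Example embedder that only supports write(fd, ptr, len) and
// pretends every byte was written.
const imports = buildImports({
  write: (_fd, _ptr, len) => len,
});
```

An object of this shape is what a JavaScript embedder would pass as the import object to `WebAssembly.instantiate`. Adding a new field would not require a new module name under the versioning idea above; changing what an existing field does would.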

One open question I have: say a JS embedding wants to let the user choose how to implement filesystem access (maybe WebSQL versus in-memory are two options). How would we offer a stable ABI, and let users choose which JS glue to use? They can't just change the "filesystem" import if all syscalls are in the "syscall" import. Should we group syscalls by theme, and are all of these orthogonal enough that you wouldn't want to have two in the same group sometimes?
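
To make the grouping question concrete, here is a minimal sketch in which syscalls are split into themed import modules so that the filesystem group can be swapped without touching the rest. The module names (`filesystem`, `process`), both backends, and the errno values are assumptions for illustration only.

```typescript
// Hypothetical sketch: themed import modules, with a swappable filesystem group.
// Module names and backends are illustrative assumptions.

type Syscall = (...args: number[]) => number;
type Group = Record<string, Syscall>;

const EBADF = 9;  // Linux errno: bad file descriptor
const EROFS = 30; // Linux errno: read-only filesystem

// Backend 1: hands out descriptors and tracks which ones are open.
function makeInMemoryFs(): Group {
  let nextFd = 3; // 0-2 are conventionally stdin/stdout/stderr
  const openFds = new Set<number>();
  return {
    open: () => {
      const fd = nextFd++;
      openFds.add(fd);
      return fd;
    },
    close: (fd) => (openFds.delete(fd) ? 0 : -EBADF),
  };
}

// Backend 2: refuses every open, standing in for a locked-down embedding.
function makeReadOnlyFs(): Group {
  return {
    open: () => -EROFS,
    close: () => -EBADF,
  };
}

// Only the "filesystem" group changes between configurations.
function buildImports(fs: Group): { filesystem: Group; process: Group } {
  return {
    filesystem: fs,
    process: { getpid: () => 1 },
  };
}

const withMemoryFs = buildImports(makeInMemoryFs());
const withReadOnlyFs = buildImports(makeReadOnlyFs());
```

The sketch also shows where the orthogonality question bites: anything that touches file descriptors from another group (polling, sockets) would need to share state with the `filesystem` group.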

jfbastien avatar Dec 06 '17 17:12 jfbastien

Thanks for kicking this off, JF!

How would we offer a stable ABI, and let users choose which JS glue to use?

For filesystems I think we would follow the way it was done in emscripten + NaCl and have the ability to mount different filesystems on the same hierarchy. We'd have to agree on what a particular filesystem type means, though. So in your example "websql" could be the filesystem type, but what does that mean in a non-web embedding?
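
A minimal sketch of the mounting idea, assuming a table that maps absolute path prefixes to filesystem type labels and resolves a path to the longest matching mount. The `MountTable` class and the labels `"memory"` and `"websql"` are hypothetical; what a label means in a given embedding is exactly the part that would need agreement.

```typescript
// Hypothetical sketch of mounting several filesystem types on one hierarchy.
// The Mount shape and the type labels are illustrative assumptions.

interface Mount {
  prefix: string; // absolute mount point, e.g. "/persist"
  fsType: string; // label the embedder maps to an implementation
}

class MountTable {
  private mounts: Mount[] = [];

  mount(prefix: string, fsType: string): void {
    // Normalise "/persist/" to "/persist" but keep the root as "/".
    const clean =
      prefix.length > 1 && prefix.endsWith("/") ? prefix.slice(0, -1) : prefix;
    this.mounts.push({ prefix: clean, fsType });
  }

  // Resolve an absolute path to the mount with the longest matching prefix.
  resolve(path: string): { fsType: string; relative: string } | undefined {
    let best: Mount | undefined;
    for (const m of this.mounts) {
      const matches =
        m.prefix === "/" ||
        path === m.prefix ||
        path.startsWith(m.prefix + "/");
      if (matches && (best === undefined || m.prefix.length > best.prefix.length)) {
        best = m;
      }
    }
    if (best === undefined) return undefined;
    const relative =
      best.prefix === "/" ? path.slice(1) : path.slice(best.prefix.length + 1);
    return { fsType: best.fsType, relative };
  }
}

const table = new MountTable();
table.mount("/", "memory");
table.mount("/persist", "websql");
```

With this table, `/persist/save.dat` resolves to the `"websql"` mount, while `/persistent/x` falls through to the root `"memory"` mount because the prefix match is done on whole path components.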

Are there other examples where we'd want to swap out the host's implementation?

binji avatar Dec 06 '17 18:12 binji

I am concerned that this proposal, as currently phrased, seems to be aimed at encouraging the use of non-standard Web APIs.

sunfishcode avatar Dec 06 '17 18:12 sunfishcode

@binji

For filesystems I think we would follow the way it was done in emscripten + NaCl and have the ability to mount different filesystems on the same hierarchy. We'd have to agree on what a particular filesystem type means, though. So in your example "websql" could be the filesystem type, but what does that mean in a non-web embedding?

I'm thinking that the .wasm would just use the syscalls, and a developer would choose (in this example) which filesystem backs their syscall uses outside of the .wasm. Say you're in JS: that could be configured in your Emscripten build, or your Webpack build (hi @TheLarkInn). Say you're in a non-JS embedding: that could be through the command line.

Are there other examples where we'd want to swap out the host's implementation?

A few random ideas:

  • System settings such as PID, UID, groups, etc.
  • Process tracing / debugging.
  • TTYs / console control.
  • Forking and communication to other "processes" (pipes, etc).
  • Event loop / select / poll / epoll (which brings in some file descriptor things!).
  • Networking (on the web that can be just same-origin, or some WebRTC-based magic).

@sunfishcode

I am concerned that this proposal, as currently phrased, seems to be aimed at encouraging the use of non-standard Web APIs.

That's absolutely not my goal, and I think we can agree that such things are off the table as a design concern. If e.g. Chrome wants to implement the filesystem syscalls using HTML5 filesystem then go for it, but that's in no way tied to the present discussion.

jfbastien avatar Dec 06 '17 18:12 jfbastien

In the Web embedding, we don't have any constraints on the precise syscall ABI, so our choices will be arbitrary and hard to validate. It seems like we need some non-web embeddings to participate in this discussion--embeddings that will implement the syscalls directly in the host--for the ABI to be meaningful. I have actually heard of several non-web embeddings that are considering doing exactly such a thing, so perhaps we could reach out to them and get them together.

Another point is that while it seems like a syscall ABI would certainly relate to the toolchain, it seems like a much broader discussion than just "toolchain conventions". It's like a new wasm-ified POSIX standard.

lukewagner avatar Dec 06 '17 18:12 lukewagner

These might be good goals here:

  • No or at least minimal regressions in emscripten in code size and perf. Example issue: should more date/time handling be done in compiled libc code (larger?), or call out to JS (slower?)
  • Also make sure we can support other existing browser filesystem libraries like BrowserFS.

kripken avatar Dec 06 '17 19:12 kripken

My non-web implementation targets .NET Standard, which is fairly restrictive, though I could make a .NET Classic build that would have full access to the Windows API. If the goal here is to make POSIX for WASM, I could probably find a way to make most of it work...

RyanLamansky avatar Dec 07 '17 18:12 RyanLamansky

I can see an argument in favour of standardising the syscall ABI... but I'm not entirely convinced!

Pro:

  • The rest of the ABI (calling conventions, typedefs, C99 ABI, etc) is standardised
  • A stable ABI would assist in integrating libc implementations into different backends

Con:

  • The contract between libc and the "kernel" is not something applications should be relying on.
  • In particular, if the same project ships both a Musl port and the embedder side of the syscalls, that's a private contract for the implementer. That is, if you maintain a Wasm port of Musl and also provide the JavaScript syscalls, why should anyone else need to know about that? Toolchains ought to handle symbols "opaquely".
  • I expect that projects like Emscripten will continue to provide the JavaScript-side of syscalls, as well as their own Musl port. And, any other projects (like a .NET port) may inevitably end up maintaining their own Musl port that speaks to their .NET embedder side. Catering for all needs with one syscall ABI is a bit ambitious.

For what it's worth - I have my own Musl port and JavaScript syscalls implementation. I've called it Minscripten.

  • JavaScript side (work-in-progress, roughly usable for what I need): https://github.com/NWilson/minscripten/tree/master/js-syscalls
  • Musl port (a clean Wasm-only port, reviewed by the Musl devs but not accepted upstream): https://github.com/NWilson/musl

From my own experience, I'd suggest the following:

  • Please don't standardise the syscall ABI that Emscripten is currently using. It's the Linux x86 ABI, and it's just horrible - full of legacy grot, including bogus "narrow" versions of syscalls, duplicates of old/new syscalls, and 32-bit time_t. Please if you're going to standardise something don't standardise on 32-bit time_t!
  • I'd suggest using the x32 syscall ABI. It's clean and modern.
  • Minscripten's Musl port is in fact fairly suitable for Emscripten to adopt; I'd be happy to assist in any changes that might be needed, to make it a standard/shared port of Musl for Wasm.
  • Please don't standardise on a number-based ABI. Wasm modules should export functions with names like __syscall_open not __syscall123. My Musl port does this, I think it's quite a bit nicer. There's no reason to use numbered syscalls at all for Wasm.
  • Syscalls I've had to remove:
    • brk - handled internally in Wasm, not JavaScript
    • futex - will be handled internally in Wasm when the futex opcodes for Wasm arrive
    • madvise, mremap, mmap, munmap - handled internally in Wasm, not JavaScript. No need to jump to the kernel when Wasm can call grow_memory!
    • set_tid_address - ditto, Wasm can (and will) set this on its own side, no need for a syscall that I can think of yet
    • Obsolete syscalls, which are part of the x32 ABI but shouldn't be speced for Wasm: afs_syscall, getpmsg, putpmsg, security, tuxcall
    • Syscalls in the x32 ABI that are x86-specific, shouldn't be speced for Wasm: ioperm, iopl, vm86.
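
To illustrate the 32-bit time_t concern raised in the list above, here is a small sketch that emulates storing seconds since the Unix epoch in a signed 32-bit integer. The helper names are made up for this example; the wrap-around itself is ordinary two's-complement arithmetic.

```typescript
// Illustration of the 32-bit time_t limit: seconds since the Unix epoch
// stored in a signed 32-bit integer wrap after 2^31 - 1.
// toTime32 and isoFromSeconds are hypothetical helper names.

const INT32_MAX = 2 ** 31 - 1; // 2147483647

// Emulate storing a value in a signed 32-bit time_t.
function toTime32(seconds: number): number {
  return seconds | 0; // bitwise OR applies ToInt32, which wraps modulo 2^32
}

function isoFromSeconds(seconds: number): string {
  return new Date(seconds * 1000).toISOString();
}

const lastGood = isoFromSeconds(toTime32(INT32_MAX));
// → "2038-01-19T03:14:07.000Z"

const wrapped = toTime32(INT32_MAX + 1);
// → -2147483648

const afterWrap = isoFromSeconds(wrapped);
// → "1901-12-13T20:45:52.000Z"
```

One second past the last representable instant, a 32-bit time_t jumps back to 1901. A 64-bit time_t in the ABI avoids this.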

Finally, Wasm needs some new syscalls!

  • __syscall_localtime - needed to do timezone conversions in the "kernel". There's simply no way in a browser to read out the timezone database into the /usr/share/zoneinfo format; the JS APIs basically require forwarding the libc localtime call directly to the browser. Thus this needs a new syscall.
  • I'm sure some more things will emerge...
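
A hedged sketch of what the host side of such a localtime syscall could look like in JavaScript: the embedder answers from its own timezone data via `Date`, so the module never needs a zoneinfo file. The function name `syscallLocaltime` and the `Tm` interface are assumptions for this example; `Tm` mirrors a subset of C's `struct tm` (with `tm_gmtoff`, a common extension), and writing the fields back into Wasm memory is left out.

```typescript
// Hypothetical host-side sketch of a localtime syscall.
// syscallLocaltime and Tm are illustrative names, not an existing API.

interface Tm {
  tm_sec: number;
  tm_min: number;
  tm_hour: number;
  tm_mday: number;
  tm_mon: number;
  tm_year: number;
  tm_wday: number;
  tm_gmtoff: number;
}

function syscallLocaltime(epochSeconds: number): Tm {
  const d = new Date(epochSeconds * 1000);
  return {
    tm_sec: d.getSeconds(),
    tm_min: d.getMinutes(),
    tm_hour: d.getHours(),
    tm_mday: d.getDate(),
    tm_mon: d.getMonth(),            // 0-11, same convention as struct tm
    tm_year: d.getFullYear() - 1900, // struct tm counts years from 1900
    tm_wday: d.getDay(),             // 0 = Sunday in both conventions
    // getTimezoneOffset() is minutes behind UTC; tm_gmtoff is seconds east of UTC.
    tm_gmtoff: 0 - d.getTimezoneOffset() * 60,
  };
}

// 2018-02-23T16:00:00Z, broken down in whatever timezone the host is in.
const tm = syscallLocaltime(1519401600);
```

The field values depend on the host's timezone, which is the point: the conversion happens in the "kernel" rather than in compiled libc code.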

NWilson avatar Feb 23 '18 16:02 NWilson