wasi-libc icon indicating copy to clipboard operation
wasi-libc copied to clipboard

polling on mix of libc-posix socket/file descriptors with non-libc WASI pollable handles

Open pavelsavara opened this issue 1 year ago • 11 comments

In context of dotnet sockets I would like to be able to wasi:poll_poll on mix of handles

  • which were created by consuming other WASI APIs, like wasi:http
  • and file handles obtained by using libc sockets APIs

The motivation to do it on single "system" call is to

  • be able to make progress on events from HTTP while socket is not ready and vice versa
    • if we called each select/poll or poll_poll separately, we would get blocked only on part of pollables
  • to be able to block (stop spinning) if there are no I/O events (and no CPU bound jobs)

At the moment, we can hack it via using libc internal descriptor_table_get_ref and it's data structures to obtain underlying pollables. @dicej made similar hack for mio But it's fragile and bad practice.

I can see few possible solutions

  • expose libc function accepting both lists of resources
    • void wasi_poll_plus_fd(poll_list_borrow_pollable_t *in, list_pollfd_t *in, wasip2_list_u32_t *ret)
    • this is with wit-bindgen generated C structures on APIs and with list of pollfd struct
  • we can try simpler C types, something like
    • void wasi_poll_plus_fd(int *pollables, int pollables_count, struct pollfd *fds, int fds_count, int **ret, int *ret_count)
  • or we can expose just the mapping from fds to list of pollables
    • void socket_fds_to_wasi_pollables(struct pollfd *fds, int fds_count, int **pollables, int pollables_count)
    • this would have to be called before each call to wasi:poll_poll
  • create libc file descriptor by registering external pollable
    • this would have to be un-registered later
    • calling socket oriented poll() API on any pollable (clock) feels confusing to me

Relevant code, possibly to be refactored and reused for above. https://github.com/WebAssembly/wasi-libc/blob/7d4d3b83fc66c79b3faa5989e67ed2d1042dacaf/libc-bottom-half/sources/poll-wasip2.c#L27-L151

cc @badeend

pavelsavara avatar Sep 25 '24 10:09 pavelsavara

Out of the four options, 1 and 2 seem the best to me.

or we can expose just the mapping from fds to list of pollables

Note that one libc socket can map to anywhere from 0 to 3 pollable resources depending on its state. With the proposed function signature the user can't know which pollable relates to which aspect of the socket.

calling socket oriented poll() API on any pollable (clock) feels confusing to me

POSIX poll is for any async I/O and not specific to sockets, so there wouldn't be any confusion for me. Regardless, I'd still go for option 1 or 2 as those accomplish the desired goal in a single call.

badeend avatar Sep 25 '24 19:09 badeend

After discussing with @sunfishcode we came up with another alternative solution: Rather than attempting to integrate external resources into the libc/POSIX polling mechanism, we could expose the libc-managed WASI resources for external use:

int32_t get_wasip2_input_stream(int fd); // wasi:io/[email protected]
int32_t get_wasip2_output_stream(int fd); // wasi:io/[email protected]
int32_t get_wasip2_filesystem_descriptor(int fd); // wasi:filesystem/[email protected]
int32_t get_wasip2_tcp_socket(int fd); // wasi:sockets/[email protected]
int32_t get_wasip2_udp_socket(int fd); // wasi:sockets/[email protected]
// etc..

The consumer is in charge of getting the pollables from them.

These come with some implicit assumptions that should be documented:

  • These should be considered specific to Preview2. They will not work on preview1 and may not work on preview3, depending on how the future pans out.
  • Any one of them may fail (with a negative value?). E.g.
    • Calling get_wasip2_filesystem_descriptor on stdout will always fail.
    • Calling get_wasip2_output_stream on a socket may fail if it isn't connected yet.
  • The resources remain "owned" by wasi-libc:
    • The consumer should not drop these resources.
    • Calling POSIX close invalidates any of these resources.
    • Child resources (e.g. Pollables) derived from these resources must be dropped before calling POSIX close. Failing to do so may trap.

badeend avatar Oct 02 '24 10:10 badeend

If WASPp3 will call back on promises instead of central polling. And if that lands in few months, I'm happy to wait for that and drop this issue.

I'm not 100% clear about WASIp3 callbacks from streams. I'm worried about performance. (I admit I didn't study WIT draft yet)

  • Maybe WASI callback would be cheaper than unix syscall ? (I understand that epoll is optimization of that)
  • Are we going to marshal and allocate new Promise for each event ?
  • Would there be any throttling (Nagle's algorithm) ? Or it could happen that stream sender (component) would trigger callback in receiver (component) for each byte if they push byte by byte ?

pavelsavara avatar Oct 02 '24 11:10 pavelsavara

Also the way how we hide those promises/streams and callback behind wasi-libc matters.

pavelsavara avatar Oct 02 '24 11:10 pavelsavara

Maybe WASI callback would be cheaper than unix syscall ? (I understand that epoll is optimization of that)

If the read or write call can complete immediately (because there are bytes available or the host is ready to receive bytes, respectively), then there's no need for the host to allocate a task or call a callback. So we only pay the cost of allocating a task and calling the callback when we're I/O limited anyway. Also, e.g. Wasmtime is optimized to make these guest<->host calls as cheap as possible.

As usual, we'll want to benchmark and optimize as necessary if we do discover bottlenecks.

Are we going to marshal and allocate new Promise for each event ?

We will create a new task each time an async host function is unable to complete immediately, yes.

Would there be any throttling (Nagle's algorithm) ? Or it could happen that stream sender (component) would trigger callback in receiver (component) for each byte if they push byte by byte ?

I guess that depends on whether the host implementation of TCP streams buffers output (either in user space or at the kernel level).

dicej avatar Oct 02 '24 14:10 dicej

We will create a new task each time an async host function is unable to complete immediately, yes.

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

badeend avatar Oct 02 '24 14:10 badeend

We will create a new task each time an async host function is unable to complete immediately, yes.

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

Luke has not yet written down a spec for streams AFAIK, so I made one up for my prototype implementation, and that creates a new Task as necessary for each read or write. I'm trying to remember what Luke was planning for the final spec; perhaps it just uses the stream ID for callback notifications.

dicej avatar Oct 02 '24 14:10 dicej

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

dotnet Task is single resolve only.

pavelsavara avatar Oct 02 '24 14:10 pavelsavara

dotnet Task is single resolve only.

Sure, but p3 tasks might not map one-to-one with .NET Tasks

dicej avatar Oct 02 '24 14:10 dicej

@dicej Okay! It doesn't really matter for this issue. Was just curious.

badeend avatar Oct 02 '24 14:10 badeend

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

Right, the current proposal (which I'm trying to finish up and post) has:

stream.read $t: [readable-stream-index:i32 ptr:i32 n:i32] -> [result:i32]

where the result bits may indicate that stream.read blocked and that the caller needs to task.wait for the read to complete, where task.wait will return a "stream read" event-code along with the readable-stream-index and how much (<= n) was written into ptr. And after that, another stream.read can be started with the same readable-stream-index, and so on. However, bindgen should be quite able to bind each stream.read invocation to a single-shot task/future/promise that is resolved when the corresponding "stream read" event is returned by task.wait.

lukewagner avatar Oct 02 '24 16:10 lukewagner