SDL Non-blocking rwops

While I'm getting medieval on the RWops API, I figured I should open this up for discussion:

The changes I made yesterday change the return value of RWread to something like this:

a positive number: number of bytes read
zero: eof
negative 1: error (check SDL_GetError for human-readable details)
negative 2: this is a non-blocking RWops and we would have to wait, so try again later.

(RWwrite now returns a positive number of bytes written, or -1 on error when no bytes were written. If you wrote less than expected, it's either non-blocking or the next call is likely to error out.)
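To make the four return cases concrete, here is a minimal sketch of a caller loop. Everything in it is hypothetical: `FakeStream`, `fake_read`, `drain`, and the `READ_*` constants are stand-ins invented for illustration, not SDL names, and the fake stream simply reports "try again later" a fixed number of times before handing over its data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the return codes described above; not SDL API. */
#define READ_EOF          0
#define READ_ERROR       (-1)
#define READ_WOULD_BLOCK (-2)

/* Hypothetical stand-in for a non-blocking stream. */
typedef struct {
    const char *data;  /* bytes that will eventually be readable */
    size_t len;        /* total length of data */
    size_t pos;        /* how much has been consumed so far */
    int stalls;        /* calls that return READ_WOULD_BLOCK before data shows up */
} FakeStream;

static long fake_read(FakeStream *s, void *buf, size_t size)
{
    if (s->stalls > 0) {
        s->stalls--;
        return READ_WOULD_BLOCK;   /* nothing available yet, try again later */
    }
    size_t avail = s->len - s->pos;
    if (avail == 0) {
        return READ_EOF;
    }
    if (size > avail) {
        size = avail;
    }
    memcpy(buf, s->data + s->pos, size);
    s->pos += size;
    return (long)size;
}

/* Read until EOF or until `out` is full. Returns total bytes read, or -1 on
   error. `*retries` counts how often the stream asked us to come back later. */
static long drain(FakeStream *s, char *out, size_t outlen, int *retries)
{
    size_t total = 0;
    assert(s != NULL && out != NULL && retries != NULL);
    *retries = 0;
    while (total < outlen) {
        long rc = fake_read(s, out + total, outlen - total);
        if (rc > 0) {
            total += (size_t)rc;
        } else if (rc == READ_EOF) {
            break;
        } else if (rc == READ_WOULD_BLOCK) {
            (*retries)++;          /* a real app would go do other work here */
        } else {
            return READ_ERROR;     /* -1: a real app would check the error string */
        }
    }
    return (long)total;
}
```

A caller that knows nothing about non-blocking streams would take the final `else` branch on -2 and treat it as a plain error, which is the fallback behavior the proposal relies on.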

Neither function was renamed, but both now take one fewer argument, so existing apps will have to review these calls.

This is all a little unorthodox, but my thinking is that a) things that don't handle non-blocking reads will see a negative value and treat it as an error and b) POSIX gives you an actual error (-1) and wants you to check errno for EWOULDBLOCK, but checking SDL_GetError for this is messier for several reasons.
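For comparison, this is roughly what the POSIX convention mentioned above looks like in practice: the read fails with -1 and the caller has to inspect `errno` to tell "would block" apart from a real error. The helper name `empty_pipe_would_block` is made up for this sketch; the system calls are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if reading an empty non-blocking pipe failed with
   EAGAIN/EWOULDBLOCK, 0 if something else happened, -1 on setup failure. */
static int empty_pipe_would_block(void)
{
    int fds[2];
    char byte;
    int result = 0;

    if (pipe(fds) != 0) {
        return -1;
    }

    /* Put the read end into non-blocking mode. */
    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Nothing has been written, so this read cannot complete right now. */
    errno = 0;
    ssize_t rc = read(fds[0], &byte, 1);
    if (rc == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        result = 1;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The proposal folds that second step into the return value itself (-2), so no thread-local error state has to be consulted.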

There are currently no non-blocking RWops offered by SDL, and I will not add a way to toggle non-blocking state. I think eventually it would be new functions (SDL_RWFromFileNonblocking or whatever) to create a non-blocking RWops. The goal is to make this flexible enough that apps and libraries implement their own RWops that hook up to raw sockets, HTTP transfers, etc, though.

Opinions and arguments welcome! This can still be changed.

Dec 15 '22 16:12 icculus

I like it!

While we're at it, does it make sense to have a callback based I/O completion port style API?

Dec 15 '22 16:12 slouken

> While we're at it, does it make sense to have a callback based I/O completion port style API?

There's a wishlist issue for that in #1374 ... I haven't thought about it a lot yet. My gut reaction is that having a callback fire when i/o completes would be very efficient but also somewhat unpleasant to work with, but I could definitely be convinced otherwise.

Dec 15 '22 16:12 icculus

Maybe a callback is provided when creating a non-blocking RWops and it fires in addition to the usual non-blocking semantics? So you can optionally use it to notify your app so you don't have to poll, but it isn't a formal part of the RWops struct...that one internal implementation deals with it and not everyone that implements a RWops...

Dec 15 '22 16:12 icculus

Callback based I/O is often used for asynchronous asset loading from disk, not just for non-blocking network I/O. It's also the most efficient way to handle network I/O on Linux.

Dec 15 '22 17:12 slouken

Probably not better, but some other ideas:

It could also be event-based: the async API would push an SDL event back to the app?
Could it also be done natively using some Rust layer?
Provide some kind of template, so that any API can be turned into an async command.

Dec 16 '22 08:12 1bsyl

Hi, I'm the one who originally proposed an async RWops API way back when SDL2 was new.

Non-blocking file I/O doesn't really work when the file is on a local filesystem. Those files are always "ready to read" and "ready to write" as far as the OS is concerned. This is true whether you're using O_NONBLOCK, poll, select, epoll, kqueue, etc. Those were designed for "files" with unpredictable data availability and latencies in mind (pipes, sockets, character terminals, etc), not the local filesystem.
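A small way to see this for yourself, assuming a POSIX system: poll a regular file with a zero timeout and it reports readable immediately, regardless of whether the data is cached. The helper name `regular_file_polls_ready` is invented for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdio.h>

/* Returns 1 if poll() reports a regular file as readable with a zero
   timeout, 0 if it does not, -1 if the temporary file can't be created. */
static int regular_file_polls_ready(void)
{
    FILE *f = tmpfile();           /* a regular file on the local filesystem */
    if (!f) {
        return -1;
    }
    fputs("data", f);
    fflush(f);

    struct pollfd p = { .fd = fileno(f), .events = POLLIN, .revents = 0 };
    int rc = poll(&p, 1, 0);       /* timeout 0: never wait */
    int ready = (rc == 1) && (p.revents & POLLIN) != 0;

    fclose(f);
    return ready ? 1 : 0;
}
```

Because the answer is always "ready", the readiness check tells you nothing about whether the subsequent `read()` will stall on the disk.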

This is mostly only really important when doing non-sequential reads, because the OS already buffers writes for you and readahead buffering is generally good enough to keep the data flowing for sequential reads. When the data isn't buffered already, now you're stuck waiting on the disk, and that can be a problem. Readahead is generally only measured in KB, so for large reads it's not really sufficient either.

So synchronous non-blocking I/O like on a socket or a pipe just doesn't work. I only learned that trying to make an implementation myself, so a lot of what I said in my original suggestion was wrong. The only solution is async I/O. You probably want a random-access interface, not a streaming interface like the current RWops. You may even want a vectored interface (like libuv's uv_fs_read and uv_fs_write) but not sure how well used that particular functionality would be.
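As a rough illustration of what "random-access rather than streaming" means at the lowest level, POSIX `pread` takes the offset as part of each request instead of relying on a shared seek position, so independent requests don't interfere with each other. This is a synchronous sketch only; `read_at` and `demo_random_access` are names made up here, and nothing in it is asynchronous.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Read `size` bytes at absolute `offset`; the file position is not used
   and not modified, so requests can be issued in any order. */
static ssize_t read_at(int fd, void *buf, size_t size, off_t offset)
{
    return pread(fd, buf, size, offset);
}

/* Fetch two single bytes out of order from a 10-byte temporary file.
   Returns 0 on success, -1 on failure. */
static int demo_random_access(char out[2])
{
    FILE *f = tmpfile();
    if (!f) {
        return -1;
    }
    fputs("0123456789", f);
    fflush(f);                     /* make sure the bytes reach the fd */

    int fd = fileno(f);
    int ok = read_at(fd, &out[0], 1, 7) == 1 &&
             read_at(fd, &out[1], 1, 2) == 1;

    fclose(f);
    return ok ? 0 : -1;
}
```

An async interface built along these lines would carry `(offset, buffer, size)` in every request, which is also the shape of the `SDL_AsyncReadIO` proposal later in this thread.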

I actually worked on a small async disk file I/O interface for SDL programs a couple years ago, but I kinda lost interest pretty quick. Source code for the marginally broader library is here if you want to give it a look: https://sourceforge.net/p/libs4sdl2/code/ci/default/tree/SDL_EXT/

Callbacks or pushing an event into the queue both work fine, but for C++ devs callbacks will be nicer (coroutines and lambdas). I definitely wouldn't do futures.

Jan 12 '23 08:01 nfries88

Ok, how about we add something like this to SDL_iostream.h ...

typedef Uint32 SDL_AsyncIOTask;

typedef void (SDLCALL *SDL_AsyncIOCallback)(void *userdata, SDL_IOStream *context, SDL_AsyncIOTask task, Sint64 offset, void *ptr, size_t size, SDL_IOStatus status);

/**
 * Start an async read from `context`, to read `size` bytes at `offset` into `buf`.
 *
 * This function does not block; it will start the task and return.
 *
 * Results of the task are reported later through `callback`. This can happen at any time,
 * including during this function call and later in another thread.
 *
 * `buf` must remain valid until `callback` is called.
 *
 * `callback` will be called in all cases: completed i/o, failed i/o, cancelled i/o.
 * It will be called _during_ this function if async i/o can't begin.
 *
 * `callback` is always called exactly once.
 *
 * Cancelled i/o will call the callback with `status` set to SDL_IO_STATUS_NOT_READY.
 * On short reads/writes, it will report a different SDL_IOStatus. Success reports
 * SDL_IO_STATUS_READY. Callback's `size` will be the amount read, which might
 * be less on error or EOF.
 *
 * Do not mix async and normal i/o!
 * 
 * Returns a task ID, which can be used to cancel the in-flight operation,
 * or 0 if no operation will be started.
 */
extern SDL_DECLSPEC SDL_AsyncIOTask SDLCALL SDL_AsyncReadIO(SDL_IOStream *context, Sint64 offset, void *buf, size_t size, SDL_AsyncIOCallback callback, void *userdata);

/* also add the equivalent AsyncWriteIO function. */

/**
 * Cancel a pending/in-progress async i/o to an SDL_IOStream.
 *
 * This will fire callbacks for incomplete requests before returning.
 *
 * Note that the callback may fire a successful completion in another thread
 * while this function is in progress!
 * 
 * Once this function returns, the callback will not fire again, and it is safe to
 * deallocate the buffer that was to be used by this task.
 * 
 * If the task is invalid (bogus value or the callback has already fired on it),
 * this function does nothing.
 */
extern SDL_DECLSPEC void SDLCALL SDL_CancelAsyncIO(SDL_AsyncIOTask task);

Inside SDL_IOStreamInterface:

    /* Async i/o interface. You do not need to implement this! If these
       functions are NULL, SDL will implement this for you with the non-async
       read/write calls on a background thread. */

    /* this isn't called with the app's callback; this will call into some SDL code that
       is managing async i/o, and _that_ will call the real app callback. */
    void * (SDLCALL *async_read)(void *userdata, SDL_AsyncIOTask task, Sint64 offset, void *buf, size_t size, SDL_AsyncIOCallback callback, void *callback_userdata);

    void (SDLCALL *async_cancel)(void *userdata, SDL_AsyncIOTask task);

Most things will not implement this. If the function pointers are NULL, SDL will spin a thread on the first async request and manage all async i/o for unimplemented objects out of that thread. It'll just queue tasks, seek/read from the stream, call the callbacks, and move on to the next task. So it can work with any stream that is otherwise synchronous in nature (and if the reads are sequential, it can work with streams that can't even seek). On Emscripten, there's no thread, but it'll fire off an async task that'll run next iteration.
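The fallback described above boils down to a task queue plus a loop that seeks, reads, and fires the callback. Below is a minimal sketch of that core, written against plain `FILE *` rather than SDL types. All names (`FallbackQueue`, `queue_read`, `pump_one`, `TaskStatus`, `record`) are hypothetical, and the threading is deliberately left out: in the scheme described, a background thread (or a next-iteration task on Emscripten) would call `pump_one` repeatedly, with a lock around the queue.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types; this is not SDL's implementation. */
typedef enum { TASK_OK, TASK_SHORT, TASK_FAILED } TaskStatus;
typedef void (*TaskCallback)(void *userdata, unsigned id, void *buf,
                             size_t got, TaskStatus status);

typedef struct {
    unsigned id;
    long offset;
    void *buf;
    size_t size;
    TaskCallback cb;
    void *userdata;
} Task;

#define MAX_TASKS 16
typedef struct {
    FILE *stream;           /* ordinary blocking stream */
    Task tasks[MAX_TASKS];  /* FIFO ring of pending requests */
    size_t head, count;
    unsigned next_id;
} FallbackQueue;

/* Queue a read; returns a nonzero task id, or 0 if the queue is full. */
static unsigned queue_read(FallbackQueue *q, long offset, void *buf,
                           size_t size, TaskCallback cb, void *userdata)
{
    assert(q != NULL && cb != NULL);
    if (q->count == MAX_TASKS) {
        return 0;
    }
    if (++q->next_id == 0) {        /* 0 is reserved for "not started" */
        q->next_id = 1;
    }
    Task *t = &q->tasks[(q->head + q->count) % MAX_TASKS];
    t->id = q->next_id;
    t->offset = offset;
    t->buf = buf;
    t->size = size;
    t->cb = cb;
    t->userdata = userdata;
    q->count++;
    return t->id;
}

/* Run the oldest pending task: seek, read, report through the callback.
   Returns 1 if a task was processed, 0 if the queue was empty. */
static int pump_one(FallbackQueue *q)
{
    if (q->count == 0) {
        return 0;
    }
    Task t = q->tasks[q->head];
    q->head = (q->head + 1) % MAX_TASKS;
    q->count--;

    size_t got = 0;
    TaskStatus status = TASK_FAILED;
    if (fseek(q->stream, t.offset, SEEK_SET) == 0) {
        got = fread(t.buf, 1, t.size, q->stream);
        if (got == t.size) {
            status = TASK_OK;
        } else {
            status = ferror(q->stream) ? TASK_FAILED : TASK_SHORT;
        }
    }
    t.cb(t.userdata, t.id, t.buf, got, status);  /* exactly once per task */
    return 1;
}

/* Example callback: records what the task reported. */
typedef struct { size_t got; TaskStatus status; int calls; } Result;
static void record(void *userdata, unsigned id, void *buf,
                   size_t got, TaskStatus status)
{
    Result *r = userdata;
    (void)id; (void)buf;
    r->got = got;
    r->status = status;
    r->calls++;
}
```

Typical use would be to zero-initialize a `FallbackQueue`, set its `stream`, queue a few reads with `record` (or any callback) and a per-request `Result`, and let the worker drain them. Cancellation is not modeled here.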

But for things that can implement it (normal file i/o through the win32 API, etc), it can use the OS facilities, skipping the need for a thread.

Thoughts?

May 28 '24 15:05 icculus

Games are always able to create work threads and do loading I/O on those threads. What are the use cases where this provides meaningful benefit for the application?

May 28 '24 19:05 slouken

The thinking is "real" async, at the kernel layer, can be implemented, but things that can't implement it have a functionally equivalent fallback.

But maybe the real async interfaces aren't worth more than just spinning a thread anyhow...?

May 28 '24 23:05 icculus

The only case I know where async I/O is critical is high performance networking. In that case you'd want to use native interfaces directly.

Maybe I'm missing something?

May 28 '24 23:05 slouken

I think this is less performance and more "I want data from the disk when it's available, and I don't want to block at all while the data is coming off the disk/competing with other things that want the disk, etc".

Two things I can think of immediately are a loading screen that keeps animating at 60 fps, because it only touches each chunk of data once it's already in RAM (it gets notified when the chunk is ready instead of waiting for a read to finish), and open world games that are sort of continuously streaming data from the disk as you drive around the world.

This is a different thing than non-blocking i/o, since those facilities (select(), poll(), etc) always treat the local disk as "ready," even if the disk needs to spin up from low-power mode. Win32 calls this thing "overlapped i/o".

(But again, maybe this isn't worth it. I'm just proposing a solution to an open bug, but closing it as WONTFIX is a solution too. :) )

May 28 '24 23:05 icculus

Yeah, for that use case this seems like a good addition to the API.

May 29 '24 01:05 slouken

> But maybe the real async interfaces aren't worth more than just spinning a thread anyhow...?

On Windows overlapped I/O is tightly integrated with the filesystem cache, so there is a pretty significant advantage in terms of both system resource use and general overhead. FreeBSD also stands out with direct kernel support for the posix AIO interface, plus an extension to get notification via kqueue instead of signals.

Linux has had a wonky kernel AIO interface (not used by the posix AIO implementation, which is an incredibly naive threadpool in userspace) that has been favored by databases for a long time but probably wouldn't be advantageous to most software using SDL. If the security vulnerabilities ever get resolved, clever use of io_uring features provides notable performance advantages for modest reads over many files, but it would probably be challenging to encapsulate that use pattern in a "simple" general purpose interface.

emscripten also stands out because it has asynchronous filesystem APIs (even though they're ugly as sin) and because it is probably the biggest (or only?) target for SDL where threads may not be an option.

Jun 01 '24 04:06 nfries88

Okay, I've redesigned this interface three times in three days, and I still hate where it's at.

It's become clear to me that this shouldn't be added to a generic SDL_IOStream; my plan was pretty naive. It should probably be a totally separate API if we do it at all, but I haven't found a way to make something that's portable, flexible, useful, and pleasant to use, at least so far.

But if we're not putting it into SDL_IOStream, we can bump this out of the 3.0 ABI milestone, and maybe revisit it later.

For now, I'm moving to 3.2.0, but honestly I might bump it to 3.x. Or give up on it.

Jun 07 '24 03:06 icculus

I would probably make it part of the Storage API tbh. Async I/O from cloud and optical disk storage (is the latter still a common need though?) would probably be nice to have too, Steamworks supports async I/O from storage, and emscripten's APIs are more similar to the storage interface as well. I don't love this idea personally because I really dislike the storage interface (while acknowledging that it's the simplest way to do it given available underlying interfaces), but it's the place where it makes the most sense.

Async cancellability is inherently unreliable, adds a little overhead, and for local filesystem I/O is probably completely pointless unless the game is loading massive amounts of data from more files than the system can handle at once or the system actually supports I/O cancellation after initiation (posix AIO supports cancellation in its interface, but in practice I don't think any implementations will stop in the middle of I/O).

I know it's something programmers coming from JS or another highly asynchronous language would expect, but it's probably not worth the effort of supporting.

If dealing with callbacks is part of the headache, putting a completion event in the main event queue with some opaque user-provided token (as void*) is a good enough approach.
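A sketch of that event-with-token idea, under the assumption that completions go into a small FIFO the main loop drains once per frame. The names (`IoDoneEvent`, `IoEventQueue`, `push_done`, `poll_done`) are hypothetical and this is not SDL's event queue; a real version would be filled from an I/O thread and would need a mutex around both functions, which is omitted here to keep the example free of threading dependencies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion record handed back to the app. */
typedef struct {
    void *token;    /* opaque value the app supplied when starting the request */
    size_t bytes;   /* number of bytes actually transferred */
    int failed;     /* nonzero if the request failed */
} IoDoneEvent;

#define EVQ_CAP 8
typedef struct {
    IoDoneEvent ev[EVQ_CAP];  /* FIFO ring of finished requests */
    size_t head, count;
} IoEventQueue;

/* Called by the I/O side when a request finishes.
   Returns 1 if the event was queued, 0 if the queue is full. */
static int push_done(IoEventQueue *q, void *token, size_t bytes, int failed)
{
    assert(q != NULL);
    if (q->count == EVQ_CAP) {
        return 0;
    }
    IoDoneEvent *e = &q->ev[(q->head + q->count) % EVQ_CAP];
    e->token = token;
    e->bytes = bytes;
    e->failed = failed;
    q->count++;
    return 1;
}

/* Called by the main loop. Returns 1 and fills `out` with the oldest
   completion, or returns 0 if nothing has finished yet. */
static int poll_done(IoEventQueue *q, IoDoneEvent *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return 0;
    }
    *out = q->ev[q->head];
    q->head = (q->head + 1) % EVQ_CAP;
    q->count--;
    return 1;
}
```

The token is what lets the app map a completion back to whatever it was loading (a texture slot, a world chunk) without any callback running on a foreign thread.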

Jun 09 '24 04:06 nfries88

I'm going to bump this to 3.x.

Aug 05 '24 03:08 slouken