SDL Emscripten: Support running SDL programs in a web worker

Currently the SDL implementation in Emscripten assumes it is running on the main thread, according to the SDL Emscripten README. This is a problem because most SDL-based programs have an infinite main loop which assumes it can update the UI synchronously, and must be rewritten to explicitly yield to the browser main loop to run in Emscripten.

If SDL could be made to run in a web worker instead, communicating with the main thread as needed, it would remove the need to rewrite the main loop. Then most SDL-based programs could be run unmodified in the browser.

Applications:

- I want to allow students to write & run graphical PyGame-based programs in a web browser, without requiring students to restructure the program's main loop in a way that is inconsistent with how a PyGame program would be written normally outside a browser.
- The Infinite Mac project had to fork & rewrite each of its SDL-based emulators (Basilisk, SheepShaver, Mini vMac, etc.) to not use SDL at all because it needed those emulators to run inside a web worker. If SDL could run inside a web worker then the forking might not have been necessary.

Is this a feature that anybody is working on or thinking about already? (I could not find an existing issue thread in the SDL or Emscripten issue trackers.)
Are there any significant concerns about the feasibility of implementing this feature?
- Emscripten already appears to proxy SDL calls from pthreads to the main thread. I imagine you could proxy calls to browser objects like screen in a similar fashion from SDL in a web worker to a main thread.
- Graphics updates in the web worker could be made to an OffscreenCanvas obtained from canvas.transferControlToOffscreen() on the main thread.
- Input events from mouse/keyboard/etc. on the main thread could be forwarded to the web worker via a SharedArrayBuffer. That avoids relying on postMessage to reach the web worker, which cannot receive messages while it is in the middle of running an infinite event loop.
Is anyone besides myself interested in trying to implement this? 😉
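
A minimal sketch of the input-forwarding idea from the last bullet, written in C with C11 atomics. It assumes a build in which the C heap is shared between the main thread and the worker (as I understand it, Emscripten pthreads builds back the heap with a SharedArrayBuffer), so a lock-free single-producer/single-consumer ring in ordinary memory can play the role described above. The names `input_event`, `event_ring`, `event_ring_push`, and `event_ring_pop` are hypothetical and are not part of SDL or Emscripten.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_RING_CAPACITY 256u
_Static_assert((EVENT_RING_CAPACITY & (EVENT_RING_CAPACITY - 1u)) == 0,
               "capacity must be a power of two");

/* Hypothetical event record; a real port would carry SDL event data. */
typedef struct {
    uint32_t type;  /* e.g. key down, mouse move */
    int32_t x, y;
} input_event;

typedef struct {
    _Atomic uint32_t head;  /* next slot to write; only the producer advances it */
    _Atomic uint32_t tail;  /* next slot to read; only the consumer advances it */
    input_event slots[EVENT_RING_CAPACITY];
} event_ring;

/* Producer side (main thread). Returns false if the ring is full. */
bool event_ring_push(event_ring *ring, input_event ev) {
    assert(ring != NULL);
    uint32_t head = atomic_load_explicit(&ring->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&ring->tail, memory_order_acquire);
    if (head - tail == EVENT_RING_CAPACITY) {
        return false;
    }
    ring->slots[head & (EVENT_RING_CAPACITY - 1u)] = ev;
    /* Release: the slot contents become visible before the new head does. */
    atomic_store_explicit(&ring->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side (worker). Returns false if the ring is empty; never blocks. */
bool event_ring_pop(event_ring *ring, input_event *out) {
    assert(ring != NULL && out != NULL);
    uint32_t tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&ring->head, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = ring->slots[tail & (EVENT_RING_CAPACITY - 1u)];
    atomic_store_explicit(&ring->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because the worker only polls shared memory, it can call `event_ring_pop` from inside its own infinite loop without ever returning to the browser's event loop. The sketch supports exactly one producer and one consumer; more threads on either side would need a different design.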

Oct 18 '24 17:10 davidfstr

This would be ugly code that is difficult to maintain, with a million things that silently break when done in a background thread, waiting to surprise everyone.

If a reasonable PR were to appear, I'd accept it, but I don't think we're going to work on this otherwise.

Oct 18 '24 19:10 icculus

> restructure the program's main loop in a way that is inconsistent with how a PyGame program would be written normally

Maybe it'd make sense for pygame to be structured to support that for everyone instead? In LÖVE its main loop is a function that gets run by a coroutine under the hood, for example. It wasn't a big deal for users to switch to that when we made that change, although most new users don't end up modifying its default main loop.

Oct 18 '24 23:10 slime73

If you can make something that works with the main callbacks API in SDL 3, the same code will work on desktop and emscripten without any changes.
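
To illustrate what a callback-structured program looks like, here is a library-free sketch in C. SDL 3 names its entry points SDL_AppInit, SDL_AppIterate, SDL_AppEvent, and SDL_AppQuit; the names below (`app_init`, `app_iterate`, `app_quit`, `run_blocking`, `host_tick`) are hypothetical stand-ins so the example compiles without SDL, and they only mirror the general shape, not SDL's actual signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { APP_CONTINUE, APP_DONE } app_result;

/* All state lives in a struct instead of in locals of an infinite loop. */
typedef struct {
    int frames_drawn;
    int frames_wanted;
} app_state;

void app_init(app_state *state, int frames_wanted) {
    assert(state != NULL);
    state->frames_drawn = 0;
    state->frames_wanted = frames_wanted;
}

/* One iteration of what used to be the body of `while (running) { ... }`. */
app_result app_iterate(app_state *state) {
    assert(state != NULL);
    state->frames_drawn++;  /* stand-in for "update and draw one frame" */
    return state->frames_drawn >= state->frames_wanted ? APP_DONE : APP_CONTINUE;
}

void app_quit(app_state *state) {
    assert(state != NULL);
    state->frames_wanted = 0;
}

/* Desktop-style host: owns a blocking loop. Returns the iteration count. */
int run_blocking(app_state *state) {
    int calls = 0;
    do {
        calls++;
    } while (app_iterate(state) == APP_CONTINUE);
    return calls;
}

/* Browser-style host: invoked once per tick by an external scheduler.
   Returns true if another tick should be scheduled. */
bool host_tick(app_state *state) {
    return app_iterate(state) == APP_CONTINUE;
}
```

The application code (`app_init`, `app_iterate`, `app_quit`) is identical under both hosts; only the driver differs, which is the property the main callbacks API is meant to provide.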

Oct 18 '24 23:10 maia-s

For my own application, waiting for PyGame to upgrade from SDL 2 to 3 and then extending PyGame's API to support the main callbacks API seems promising as a long-term solution. I'll ask the PyGame folks downstream about it.

> This would be ugly code that is difficult to maintain, with a million things that silently break when done in a background thread, waiting to surprise everyone.

@icculus Could you elaborate on your intuition a bit?

Is it likely that SDL would need to have two separate modes, with one mode running on the main thread and a different mode for running on a web worker? (Multiple modes would double the testing cost of every feature.)
Is it likely that there would be a large number of places in SDL that would need to be patched individually, rather than a small number of chokepoints? (Many patch points require a large one-time retest pass.)

Oct 21 '24 15:10 davidfstr

> Is it likely that SDL would need to have two separate modes, with one mode running on the main thread and a different mode for running on a web worker? (Multiple modes would double the testing cost of every feature.)

My guess is a lot of it would be proxying stuff to the main thread, so a lot of things inside SDL that look like...

do_something();

...become something more like...

if (we_are_the_main_thread()) {
    do_something();
} else {
    queue_a_task_for_the_main_thread( do_something );
}

(that's an oversimplification in some ways, and in others it's more complex.)
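
A compilable sketch of the pattern above, under stated assumptions: the queue, the spinlock, and the thread-local "am I the main thread" flag are all hypothetical illustrations and not how SDL or Emscripten actually implement proxying. It keeps the `we_are_the_main_thread` name from the pseudocode and adds `run_on_main_thread`, `drain_main_thread_tasks`, and `mark_current_thread_as_main` as made-up helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*task_fn)(void *arg);
typedef struct { task_fn fn; void *arg; } task;

#define MAX_PENDING_TASKS 64

static task pending[MAX_PENDING_TASKS];
static size_t pending_count = 0;
static atomic_flag pending_lock = ATOMIC_FLAG_INIT;

/* Each thread has its own copy; only the main thread sets it to true. */
static _Thread_local bool is_main_thread = false;

void mark_current_thread_as_main(void) { is_main_thread = true; }
bool we_are_the_main_thread(void) { return is_main_thread; }

/* Tiny spinlock so the sketch needs no threading library. */
static void lock_queue(void) {
    while (atomic_flag_test_and_set_explicit(&pending_lock, memory_order_acquire)) { }
}
static void unlock_queue(void) {
    atomic_flag_clear_explicit(&pending_lock, memory_order_release);
}

/* Runs fn now on the main thread, otherwise queues it without waiting.
   Returns false only if the queue is full. */
bool run_on_main_thread(task_fn fn, void *arg) {
    assert(fn != NULL);
    if (we_are_the_main_thread()) {
        fn(arg);
        return true;
    }
    lock_queue();
    bool queued = pending_count < MAX_PENDING_TASKS;
    if (queued) {
        pending[pending_count].fn = fn;
        pending[pending_count].arg = arg;
        pending_count++;
    }
    unlock_queue();
    return queued;
}

/* Called by the main thread whenever it gets control (for example once
   per frame). Returns how many queued tasks it ran. */
size_t drain_main_thread_tasks(void) {
    task batch[MAX_PENDING_TASKS];
    lock_queue();
    size_t n = pending_count;
    for (size_t i = 0; i < n; i++) {
        batch[i] = pending[i];
    }
    pending_count = 0;
    unlock_queue();
    for (size_t i = 0; i < n; i++) {
        batch[i].fn(batch[i].arg);  /* run outside the lock */
    }
    return n;
}

/* Example task: bump an int counter. */
void increment_counter(void *arg) { *(int *)arg += 1; }
```

This version is fire-and-forget: the caller does not learn when the task ran or what it returned. A call that needs a result would additionally have to block until the main thread drains the queue, which is where the waiting cost comes from.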

The other thing worth noting is that having to proxy like this is annoying, but it has another side effect: if you need something proxied to the main thread you have to wait for it. And the catch is that the main thread is only going to be able to run proxied tasks at the refresh rate of the monitor, when the animation callback runs to pick up new tasks. Which means sometimes you will do something seemingly-innocuous and discover it has a 16 millisecond lag built into it (more or less, depending on the refresh rate of the user's monitor!). And when that is done, if you have three more seemingly-innocuous things, each has to wait for the next animation callback if it needs to get results or just synchronize... a small loop of innocuous things can take hundreds of milliseconds and leave you with single-digit framerates, mostly because it's waiting for tasks to get picked up.

Some of that is avoidable if the tasks can be queued without waiting for results, but they will always have a lag until the main thread picks up the work.
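
The arithmetic behind that estimate can be sketched in C. This is a worst-case model only: it assumes every blocking proxied call waits one full frame interval before the main thread picks it up, which is the scenario described above and not a measurement. The function names are hypothetical.

```c
#include <assert.h>

/* Time between animation callbacks, in milliseconds. */
double frame_interval_ms(double refresh_hz) {
    assert(refresh_hz > 0.0);
    return 1000.0 / refresh_hz;
}

/* Worst-case stall if each blocking call waits one full frame interval. */
double worst_case_stall_ms(int blocking_calls, double refresh_hz) {
    assert(blocking_calls > 0);
    return blocking_calls * frame_interval_ms(refresh_hz);
}

/* Framerate the worker would see if it made that many blocking calls per frame. */
double effective_fps(int blocking_calls, double refresh_hz) {
    return 1000.0 / worst_case_stall_ms(blocking_calls, refresh_hz);
}
```

Under this model, ten blocking calls per application frame on a 60 Hz display add up to roughly 167 ms of waiting, or about 6 frames per second.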

> Is it likely that there would be a large number of places in SDL that would need to be patched individually, rather than a small number of chokepoints? (Many patch points require a large one-time retest pass.)

Part of the problem is we don't actually know. It might not be a lot, but it could also be three things and we only find two, leaving the third to bite someone later. But you should assume most of the subsystems in SDL eventually have to talk to a platform-specific piece of code, which in a web browser usually means interacting with the DOM or the global window object, neither of which is available to web workers. So assume almost all of it would need to be proxied: video, audio[^1], input, etc.

[^1]: audio is in fact already proxied this way due to how the system works, so it can send a big block of data at a time and not care what happens to it afterwards. Input lag and framerate are very different beasts, though.

Oct 22 '24 23:10 icculus

Gonna bump this out of 3.2.0, but despite my lengthy moaning, if a PR shows up that pulls this off in a reasonable way, we'd definitely merge it.

Oct 22 '24 23:10 icculus

Thanks for taking the time to explain, icculus. It does sound like a lot of individual patch points are probably involved, which would be high effort.

I've previously written an API-compatible reimplementation of PyGame† which runs in the web browser, which has been in production for the last several years. It spends most of its time making "notify" calls from a web worker to the main thread that don't require a response, which execute pretty fast. I don't think postMessage calls to the main thread are restricted to operating at the refresh rate of the monitor, since requestAnimationFrame isn't involved. There are also a few "query" calls that need to return a response which operate quite a bit slower than "notify" calls because they block the web worker on a response from the main thread. A "query" is mainly used for the "get next events" request that happens each frame.

I'll spend a little more time this week looking at SDL's browser interface to see if there's a path forward to drafting a PR for this feature myself in a reasonable way. My C knowledge is a bit rusty and I'm not familiar with SDL's API itself, only the PyGame wrapper around it.

† PyGame pretty much just wraps SDL and exposes its functionality to Python code.

Oct 24 '24 00:10 davidfstr