emscripten icon indicating copy to clipboard operation
emscripten copied to clipboard

SDL2: SDL_CreateWindow(..., SDL_WINDOW_OPENGL) fails in a side module when using pyodide

Open pthom opened this issue 8 months ago • 9 comments

Hi,

When using pyodide, a python module built with emscripten (as a side module) will fail when
calling SDL_CreateWindow(..., flags=SDL_WINDOW_OPENGL): this will ultimately lead to an "indirect call to null".

This issue was initially diagnosed (with a potential fix) at https://github.com/pyodide/pyodide/issues/5584

Root Cause: JS-defined EGL functions can't be called via function pointers

  • The "indirect call to null" crash occurs in SDL when it attempts to call eglGetDisplay and other EGL functions through a function pointer.
  • These functions are implemented in JavaScript (see egl.js). Asking for the C address of these functions from a side module result in getting a NULL address.
  • However those functions can be called, even from a side module. A limitation only occurs when trying to store these functions addresses in a function pointers (those pointer addresses will be 0)

Here's an example from SDL’s source: https://github.com/libsdl-org/SDL/blob/2359383fc187386204c3bb22de89655a494cd128/src/video/emscripten/SDL_emscriptenopengles.c#L32-L67

This code can be summarized as:

#define LOAD_FUNC(NAME) _this->egl_data->NAME = NAME;  // a macro that stores a function pointer
...
LOAD_FUNC(eglGetDisplay); // _this->egl_data->eglGetDisplay is a pointer to eglGetDisplay() (will be NULL)
...
_this->egl_data->egl_display = _this->egl_data->eglGetDisplay(EGL_DEFAULT_DISPLAY);  // Call the function pointer => fail

Potential Solution

I have a working patch for SDL2 (Note: in SDL3, the handling of EGL/emscripten was rewritten).

This patch will trigger when EMSCRIPTEN_DYNAMIC_LINKING is defined (which is the case for Pyodide). It simply adds C trampolines (my_eglXXX functions) that call the JS-defined EGL functions directly. These trampoline functions live in C, have valid addresses, and can safely be used in function pointers.

Here’s what the patch looks like (inside SDL/src/video/emscripten/SDL_emscriptenopengles.cSDL_emscriptenopengles.c):

#ifndef EMSCRIPTEN_DYNAMIC_LINKING
#define LOAD_FUNC(NAME) _this->egl_data->NAME = NAME;
#else
/* In dynamic linking mode (e.g. when building a Pyodide side module), there’s a caveat:
- Most egl* functions are implemented in JavaScript and do not have stable function pointers.
  That is, their address in C is 0, and indirect calls will crash.
- We define C trampolines (my_eglXXX) that wrap these JS functions, and use them instead.
- This gives us valid function pointers and allows the SDL dynamic backend to work correctly.
- Exception: eglGetProcAddress is implemented natively and does have a valid address.
*/
#define LOAD_FUNC(NAME) _this->egl_data->NAME = my_##NAME;

static EGLDisplay my_eglGetDisplay(NativeDisplayType d) { return eglGetDisplay(d); }
static EGLBoolean my_eglInitialize(EGLDisplay d, EGLint *m, EGLint *n) { return eglInitialize(d, m, n); }
static EGLBoolean my_eglTerminate(EGLDisplay d) { return eglTerminate(d); }
// ... and so on
#endif

This patch makes no changes to SDL behavior in normal (non-side-module) builds, and (based on my tests) resolves the issue in Pyodide side modules.

I would be interested in getting feedback: does this patch seem legit and should it be upstreamed to SDL? More generally, how could this issue be fixed, and how could I help if needed.

Reproduction code

I can reproduce this issue when using pyodide. This comment provides a link to a minimal code which reproduces the issue withing pyodide.

The C++ code reproducer code is:

#include <SDL.h>
#include <stdio.h>

#include <SDL_video.h>
#include <emscripten/emscripten.h>

void dummy_sdl_call()    // a function that will be called by python (from a side module)
{
    if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
        fprintf(stderr, "SDL_Init Error: %s\n", SDL_GetError());
        return;
    }

    SDL_SetHint(SDL_HINT_IME_SHOW_UI, "1");

    printf("About to call SDL_CreateWindow\n");
    int window_flags = SDL_WINDOW_OPENGL;
    SDL_Window* window = SDL_CreateWindow(
        "Dummy Window",
        SDL_WINDOWPOS_CENTERED,
        SDL_WINDOWPOS_CENTERED,
        640, 480,
        window_flags
    );
    printf("SDL_CreateWindow called\n");

    if (!window) {
        fprintf(stderr, "SDL_CreateWindow Error: %s\n", SDL_GetError());
    } else {
        printf("SDL_CreateWindow succeeded\n");
        SDL_DestroyWindow(window);
    }

    SDL_Quit();
}

#ifndef BUILD_EMS_APP
#include <nanobind/nanobind.h>
NB_MODULE(daft_lib, m) {
    m.def("dummy_sdl_call", &dummy_sdl_call);
}
#else

int main(int argc, char* argv[])
{
    dummy_sdl_call();
    return 0;
}

#endif

This code can also be compiled as a standalone emscripten html app with the command line below

 emcc daft_lib.cpp -o daft_lib.html \
  -sUSE_SDL=2 \
  -sMAX_WEBGL_VERSION=2  \
  -D BUILD_EMS_APP \
  --shell-file ${EMSDK}/upstream/emscripten/src/shell_minimal.html

However, this will not trigger the issue as the app will not be compiled as a side module in this case. I'd be interested if someone finds a way to adapt it so that the code is compiled as a side module.

Version of emscripten/emsdk:

This can be reproduced using the emsdk provided by the latest version of pyodide.

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 4.0.6 (1ddaae4d2d6dfbb678ecc193bc988820d1fc4633)
clang version 21.0.0git (https:/github.com/llvm/llvm-project 4775e6d9099467df9363e1a3cd5950cc3d2fde05)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/pascal/dvp/_Bundle/imgui_bundle_online/imgui_bundle_pyodide_tooling/repositories/pyodide/emsdk/emsdk/upstream/bin

Thanks to all creators and maintainers for the huge work on emscripten!

pthom avatar Apr 15 '25 13:04 pthom

So the problem is that eglGetProcAddress is returning NULL for these function? That does seems like a bug we should fix.

However just skipping eglGetProcAddress completely in SDL2 also seems like good improvement. I would just skip it in all cases by the way.

sbc100 avatar Apr 15 '25 17:04 sbc100

So the problem is that eglGetProcAddress is returning NULL for these function? That does seems like a bug we should fix.

I am not sure that this is the issue. IMHO, these addresses are not obtained via a call to eglGetProcAddress.

Let's analyse the code:

// LOAD_FUNC is a macro that will store a function pointer into _this->egl_data->
#define LOAD_FUNC(NAME) _this->egl_data->NAME = NAME;
...

// It is used to store a pointer to eglGetDisplay()  inside _this->egl_data->eglGetDisplay
// This pointer will be null.
LOAD_FUNC(eglGetDisplay);
...

// Then we try to call eglGetDisplay() via this function pointer
_this->egl_data->eglGetDisplay(EGL_DEFAULT_DISPLAY);

// IMPORTANT: a direct call to eglGetDisplay() **would work** (emscripten would have called the js function).
// What fails is the call via a function pointer

pthom avatar Apr 15 '25 17:04 pthom

Oh I see so its just that if its address taken only it doesn't work. It should be easy enough to repo using just this code in side module then?:

void foo() {
  void* addr = &eglGetDisplay;
  assert(addr);
}

sbc100 avatar Apr 15 '25 17:04 sbc100

@sbc100 : thanks for your help!

I initially suspected that this issue was linked to the usage of side & main modules.

I now think there are some other flags in the pyodide compilation options which I overlooked.

As a matter of fact, I tried to reproduce it outside of Pyodide, and for the moment, my reproduction code does not reproduce the issue (i.e it works fine in vanilla emscripten, even when using side + main module). Perhaps that @ryanking13 (Pyodide's main maintainer) will know more.

Below is an example with a main + side module which does not fail. As a conclusion, more investigations are required,

main.c

#include <stdio.h>
#include <dlfcn.h>
#include <emscripten/emscripten.h>

typedef void (*side_func_t)(void);

int main() {
    printf("Main module started\n");

    // Load the side module
    void* handle = dlopen("side.wasm", RTLD_NOW);
    if (!handle) {
        printf("Failed to load side module: %s\n", dlerror());
        return 1;
    }

    // Locate the function symbol
    side_func_t run_side = (side_func_t)dlsym(handle, "run_sdl_from_side");
    if (!run_side) {
        printf("Failed to find symbol in side module: %s\n", dlerror());
        return 1;
    }

    // Call the function from side module
    printf("Calling run_side_module from side module...\n");
    run_side();
    printf("Call to run_side_module finished\n");

    return 0;
}

side.c

#include <SDL.h>
#include <stdio.h>

void run_sdl_from_side() {
    printf("[side] SDL_Init...\n");
    if (SDL_Init(SDL_INIT_VIDEO) != 0) {
        printf("[side] SDL_Init failed: %s\n", SDL_GetError());
        return;
    }

    SDL_Window* win = SDL_CreateWindow("Test", 0, 0, 100, 100, SDL_WINDOW_OPENGL);
    if (!win) {
        printf("[side] SDL_CreateWindow failed: %s\n", SDL_GetError());
        return;
    }
    printf("[side] SDL_CreateWindow succeeded.\n");
    printf("Yay 1\n");

    SDL_DestroyWindow(win);
    SDL_Quit();
}

Makefile

EMCC ?= emcc

CFLAGS = -O2 -sINITIAL_MEMORY=20971520 -sALLOW_MEMORY_GROWTH=1

# Flags for building the main module:
MAIN_FLAGS = -sMAIN_MODULE=1 -sEXPORT_ALL=1 -sMAX_WEBGL_VERSION=2 --preload-file side.wasm

# Flags for building the side module:
# -sUSE_SDL=2 + direct linking to SDL2.a is needed to get the SDL code to work.
SIDE_FLAGS = -sSIDE_MODULE=1 -DEMSCRIPTEN_DYNAMIC_LINKING -sUSE_SDL=2 $(EMSDK)/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/pic/libSDL2.a

.PHONY: all clean

all: main.html side.wasm

side.wasm: side.c
	$(EMCC) $(CFLAGS) $(SIDE_FLAGS) side.c -o side.wasm

main.html: main.c side.wasm
	$(EMCC) $(CFLAGS) $(MAIN_FLAGS) main.c -o main.html

clean:
	rm -f main.html main.js side.wasm

Note: in this example, sdl2 must be build separately with

embuilder build sdl2 libhtml5 --pic 

pthom avatar Apr 15 '25 18:04 pthom

Thanks for the investigation @pthom.

While I am not 100% sure, I have one suspicion that seems related to your issue. Pyodide links SDL in a non-standard way (at least, I believe Emscripten never expected people to use it in that way) to optimize the output size:

Emscripten SDL consists of JS parts and C parts, and we link the JS parts to the main module and C parts to the side module. This is to reduce the main module size, because we don't want to link the whole parts of SDL, as 99% of Pyodide users will not use it.

Since the side module can resolve the symbols in the main module, it worked quite well historically, but I am suspecting that is has something to do with your error.

ryanking13 avatar Apr 16 '25 12:04 ryanking13

Are you talking about SDL1 or SDL2? From the bug title I assume you mean SDL2 but when you say some parts are in JS that sounds like the SDL1 port maybe?

Does pyodide not use the normal SDL2 port? i.e. does it not use -sUSE_SDL=2? If not can you point me to the the build recipe for it?

sbc100 avatar Apr 16 '25 16:04 sbc100

Are you talking about SDL1 or SDL2?

I'm talking about SDL2

when you say some parts are in JS that sounds like the SDL1 port maybe?

SDL1 vs SDL2: Wow, great remark, thanks! I now understand that emsdk/upstream/emscripten/src/lib/libsdl.js is actually for SDL1 and not SDL2.

EGL vs SDL: the js parts which I mention are part of EGL, not SDL, they are defined in libegl.js. Is this file the correct version, or is it also deprecated when using SDL2?

Does pyodide not use the normal SDL2 port? i.e. does it not use -sUSE_SDL=2?

As @ryanking13 said, the main module of pyodide tries to not link to C libraries such as sdl. As such its main module will not specify -sUSE_SDL=2

It does however link to -lsdl.js (i.e. SDL1 !): see https://github.com/pyodide/pyodide/blob/4fbbbedc09496c6968086d69aadba75398718b13/Makefile.envs#L185-L191

If not can you point me to the the build recipe for it?

When a python module (compiled into a .so via pyodide) needs to use SDL2, it will need to add libSDL2.a manually (since SDL2 is not linked to the main module).

For example, I had to do this in my module CMakeLists:

# Add -sUSE_SDL=2 is not enough
target_compile_options(daft_lib PUBLIC "-sUSE_SDL=2")
target_link_options(daft_lib PUBLIC "-sUSE_SDL=2")

# We need to manually add libSDL2.a from the emscripten sysroot (using a fPIC compiled version)
# to our python module
set(ems_lib_path_pic ${EMSCRIPTEN_SYSROOT}/lib/wasm32-emscripten/pic)
target_link_libraries(daft_lib PUBLIC ${ems_lib_path_pic}/libSDL2.a)
install(TARGETS daft_lib DESTINATION .)

And the fPIC compiled version needs to be build manually, using

embuilder build sdl2  --pic 

=> And in this case, there is perhaps a possibility of duplicate definitions conflicts between SDL1 and SDL2.

pthom avatar Apr 16 '25 16:04 pthom

Are you talking about SDL1 or SDL2?

Yes we are talking about SDL2, and using SDL2.

but when you say some parts are in JS that sounds like the SDL1 port maybe?

I also didn't know libsdl.js was for SDL1. I guess we can remove it from the Pyodide main module I guess.

Does pyodide not use the normal SDL2 port? i.e. does it not use -sUSE_SDL=2? If not can you point me to the the build recipe for it?

We are linking libegl.js, libwebgl.js, and libhtml5_webgl.js to the main module (which I think is linked when USE_SDL=2 is used.)

And we link the native C SDL libraries to the side module. We use -sUSE_SDL=2 in the link flag, but we also pass -lSDL explicitly as well, as we've experienced that USE_... flags doesn't work very well in side modules (I guess there was issue about it, but I don't exactly remember).

ryanking13 avatar Apr 17 '25 04:04 ryanking13

(I guess there was issue about it, but I don't exactly remember).

Okay, I found the issue and the code pointer (https://github.com/emscripten-core/emscripten/issues/17675, code pointer)

ryanking13 avatar Apr 17 '25 04:04 ryanking13

I'm closing this issue, as it was fixed inside pyodide itself (see https://github.com/pyodide/pyodide/pull/5656 and https://github.com/pyodide/pyodide/issues/5584#event-18188778296)

pthom avatar Jun 17 '25 20:06 pthom