SDL icon indicating copy to clipboard operation
SDL copied to clipboard

Unaligned stacks on i686-w64-mingw32, may lead to crashes

Open skeeto opened this issue 1 year ago • 2 comments

Affects both SDL2 and SDL3. GCC 4.5.2 and later assume a 16-byte aligned stack, but the x86 Windows stack is not necessarily so aligned. This problem was observed in #477 (2011), and the workaround was to use -mpreferred-stack-boundary to, in essence, disable this assumption when building SDL itself — at some small performance and size cost.

However, that disabled assumption does not carry into applications running on top of SDL, particularly SDL_main and thread entry points. The use of -mpreferred-stack-boundary prevents most such crashes on the main thread purely by accident, which, in addition to the x64 ABI being unaffected, is probably why it's gone mostly unnoticed for the past 12 years. That is, SDL 16-byte aligns the main thread stack for applications by chance — a result of the particular choice of compiler options, GCC code generation, and that Windows enters SDL with a consistently-congruent stack pointer on the main thread.

Here's a simple SDL program that may observe unaligned locals, and does so consistently for me using the official SDL2 build:

#include "SDL.h"

void set64(volatile Uint64 *x)
{
    *x = 1;
}

int threadfunc(void *arg)
{
    Uint64 x;
    set64(&x);
    return 0;
}

int main(int argc, char **argv)
{
    SDL_Thread *thr = SDL_CreateThread(threadfunc, 0, 0);
    SDL_WaitThread(thr, 0);
    return 0;
}

For simplicity, compile with UBSan to add an alignment check (otherwise set a breakpoint and inspect it yourself):

$ i686-w64-mingw32-gcc -g3 -fsanitize=undefined -fsanitize-undefined-trap-on-error example.c $(sdl2-config --cflags --libs)

The indirection through set64 is because UBSan assumes locals are aligned and so doesn't instrument them. When I run this under GDB, it traps on the *x = 1 because the Uint64 is unaligned. Since threadfunc was entered unaligned, its x is unaligned. While 64-bit integer alignment doesn't matter much here for x86, it does if SIMD is involved, including auto-vectorization, per the 12-year-old bug report.

The proper solution is to align the stack on all x86 entry points using the force_align_arg_pointer attribute. In other words, every SDL function marked __stdcall, WINAPI, or APIENTRY should have this attribute on i686-w64-mingw32. That is, put this (or an equivalent macro) in front of each definition:

#if defined(__GNUC__) && defined(__i686__)
__attribute__((force_align_arg_pointer))
#endif

Particularly:

  • ~~src/main/windows/SDL_windows_main.c on WinMain~~ (edit: msvcrt.dll currently aligns the stack)
  • src/thread/windows/SDL_systhread.c on RunThreadViaBeginThreadEx and RunThreadViaCreateThread
  • src/SDL.c on _DllMainCRTStartup (being a no-op, not actually important)

This is basically working around a very old compiler bug: GCC ought to already assume __stdcall functions won't be called with a 16-byte aligned stack on i686, because, unlike __cdecl, they rarely are.

However, this must also coincide with removing -mpreferred-stack-boundary, which effectively disables force_align_arg_pointer. Fortunately with this new fix the stack alignment override is unnecessary anyway.

This merge request is a preliminary version of the changes for SDL2. The SDL3 branch would need similar treatment.

skeeto avatar Apr 13 '23 17:04 skeeto

This seems reasonable to me, @icculus?

slouken avatar Apr 19 '23 20:04 slouken

I'm fine with this, but can we hide that __attribute__ behind a #define that we test for in one place, so we don't miss some of these in case we need to tweak them later?

icculus avatar Apr 19 '23 20:04 icculus

@icculus: Do you mean something like the following? (On top of the existing patch.)

diff --git a/src/core/windows/SDL_windows.h b/src/core/windows/SDL_windows.h
index ac6def6..3842e08 100644
--- a/src/core/windows/SDL_windows.h
+++ b/src/core/windows/SDL_windows.h
@@ -76,6 +76,19 @@
 #define WINVER       _WIN32_WINNT
 #endif
 
+/* See https://github.com/libsdl-org/SDL/pull/7607  */
+/* force_align_arg_pointer attribute requires gcc >= 4.2.x.  */
+#if defined(__clang__)
+#define HAVE_FORCE_ALIGN_ARG_POINTER
+#elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 2))
+#define HAVE_FORCE_ALIGN_ARG_POINTER
+#endif
+#if defined(__GNUC__) && defined(__i386__) && defined(HAVE_FORCE_ALIGN_ARG_POINTER)
+#define MINGW32_FORCEALIGN __attribute__((force_align_arg_pointer))
+#else
+#define MINGW32_FORCEALIGN
+#endif
+
 #include <windows.h>
 #include <basetyps.h> /* for REFIID with broken mingw.org headers */
 
diff --git a/src/SDL.c b/src/SDL.c
index a95d23a..ad6e08b 100644
--- a/src/SDL.c
+++ b/src/SDL.c
@@ -633,10 +633,7 @@ SDL_bool SDL_IsTablet(void)
 #if (!defined(HAVE_LIBC) || defined(__WATCOMC__)) && !defined(SDL_STATIC_LIB)
 /* Need to include DllMain() on Watcom C for some reason.. */
 
-#if defined(__GNUC__) && defined(__i686__)
-__attribute__((force_align_arg_pointer))
-#endif
-BOOL APIENTRY _DllMainCRTStartup(HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
+BOOL APIENTRY MINGW32_FORCEALIGN _DllMainCRTStartup(HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
 {
     switch (ul_reason_for_call) {
     case DLL_PROCESS_ATTACH:
diff --git a/src/main/windows/SDL_windows_main.c b/src/main/windows/SDL_windows_main.c
index 17a5df9..189b954 100644
--- a/src/main/windows/SDL_windows_main.c
+++ b/src/main/windows/SDL_windows_main.c
@@ -103,10 +103,7 @@ int console_wmain(int argc, wchar_t *wargv[], wchar_t *wenvp)
 #endif
 
 /* This is where execution begins [windowed apps] */
-#if defined(__GNUC__) && defined(__i686__)
-__attribute__((force_align_arg_pointer))
-#endif
-int WINAPI
+int WINAPI MINGW32_FORCEALIGN
 WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR szCmdLine, int sw) /* NOLINT(readability-inconsistent-declaration-parameter-name) */
 {
     return main_getcmdline();
diff --git a/src/thread/windows/SDL_systhread.c b/src/thread/windows/SDL_systhread.c
index ab128dc..b55eb13 100644
--- a/src/thread/windows/SDL_systhread.c
+++ b/src/thread/windows/SDL_systhread.c
@@ -55,18 +55,12 @@ static DWORD RunThread(void *data)
     return 0;
 }
 
-#if defined(__GNUC__) && defined(__i686__)
-__attribute__((force_align_arg_pointer))
-#endif
-static DWORD WINAPI RunThreadViaCreateThread(LPVOID data)
+static DWORD WINAPI MINGW32_FORCEALIGN RunThreadViaCreateThread(LPVOID data)
 {
     return RunThread(data);
 }
 
-#if defined(__GNUC__) && defined(__i686__)
-__attribute__((force_align_arg_pointer))
-#endif
-static unsigned __stdcall RunThreadViaBeginThreadEx(void *data)
+static unsigned __stdcall MINGW32_FORCEALIGN RunThreadViaBeginThreadEx(void *data)
 {
     return (unsigned)RunThread(data);
 }

Notes:

  • Is 32 bit ARM affected? If it is, defined(__i386__) should be changed to !defined(_WIN64) instead.
  • Do we need the same with clang?

sezero avatar Jul 09 '23 02:07 sezero

Do you mean something like the following?

Exactly this, yes.

I'd say go ahead and commit that and we'll add ARM32 later if necessary (but my suspicion is this is just the usual 32-bit Intel historical mistakes and everyone did the right thing on ARM by default).

icculus avatar Jul 09 '23 20:07 icculus

I rebased the patch to current SDL2 branch and applied my patch on top of it.

@skeeto: Are you good with this?

@icculus: After merge, do we want this in release-2.28.x branch?

P.S.: SDL3 will need something similar to this.

sezero avatar Jul 09 '23 23:07 sezero

After merge, do we want this in release-2.28.x branch?

I think that's a good idea.

Let's let the build checks finish just in case, then merge this, and pull this into SDL3, too.

icculus avatar Jul 09 '23 23:07 icculus

Let's let the build checks finish just in case, then merge this, and pull this into SDL3, too.

SDL3 will need porting to, at least because SDL_main is different there: I won't do that myself.

sezero avatar Jul 09 '23 23:07 sezero

Squash-merged now.

sezero avatar Jul 10 '23 00:07 sezero

Cherry-picked to release-2.28.x branch as 078e817cebec59e5c239e130223267a11cb87aee

sezero avatar Jul 10 '23 00:07 sezero