SDL
SDL copied to clipboard
Unaligned stacks on i686-w64-mingw32, may lead to crashes
Affects both SDL2 and SDL3. GCC 4.5.2 and later assume a 16-byte aligned stack, but the x86 Windows stack is not necessarily so aligned. This problem was observed in #477 (2011), and the workaround was to use -mpreferred-stack-boundary
to, in essence, disable this assumption when building SDL itself — at some small performance and size cost.
However, that disabled assumption does not carry into applications running on top of SDL, particularly SDL_main
and thread entry points. The use of -mpreferred-stack-boundary
prevents most such crashes on the main thread purely by accident, which, in addition to the x64 ABI being unaffected, is probably why it's gone mostly unnoticed for the past 12 years. That is, SDL 16-byte aligns the main thread stack for applications by chance — a result of the particular choice of compiler options, GCC code generation, and that Windows enters SDL with a consistently-congruent stack pointer on the main thread.
Here's a simple SDL program that may observe unaligned locals, and does so consistently for me using the official SDL2 build:
#include "SDL.h"
void set64(volatile Uint64 *x)
{
*x = 1;
}
int threadfunc(void *arg)
{
Uint64 x;
set64(&x);
return 0;
}
int main(int argc, char **argv)
{
SDL_Thread *thr = SDL_CreateThread(threadfunc, 0, 0);
SDL_WaitThread(thr, 0);
return 0;
}
For simplicity, compile with UBSan to add an alignment check (otherwise set a breakpoint and inspect it yourself):
$ i686-w64-mingw32-gcc -g3 -fsanitize=undefined -fsanitize-undefined-trap-on-error example.c $(sdl2-config --cflags --libs)
The indirection through set64
is because UBSan assumes locals are aligned and so doesn't instrument them. When I run this under GDB, it traps on the *x = 1
because the Uint64
is unaligned. Since threadfunc
was entered unaligned, its x
is unaligned. While 64-bit integer alignment doesn't matter much here for x86, it does if SIMD is involved, including auto-vectorization, per the 12-year-old bug report.
The proper solution is to align the stack on all x86 entry points using the force_align_arg_pointer
attribute. In other words, every SDL function marked __stdcall
, WINAPI
, or APIENTRY
should have this attribute on i686-w64-mingw32. That is, put this (or an equivalent macro) in front of each definition:
#if defined(__GNUC__) && defined(__i686__)
__attribute__((force_align_arg_pointer))
#endif
Particularly:
- ~~
src/main/windows/SDL_windows_main.c
onWinMain
~~ (edit:msvcrt.dll
currently aligns the stack) -
src/thread/windows/SDL_systhread.c
onRunThreadViaBeginThreadEx
andRunThreadViaCreateThread
-
src/SDL.c
on_DllMainCRTStartup
(being a no-op, not actually important)
This is basically working around a very old compiler bug: GCC ought to already assume __stdcall
functions won't be called with a 16-byte aligned stack on i686, because, unlike __cdecl
, they rarely are.
However, this must also coincide with removing -mpreferred-stack-boundary
, which effectively disables force_align_arg_pointer
. Fortunately with this new fix the stack alignment override is unnecessary anyway.
This merge request is a preliminary version of the changes for SDL2. The SDL3 branch would need similar treatment.
This seems reasonable to me, @icculus?
I'm fine with this, but can we hide that __attribute__
behind a #define
that we test for in one place, so we don't miss some of these in case we need to tweak them later?
@icculus: Do you mean something like the following? (On top of the existing patch.)
diff --git a/src/core/windows/SDL_windows.h b/src/core/windows/SDL_windows.h
index ac6def6..3842e08 100644
--- a/src/core/windows/SDL_windows.h
+++ b/src/core/windows/SDL_windows.h
@@ -76,6 +76,19 @@
#define WINVER _WIN32_WINNT
#endif
+/* See https://github.com/libsdl-org/SDL/pull/7607 */
+/* force_align_arg_pointer attribute requires gcc >= 4.2.x. */
+#if defined(__clang__)
+#define HAVE_FORCE_ALIGN_ARG_POINTER
+#elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 2))
+#define HAVE_FORCE_ALIGN_ARG_POINTER
+#endif
+#if defined(__GNUC__) && defined(__i386__) && defined(HAVE_FORCE_ALIGN_ARG_POINTER)
+#define MINGW32_FORCEALIGN __attribute__((force_align_arg_pointer))
+#else
+#define MINGW32_FORCEALIGN
+#endif
+
#include <windows.h>
#include <basetyps.h> /* for REFIID with broken mingw.org headers */
diff --git a/src/SDL.c b/src/SDL.c
index a95d23a..ad6e08b 100644
--- a/src/SDL.c
+++ b/src/SDL.c
@@ -633,10 +633,7 @@ SDL_bool SDL_IsTablet(void)
#if (!defined(HAVE_LIBC) || defined(__WATCOMC__)) && !defined(SDL_STATIC_LIB)
/* Need to include DllMain() on Watcom C for some reason.. */
-#if defined(__GNUC__) && defined(__i686__)
-__attribute__((force_align_arg_pointer))
-#endif
-BOOL APIENTRY _DllMainCRTStartup(HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
+BOOL APIENTRY MINGW32_FORCEALIGN _DllMainCRTStartup(HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
switch (ul_reason_for_call) {
case DLL_PROCESS_ATTACH:
diff --git a/src/main/windows/SDL_windows_main.c b/src/main/windows/SDL_windows_main.c
index 17a5df9..189b954 100644
--- a/src/main/windows/SDL_windows_main.c
+++ b/src/main/windows/SDL_windows_main.c
@@ -103,10 +103,7 @@ int console_wmain(int argc, wchar_t *wargv[], wchar_t *wenvp)
#endif
/* This is where execution begins [windowed apps] */
-#if defined(__GNUC__) && defined(__i686__)
-__attribute__((force_align_arg_pointer))
-#endif
-int WINAPI
+int WINAPI MINGW32_FORCEALIGN
WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR szCmdLine, int sw) /* NOLINT(readability-inconsistent-declaration-parameter-name) */
{
return main_getcmdline();
diff --git a/src/thread/windows/SDL_systhread.c b/src/thread/windows/SDL_systhread.c
index ab128dc..b55eb13 100644
--- a/src/thread/windows/SDL_systhread.c
+++ b/src/thread/windows/SDL_systhread.c
@@ -55,18 +55,12 @@ static DWORD RunThread(void *data)
return 0;
}
-#if defined(__GNUC__) && defined(__i686__)
-__attribute__((force_align_arg_pointer))
-#endif
-static DWORD WINAPI RunThreadViaCreateThread(LPVOID data)
+static DWORD WINAPI MINGW32_FORCEALIGN RunThreadViaCreateThread(LPVOID data)
{
return RunThread(data);
}
-#if defined(__GNUC__) && defined(__i686__)
-__attribute__((force_align_arg_pointer))
-#endif
-static unsigned __stdcall RunThreadViaBeginThreadEx(void *data)
+static unsigned __stdcall MINGW32_FORCEALIGN RunThreadViaBeginThreadEx(void *data)
{
return (unsigned)RunThread(data);
}
Notes:
- Is 32 bit ARM affected? If it is,
defined(__i386__)
should be changed to!defined(_WIN64)
instead. - Do we need the same with clang?
Do you mean something like the following?
Exactly this, yes.
I'd say go ahead and commit that and we'll add ARM32 later if necessary (but my suspicion is this is just the usual 32-bit Intel historical mistakes and everyone did the right thing on ARM by default).
I rebased the patch to current SDL2 branch and applied my patch on top of it.
@skeeto: Are you good with this?
@icculus: After merge, do we want this in release-2.28.x branch?
P.S.: SDL3 will need something similar to this.
After merge, do we want this in release-2.28.x branch?
I think that's a good idea.
Let's let the build checks finish just in case, then merge this, and pull this into SDL3, too.
Let's let the build checks finish just in case, then merge this, and pull this into SDL3, too.
SDL3 will need porting to, at least because SDL_main is different there: I won't do that myself.
Squash-merged now.
Cherry-picked to release-2.28.x branch as 078e817cebec59e5c239e130223267a11cb87aee