SDL_ttf icon indicating copy to clipboard operation
SDL_ttf copied to clipboard

Remove a misaligned pointer cast in BG_Blended_LCD

Open marius-jaegler opened this issue 6 months ago • 11 comments

The BG_Blended_LCD function uses Uint32 pointers to keep track of source and destination but this can cause undefined behavior if src becomes misaligned. This can happen is srcskip is not a multiple of sizeof(Uint32). This is why the function uses SDL_memcpy for reading the src buffer.

But it is actually undefined behavior to even perform the cast on line 552 when the result is misaligned. The error is caught at runtime with ubsan in debug mode (I found it compiling with zig so it can also probably be found with clang).

From the C11 standard (C11 (n1570) 6.3.2.3 p7)

A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned68) for the referenced type, the behavior is
undefined.

This patch makes the pointer a byte pointers and makes the increment sizes explicit.

There may be a similar problem with dst / dskskip but it looks like they are always aligned in the current implementation, (as long as the image pitch is). Maybe an assertion could be added for future proofing.

marius-jaegler avatar Jun 11 '25 08:06 marius-jaegler

This looks good. Is this fix needed for BG_Blended_Color() as well?

slouken avatar Jun 11 '25 18:06 slouken

Most likely yes, it is called on line 1272 with

srcskip = image_clipped.pitch - image_clipped.width;                                                  
/* ... */
srcskip -= 3 * image_clipped.width;                                                              

Unless image_clipped.width is always a multiple of 4 (I'm not familiar enough with the codebase to say), the the same fixed is needed, assuming image.pitch is always a multiple of 4.

Also there is probably the same problem in theory with the SSE intrinsics based BG_Blended_XXX functions, but those are more difficult to fix because Intel made the same mistake when they made _mm_loadu_si128 (see https://github.com/llvm/llvm-project/issues/21044) so the only compiler-idependent fix here is probably to compile with no sainitizer aand hope the compiler doesn't do anything crazy ....

For the NEON version i have no clue and I don't have an ARM computer to test, I couldn't find only what the actual alignment constraints for vld1_q_u32 are if any, (the ARM documentation uses a u32* but i is completely possible that they just kinda ignored C alignment rules like intel), the solutions i probably also just to compile without sanitizer and pray

On way to do that is to add __attribute__((no_sanitize("alignment"))) to the relevant functions, its a GCC extension and works for me (with zig/clang), but i'm not sure if all compilers support it

marius-jaegler avatar Jun 12 '25 13:06 marius-jaegler

Ill make a new pull request with these changes

marius-jaegler avatar Jun 12 '25 14:06 marius-jaegler

It might be worth seeing in what cases you get an unaligned pointer. I know the start of the images are always aligned by definition. It might be better for the SIMD functions to just fall back to non-SIMD functions in the unaligned case.

slouken avatar Jun 12 '25 14:06 slouken

The intel SIMD versions at least should work fine with a misaligned pointer at the assembly level, it's just that the intrinsic function has an "incorrect" declaration where it takes __m128i* instead of void* or char*. It can be fixed by adding an inline wrapper that disables the sanitizer just for that call

static inline __m128i __attribute__((no_sanitize("alignment"))) _mm_loadu_si128_unaligned(const void *ptr) {
    return __mm_loadu_si128(ptr);
}

After that , the fix can be applied to those function as well, the cast is still UB technically but i doubt any compiler would actually do anything unexpected here. It looks like GCC and clang now have __m128i_u* on that intrinsic for an explicitly unaligned pointers, starting from some version .... . MSVC has __unaligned, but i dont know if it is actually used for that intrinsic. The only alternative that would work with absolutely no UB for any compiler would probably be to use an inline assembly instruction directly but that would be a pain.

For the NEON versions the same method can be used but i can't really test it

marius-jaegler avatar Jun 12 '25 15:06 marius-jaegler

The other option is to just not use the SIMD versions if srcskip is misaligned, which should be pretty easy

marius-jaegler avatar Jun 12 '25 15:06 marius-jaegler

I decided to do the following:

Fix every function that is not SIMD

For SSE, use the fix with the inline wrapper for the intrinsic

For NEON, i did not touch the code because i am unable to test it

I added an option ALLOW_MISALIGNED_SIMD as a define, if it is false, rendering will fallback to the safe functions when srcskip and dstskip are not 16-byte aligned

marius-jaegler avatar Jun 12 '25 18:06 marius-jaegler

Hi,

There was so old ticket about this: https://github.com/libsdl-org/SDL_ttf/issues/324 https://github.com/libsdl-org/SDL_ttf/pull/329 They were bug fixes for sparc cpu's.

IIRC, neon allows unaligned load/store, it will only be slower.

otherwise, does it crash somewhere, or just an ubsan warning ?

1bsyl avatar Jun 14 '25 20:06 1bsyl

Wether it is a crash or just a warning depends on how ubsan is configured (it was a crash in my case), but yes, it is probably something that only sanitizers catch.

If neon allows unaligned load/store at the byte level like the SSE functions, it should be possible to apply the exact same fix to the NEON version where we wrap the incorrectly typed intrinsic into an inline function with sanitization disabled (which would really only be necessary if there exists a sanitizer that can catch it for ARM platforms)

marius-jaegler avatar Jun 15 '25 09:06 marius-jaegler

Wether it is a crash or just a warning depends on how ubsan is configured (it was a crash in my case), but yes, it is probably something that only sanitizers catch.

I mean't without ubsan. So no crash in real situation.

Though, any misaligned access should crash on sparc. So all BG_ function for lcd/blended/color, etc. without simd, neon.

If neon allows unaligned load/store at the byte level like the SSE functions, it should be possible to apply the exact same fix to the NEON version where we wrap the incorrectly typed intrinsic into an inline function with sanitization disabled (which would really only be necessary if there exists a sanitizer that can catch it for ARM platforms)

I dont get it. if new allows unaligned load/store, what do we have to fix anything ?

1bsyl avatar Jun 15 '25 12:06 1bsyl

The reason the fix is needed is that the problem is that it is not just a misaligned load/store that is undefined behavior but also the cast itself, for example with the non-SIMD version, an optimizing compiler would be allowed to vectorize the loop itself using any SIMD instructions available and it would be allowed to assume that the pointer is aligned correctly for it's type (which is not true in this case).

Or for something simpler, with the current load where tmp = *src is replaced with SDL_memcpy(&tmp, src, 4), since SDL_memcpy is actually just a define for memcpy, the compiler is allowed to see that because src is of type Uint32*, it must be aligned to 4 bytes, which means that the memcpy call can be replaced by a simple move and that would cause a crash on an architecture where misaligned loads are illegal. Although I don't know if there is currently any compiler that actually does this optimization.

For the SSE/NEON, the "fix" mostly just makes ubsan shut up while still being technically undefined behavior as far as the C standard is concerned.

marius-jaegler avatar Jun 16 '25 14:06 marius-jaegler

In case you want to tackle this issue for NEON, I can help with testing it (to be clear I'm not familiar with the stuff you are talking about but I was also bothered by the sanitizer warning). Feel free to send a message/mail my way if you want to. (Apple M1 Pro in case it's relevant).

ArnauNau avatar Jul 14 '25 16:07 ArnauNau

This wasn't much tested it seems. because it crashed rendering any emoji / with color blended.(see valgrind log). fixed in https://github.com/libsdl-org/SDL_ttf/commit/a9a4fea81b41f3ceefcfacd5b32fa3e838a554c0

there is a test program with a fuzzer/random string rendering, which should never crash. examples/testapp.c you need to add you fonts to test. F1 for help. press 'r' to run the random rendering

TTF_OpenFont: fonts/NotoColorEmoji.ttf
 29701] seed=1752609666; replay=1; font_style=1; kerning=0; sdf=0; wrap=0; wrap_size=61; w_align=2; outline=0; curr_size=6; render_mode=1; curr_str=57; curr_font=0; hinting=1; fg_alpha=0; // light Blended
==38175== Invalid write of size 4
==38175==    at 0x487CCA3: memcpy (string_fortified.h:29)
==38175==    by 0x487CCA3: BG_Blended_Color.isra.0 (SDL_ttf.c:467)
==38175==    by 0x4882F52: Render_Line_SSE_Blended_Opaque (SDL_ttf.c:1323)
==38175==    by 0x4882F52: Render_Line (SDL_ttf.c:1427)
==38175==    by 0x4883887: TTF_Render_Internal (SDL_ttf.c:3949)
==38175==    by 0x10BD07: main (testapp.c:971)
==38175==  Address 0x1e3c286c is 4 bytes after a block of size 2,394,152 alloc'd
==38175==    at 0x484A858: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==38175==    by 0x4B6E046: SDL_malloc_REAL (SDL_malloc.c:6452)
==38175==    by 0x4B6FFBF: SDL_aligned_alloc_REAL (SDL_stdlib.c:546)
==38175==    by 0x487FB01: AllocateAlignedPixels (SDL_ttf.c:1636)
==38175==    by 0x4883A1F: Create_Surface_Blended (SDL_ttf.c:1762)
==38175==    by 0x4883A1F: TTF_Render_Internal (SDL_ttf.c:3939)
==38175==    by 0x10BD07: main (testapp.c:971)

1bsyl avatar Jul 15 '25 20:07 1bsyl