NEON support disabled on ARM64 macOS
(Tested on SDL3).
I noticed I was hitting the scalar fallbacks for audio type conversion on an M1 Mac, and sure enough, the NEON code is disabled.
The ARMNEON_FOUND test in CMakeLists.txt is failing, which disables NEON support. Relevant stuff from CMakeError.log is this:
Source file was:
.text
.arch armv6
.object_arch armv4
.arm
.altmacro
#ifndef __ARM_EABI__
#error EABI is required (to be sure that calling conventions are compatible)
#endif
main:
.global main
pld [r0]
uqadd8 r0, r0, r0
Performing C SOURCE FILE Test ARMNEON_FOUND failed with the following output:
Change Dir: /Users/icculus/projects/SDL-icculus/buildbot/CMakeFiles/CMakeScratch/TryCompile-DWreY8
Run Build Command(s):/usr/local/bin/ninja cmTC_cdb5a && [1/2] Building C object CMakeFiles/cmTC_cdb5a.dir/src.c.o
FAILED: CMakeFiles/cmTC_cdb5a.dir/src.c.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -DARMNEON_FOUND -D_GNU_SOURCE=1 -x assembler-with-cpp -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk -mmacosx-version-min=12.6 -MD -MT CMakeFiles/cmTC_cdb5a.dir/src.c.o -MF CMakeFiles/cmTC_cdb5a.dir/src.c.o.d -o CMakeFiles/cmTC_cdb5a.dir/src.c.o -c /Users/icculus/projects/SDL-icculus/buildbot/CMakeFiles/CMakeScratch/TryCompile-DWreY8/src.c
/Users/icculus/projects/SDL-icculus/buildbot/CMakeFiles/CMakeScratch/TryCompile-DWreY8/src.c:10:10: error: EABI is required (to be sure that calling conventions are compatible)
#error EABI is required (to be sure that calling conventions are compatible)
I'm assuming __ARM_EABI__ is an ARM32 thing, but I could be wrong.
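For reference, here is a minimal sketch of telling the targets apart with compiler-predefined macros. `arm_target_kind` is a hypothetical helper, not SDL code. As far as I know, GCC and Clang define `__ARM_EABI__` only for 32-bit ARM EABI targets and define `__aarch64__` on arm64 (including Apple's toolchain), which would explain why the ARM32 assembly test above fails on an M1. `_M_ARM64`/`_M_ARM` are the MSVC spellings.

```c
/* Hypothetical helper: classify the compile target using predefined macros.
   __aarch64__ / _M_ARM64: 64-bit ARM (no EABI macro expected here).
   __arm__ / _M_ARM:       32-bit ARM; __ARM_EABI__ is set by GCC/Clang
                           for 32-bit EABI targets only. */
static const char *arm_target_kind(void)
{
#if defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__arm__) && defined(__ARM_EABI__)
    return "arm32-eabi";
#elif defined(__arm__) || defined(_M_ARM)
    return "arm32";
#else
    return "not-arm";
#endif
}
```

Calling `arm_target_kind()` on an Apple Silicon build should return `"arm64"`, so any check that insists on `__ARM_EABI__` can never pass there.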
Also, we need to know whether we have NEON intrinsics more than whether we have NEON assembly support. Should we change this to a test that includes the appropriate header and tries to compile an intrinsic?
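To illustrate why intrinsics are what matters for the audio path, here is a sketch of a signed 16-bit to float conversion with a NEON path and a scalar fallback, selected at compile time. It assumes the standard ACLE feature macro `__ARM_NEON` (or `__ARM_NEON__` from older 32-bit GCC) and a 1/32768 scale; `convert_s16_to_f32` and `SKETCH_HAVE_NEON` are hypothetical names, not SDL's actual converter.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Hypothetical converter: signed 16-bit samples to floats in [-1, 1). */
static void convert_s16_to_f32(float *dst, const int16_t *src, size_t count)
{
    const float scale = 1.0f / 32768.0f;
    size_t i = 0;
#if SKETCH_HAVE_NEON
    /* 8 samples per iteration: load, widen to 32-bit, convert, scale, store. */
    for (; i + 8 <= count; i += 8) {
        const int16x8_t s = vld1q_s16(src + i);
        const float32x4_t lo = vcvtq_f32_s32(vmovl_s16(vget_low_s16(s)));
        const float32x4_t hi = vcvtq_f32_s32(vmovl_s16(vget_high_s16(s)));
        vst1q_f32(dst + i, vmulq_n_f32(lo, scale));
        vst1q_f32(dst + i + 4, vmulq_n_f32(hi, scale));
    }
#endif
    /* Scalar fallback; on NEON builds it also handles the leftover tail. */
    for (; i < count; i++) {
        dst[i] = (float)src[i] * scale;
    }
}
```

Because the scalar loop doubles as the tail handler, the NEON block can be compiled out entirely when the intrinsics check fails, without any change in results.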
CC @madebr
That check is really only for 32-bit ARM, and it enables the pixman assembly blit code. Related issue: https://github.com/libsdl-org/SDL/issues/4484
This got me far enough to get the NEON audio converters to build and work on macOS, but tweaking this part of the CMake file is something I'd rather leave to the experts. Notably, this should probably be split into "I have NEON intrinsics" vs. "I want the NEON assembly blitters and they will build okay here."
(Also, how good are these blitters that we want to keep assembly code around in the project for them?)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index d7fa95ec7..8dbf74fab 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -382,8 +382,8 @@ dep_option(SDL_SSE4_1 "Use SSE4.1 assembly routines" ON "SDL_ASSEMB
dep_option(SDL_SSE4_2 "Use SSE4.2 assembly routines" ON "SDL_ASSEMBLY;SDL_CPU_X86 OR SDL_CPU_X64" OFF)
dep_option(SDL_MMX "Use MMX assembly routines" ON "SDL_ASSEMBLY;SDL_CPU_X86 OR SDL_CPU_X64" OFF)
dep_option(SDL_ALTIVEC "Use Altivec assembly routines" ON "SDL_ASSEMBLY;SDL_CPU_POWERPC32 OR SDL_CPU_POWERPC64" OFF)
-dep_option(SDL_ARMSIMD "Use SIMD assembly blitters on ARM" OFF "SDL_ASSEMBLY;SDL_CPU_ARM32" OFF)
-dep_option(SDL_ARMNEON "Use NEON assembly blitters on ARM" OFF "SDL_ASSEMBLY;SDL_CPU_ARM32" OFF)
+dep_option(SDL_ARMSIMD "Use SIMD assembly blitters on ARM" ON "SDL_ASSEMBLY;SDL_CPU_ARM32 OR SDL_CPU_ARM64" OFF)
+dep_option(SDL_ARMNEON "Use NEON assembly blitters on ARM" ON "SDL_ASSEMBLY;SDL_CPU_ARM32 OR SDL_CPU_ARM64" OFF)
dep_option(SDL_LSX "Use LSX assembly routines" ON "SDL_ASSEMBLY;SDL_CPU_LOONGARCH64" OFF)
dep_option(SDL_LASX "Use LASX assembly routines" ON "SDL_ASSEMBLY;SDL_CPU_LOONGARCH64" OFF)
@@ -1016,34 +1016,52 @@ if(SDL_ASSEMBLY)
if(SDL_ARMNEON)
cmake_push_check_state()
- set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -x assembler-with-cpp")
- list(APPEND CMAKE_REQUIRED_LINK_OPTIONS -x none)
- check_c_source_compiles("
- .text
- .fpu neon
- .arch armv7a
- .object_arch armv4
- .eabi_attribute 10, 0
- .arm
- .altmacro
- #ifndef __ARM_EABI__
- #error EABI is required (to be sure that calling conventions are compatible)
- #endif
- main:
- .global main
- pld [r0]
- vmovn.u16 d0, q0
- " ARMNEON_FOUND)
+ if(SDL_CPU_ARM32)
+ set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -x assembler-with-cpp")
+ list(APPEND CMAKE_REQUIRED_LINK_OPTIONS -x none)
+ check_c_source_compiles("
+ .text
+ .fpu neon
+ .arch armv7a
+ .object_arch armv4
+ .eabi_attribute 10, 0
+ .arm
+ .altmacro
+ #ifndef __ARM_EABI__
+ #error EABI is required (to be sure that calling conventions are compatible)
+ #endif
+ main:
+ .global main
+ pld [r0]
+ vmovn.u16 d0, q0
+ " ARMNEON_FOUND)
+ elseif(SDL_CPU_ARM64)
+ # We currently don't have any ARM64 assembly code (AND LETS KEEP IT THAT WAY)
+ # But we need to know if we have NEON compiler intrinsics
+ check_c_source_compiles("
+ #include <arm_neon.h>
+ void floats_add(float *dest, float *a, float *b, unsigned size) {
+ for (; size >= 4; size -= 4, dest += 4, a += 4, b += 4) {
+ vst1q_f32(dest, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
+ }
+ }
+ int main(int argc, char **argv) {
+ floats_add((float*)0, (float*)0, (float*)0, 0);
+ return 0;
+ }" ARMNEON_FOUND)
+ endif()
cmake_pop_check_state()
if(ARMNEON_FOUND)
set(HAVE_ARMNEON TRUE)
- set(SDL_ARM_NEON_BLITTERS 1)
- enable_language(ASM)
- file(GLOB ARMNEON_SOURCES ${SDL3_SOURCE_DIR}/src/video/arm/pixman-arm-neon*.S)
- list(APPEND SOURCE_FILES ${ARMNEON_SOURCES})
- set_property(SOURCE ${ARMNEON_SOURCES} APPEND PROPERTY COMPILE_OPTIONS -x assembler-with-cpp)
- set(WARN_ABOUT_ARM_NEON_ASM_MIT TRUE)
+ if(SDL_CPU_ARM32)
+ set(SDL_ARM_NEON_BLITTERS 1)
+ enable_language(ASM)
+ file(GLOB ARMNEON_SOURCES ${SDL3_SOURCE_DIR}/src/video/arm/pixman-arm-neon*.S)
+ list(APPEND SOURCE_FILES ${ARMNEON_SOURCES})
+ set_property(SOURCE ${ARMNEON_SOURCES} APPEND PROPERTY COMPILE_OPTIONS -x assembler-with-cpp)
+ set(WARN_ABOUT_ARM_NEON_ASM_MIT TRUE)
+ endif()
endif()
endif()
endif()
(Also, how good are these blitters that we want to keep assembly code around in the project for them?)
Not good enough: because of an incompatibility introduced going from SDL-1.2 to SDL2, they're disabled by default. @bavison can answer better. See https://github.com/libsdl-org/SDL/issues/4484 (the original issue was https://github.com/libsdl-org/SDL-1.2/issues/777).
There are a number of benchmarks that I included with the original issue - it varies depending on the exact case, but often gave a speed-up of an order of magnitude or more. That made a big difference to some of the applications I was looking at at the time.
The problem with SDL2 is that one of the functions, BlitRGBtoRGBPixelAlpha, changed its behaviour compared to SDL1.2, so my assembly routine (which was originally written for SDL1.2) no longer corresponded to it. The operation the SDL2 version performs is technically wrong - it corresponds to neither the formally-defined "over" compositing blend for straight alpha nor premultiplied alpha - and it seems unlikely to me that it corresponds to any hardware-accelerated blend operation either.
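For concreteness, here is a sketch of the formally-defined Porter-Duff "over" operator for straight alpha, in floating point with components in [0, 1]. `RGBAf` and `over_straight` are hypothetical names, and this is not SDL's code; it only shows the reference result a straight-alpha blit would have to reproduce. Note that each colour channel must be divided by the resulting alpha.

```c
/* A straight (non-premultiplied) RGBA pixel, components in [0, 1]. */
typedef struct { float r, g, b, a; } RGBAf;

/* Porter-Duff "source over destination" for straight alpha. */
static RGBAf over_straight(RGBAf src, RGBAf dst)
{
    RGBAf out;
    /* How much of the destination shows through the source. */
    const float dst_weight = dst.a * (1.0f - src.a);
    out.a = src.a + dst_weight;
    if (out.a <= 0.0f) {
        /* Both inputs fully transparent: colour is undefined, use zero. */
        out.r = out.g = out.b = out.a = 0.0f;
        return out;
    }
    /* Colour is an alpha-weighted average, hence one division per channel. */
    out.r = (src.r * src.a + dst.r * dst_weight) / out.a;
    out.g = (src.g * src.a + dst.g * dst_weight) / out.a;
    out.b = (src.b * src.a + dst.b * dst_weight) / out.a;
    return out;
}
```

With an opaque destination this reduces to a plain `src * a + dst * (1 - a)` mix, but with a translucent destination (for example half-transparent red over half-transparent blue) the red share becomes 0.5 / 0.75, about 0.667, which a blend that ignores the destination alpha would not produce.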
It makes little sense to me to put effort into accelerating an incorrect function, since that work would have to be redone at a later date when the function is corrected. I tried to start a debate about what the behaviour of the function should be, but nobody else engaged.
If you care more about speed than precision, I'd recommend premultiplied alpha, without gamma correction. One nice thing about this is that accelerated versions of this operation for most common architectures can be lifted straight from other libraries, including Pixman. However, it might necessitate adding a new function to convert legacy images from straight alpha to premultiplied alpha.
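A minimal sketch of the premultiplied route for 8-bit RGBA: a conversion for legacy straight-alpha pixels, and "over" on premultiplied values, which needs no division by the result alpha. All names are hypothetical; `mul_div255` uses the well-known `(t + (t >> 8)) >> 8` trick, which gives the correctly rounded x*y/255 for 8-bit inputs.

```c
#include <stdint.h>

/* Hypothetical 8-bit RGBA pixel. */
typedef struct { uint8_t r, g, b, a; } RGBA8;

/* Rounded x*y/255 for x, y in [0, 255]. */
static uint8_t mul_div255(uint8_t x, uint8_t y)
{
    const unsigned t = (unsigned)x * y + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Convert a legacy straight-alpha pixel to premultiplied alpha. */
static RGBA8 premultiply(RGBA8 p)
{
    RGBA8 out = { mul_div255(p.r, p.a), mul_div255(p.g, p.a),
                  mul_div255(p.b, p.a), p.a };
    return out;
}

/* "Over" for premultiplied alpha: out = src + dst * (1 - src.a).
   For valid premultiplied inputs (each channel <= its alpha) the sums
   cannot exceed 255. */
static RGBA8 over_premultiplied(RGBA8 src, RGBA8 dst)
{
    const uint8_t inv = (uint8_t)(255 - src.a);
    RGBA8 out = { (uint8_t)(src.r + mul_div255(dst.r, inv)),
                  (uint8_t)(src.g + mul_div255(dst.g, inv)),
                  (uint8_t)(src.b + mul_div255(dst.b, inv)),
                  (uint8_t)(src.a + mul_div255(dst.a, inv)) };
    return out;
}
```

The per-pixel work is just a multiply-add per channel, which is why this form maps so directly onto existing SIMD implementations.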
If accuracy and precision are more important, you probably want to do the blending in linear intensity space, which probably requires 16-bit intermediate values to avoid losing precision after undoing gamma correction and multiplying by alpha. You could also keep straight alpha for the output images for best precision, but that's a particularly expensive choice because it requires 3 divisions per pixel (one per colour channel).
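A sketch of blending one channel in linear intensity space with 16-bit intermediates. To stay self-contained it uses a gamma of 2.0 (squaring and an integer square root) as a stand-in for the real sRGB transfer function, and it assumes straight source alpha over an opaque destination; real code would use the sRGB curve, typically through lookup tables. All names are hypothetical.

```c
#include <stdint.h>

/* Gamma-2.0 stand-in for sRGB decoding: 255*255 = 65025 fits in 16 bits. */
static uint16_t to_linear(uint8_t c)
{
    return (uint16_t)((unsigned)c * c);
}

/* Inverse: nearest integer square root of v, for v in [0, 65025]. */
static uint8_t from_linear(uint32_t v)
{
    uint32_t r = 0;
    while ((r + 1) * (r + 1) <= v) {   /* floor(sqrt(v)), at most 255 steps */
        r++;
    }
    if (v > r * r + r) {               /* sqrt(v) > r + 0.5: round up */
        r++;
    }
    return (uint8_t)r;
}

/* out = src*a + dst*(1-a), computed on linear values, then re-encoded. */
static uint8_t blend_linear(uint8_t src, uint8_t dst, uint8_t alpha)
{
    const uint32_t s = to_linear(src);
    const uint32_t d = to_linear(dst);
    const uint32_t lin = (s * alpha + d * (255u - alpha) + 127u) / 255u;
    return from_linear(lin);
}
```

Blending full white over black at alpha 128 gives 181 here, rather than the 128 a plain gamma-space average gives, which is the kind of difference linear-space blending is meant to capture.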
Yes, I think we concluded that SDL was wrong in this case, but didn't want to change it because we didn't know what applications might be relying on this behavior. SDL3 is the right time to revisit this, since we can change behavior if needed.
@sezero, should we close this, or is the NEON code coming back?
I think we should close it.