Use SSE 4.2 as a baseline when compiling Godot
This lets the compiler do more optimizations, leading to increased performance for demanding CPU tasks. This should be beneficial to occlusion culling rasterization, physics and more.
This change only affects x86 platforms.
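As a rough illustration of an x86-only baseline, the sketch below picks extra compiler flags per target architecture. It is not Godot's actual build code: the helper name `baseline_ccflags` and the architecture strings are assumptions made for this example. The only real compiler option used is `-msse4.2`, which GCC and Clang accept. MSVC handling is deliberately left out.

```python
def baseline_ccflags(arch: str, use_msvc: bool = False) -> list[str]:
    """Return extra compiler flags for an SSE4.2 baseline.

    Hypothetical helper for illustration; the arch names are assumed.
    """
    x86_arches = {"x86_32", "x86_64"}  # assumed naming of x86 targets
    if arch not in x86_arches:
        return []  # non-x86 platforms are unaffected
    if use_msvc:
        return []  # MSVC flag selection is out of scope for this sketch
    # GCC/Clang: allow the compiler to emit instructions up to SSE4.2.
    return ["-msse4.2"]


print(baseline_ccflags("x86_64"))
print(baseline_ccflags("arm64"))
```

In an SCons-based build, a list like this would typically be appended to the environment's compiler flags.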
This is considered a breaking change, as very old CPUs will not be able to run official Godot binaries from releases made after this is merged. On the Intel side, SSE4.2 has been supported since Nehalem (released in Q4 2008). AMD CPUs have only supported SSE4.2 since 2011 (Bulldozer, excluding APUs). CPUs this old are unlikely to be paired with GPUs that support Vulkan or OpenGL 3.3 well anyway. This is especially true of old AMD CPUs, which haven't aged well because their single-core performance was lower than Intel's at the time.
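To check whether a given Linux machine would still be able to run such binaries, one option is to look for the `sse4_2` flag in `/proc/cpuinfo`. The following is a minimal sketch under that assumption; the function name `cpu_supports_sse42` is made up for this example, and the check only applies to x86 Linux systems.

```python
def cpu_supports_sse42(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line of /proc/cpuinfo lists sse4_2.

    Hypothetical helper; takes the file contents as a string so it can be
    exercised without reading the real file.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Flags are whitespace-separated tokens such as "sse4_1 sse4_2".
            return "sse4_2" in value.split()
    return False  # no flags line found (e.g. a non-x86 system)


sample = "model name\t: Example CPU\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2\n"
print(cpu_supports_sse42(sample))
```

On a real system, pass the contents of `/proc/cpuinfo` (for example `open("/proc/cpuinfo").read()`) instead of the sample string.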
This closes https://github.com/godotengine/godot-proposals/issues/3932.
elfx86exts reports for old and new release export templates:
Instructions in the binary
Current
❯ elfx86exts godot.linuxbsd.opt.64 | sort
CPU Generation: Haswell
AES (aesenc)
BMI2 (shlx)
BMI (tzcnt)
CMOV (cmovle)
MMX (movq)
MODE64 (call)
PCLMUL (pclmulqdq)
SSE1 (movups)
SSE2 (movdqu)
With the above branch
❯ elfx86exts godot.linuxbsd.opt.64.sse4.2 | sort
CPU Generation: Unknown
CMOV (cmovs)
MODE64 (ret)
SSE1 (movss)
SSE2 (pxor)
SSE3 (lddqu)
SSE41 (roundss)
SSSE3 (pshufb)
Binary sizes are almost identical: with both binaries stripped, the SSE4.2-enabled export template is 4 KB smaller.
Benchmark
The testing project instances 500 RigidDynamicBody3D nodes and quits as soon as possible: test_sse4.2.zip
❯ hyperfine -iw1 "bin/godot.linuxbsd.opt.64.stripped --path ~/Documents/Godot/test_sse4.2 --quit" "bin/godot.linuxbsd.opt.64.sse4.2.stripped --path ~/Documents/Godot/test_sse4.2 --quit"
Benchmark #1: bin/godot.linuxbsd.opt.64.stripped --path ~/Documents/Godot/test_sse4.2 --quit
Time (mean ± σ): 2.394 s ± 0.282 s [User: 1.508 s, System: 0.165 s]
Range (min … max): 1.605 s … 2.546 s 10 runs
Benchmark #2: bin/godot.linuxbsd.opt.64.sse4.2.stripped --path ~/Documents/Godot/test_sse4.2 --quit
Time (mean ± σ): 2.199 s ± 0.429 s [User: 1.499 s, System: 0.169 s]
Range (min … max): 1.578 s … 2.544 s 10 runs
Summary
'bin/godot.linuxbsd.opt.64.sse4.2.stripped --path ~/Documents/Godot/test_sse4.2 --quit' ran
1.09 ± 0.25 times faster than 'bin/godot.linuxbsd.opt.64.stripped --path ~/Documents/Godot/test_sse4.2 --quit'
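The summary figure can be reproduced from the two means and standard deviations reported above. The sketch below assumes the uncertainty is obtained by standard propagation of relative errors for a ratio; that assumption is consistent with the numbers shown, but it is not taken from hyperfine's source. The function name is made up for this example.

```python
import math


def speedup_with_error(mean_ref: float, sd_ref: float,
                       mean_new: float, sd_new: float) -> tuple[float, float]:
    """Return (ratio, uncertainty) for mean_ref / mean_new.

    Assumes independent measurements and first-order error propagation.
    """
    ratio = mean_ref / mean_new
    rel_err = math.sqrt((sd_ref / mean_ref) ** 2 + (sd_new / mean_new) ** 2)
    return ratio, ratio * rel_err


# Values copied from the benchmark output above (seconds).
ratio, err = speedup_with_error(2.394, 0.282, 2.199, 0.429)
print(f"{ratio:.2f} ± {err:.2f}")  # prints "1.09 ± 0.25"
```

The uncertainty is large compared with the measured gain, so this short, startup-dominated run is only a rough indicator.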