
Use SSE 4.2 as a baseline when compiling Godot

Open Calinou opened this issue 3 years ago • 24 comments

This lets the compiler do more optimizations, leading to increased performance for demanding CPU tasks. This should be beneficial to occlusion culling rasterization, physics and more.

This change only affects x86 platforms.

This is considered a breaking change, as very old CPUs will not be able to run official Godot binaries from releases made after this is merged. On the Intel side, SSE4.2 has been supported since Nehalem (released in Q4 2008). AMD CPUs, however, have only supported SSE4.2 since 2011 (Bulldozer, excluding APUs). CPUs this old are unlikely to be paired with GPUs that support Vulkan or OpenGL 3.3 well anyway. This is especially true of old AMD CPUs, which have aged poorly because their single-core performance was lower than Intel's at the time.
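To see whether a given Linux machine would still be able to run such binaries, one option is to look for the `sse4_2` flag that the kernel reports in `/proc/cpuinfo`. The following is a minimal sketch, not part of this change; the helper names `cpu_flags` and `supports_sse42` are hypothetical, and the approach only applies to x86 Linux systems.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from /proc/cpuinfo-style text (first CPU entry)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_sse42(cpuinfo_text: str) -> bool:
    """True if the Linux kernel lists SSE4.2 (flag name `sse4_2`) for this CPU."""
    return "sse4_2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            text = f.read()
    except OSError:
        # Not Linux, or /proc is not mounted.
        print("/proc/cpuinfo is not available on this system")
    else:
        print("SSE4.2 supported:", supports_sse42(text))
```

On other operating systems the same information has to come from a different source, so this check should be read as an illustration rather than a portable test.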

This closes https://github.com/godotengine/godot-proposals/issues/3932.

elfx86exts reports for old and new release export templates:

Instructions in the binary

Current

❯ elfx86exts godot.linuxbsd.opt.64 | sort
CPU Generation: Haswell
AES (aesenc)
BMI2 (shlx)
BMI (tzcnt)
CMOV (cmovle)
MMX (movq)
MODE64 (call)
PCLMUL (pclmulqdq)
SSE1 (movups)
SSE2 (movdqu)

With the above branch

❯ elfx86exts godot.linuxbsd.opt.64.sse4.2 | sort
CPU Generation: Unknown
CMOV (cmovs)
MODE64 (ret)
SSE1 (movss)
SSE2 (pxor)
SSE3 (lddqu)
SSE41 (roundss)
SSSE3 (pshufb)
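The two reports above are easier to compare as sets. The sketch below is only an illustration: it copies the extension names from the two listings and prints which ones appear in just one of the binaries. The helper `parse_report` is hypothetical and is not part of elfx86exts.

```python
def parse_report(report: str) -> set[str]:
    """Collect extension names from lines such as 'SSE2 (movdqu)'."""
    names = set()
    for line in report.splitlines():
        line = line.strip()
        # Skip blank lines and the 'CPU Generation: ...' summary line.
        if not line or line.startswith("CPU Generation:"):
            continue
        names.add(line.split()[0])
    return names


CURRENT_REPORT = """\
CPU Generation: Haswell
AES (aesenc)
BMI2 (shlx)
BMI (tzcnt)
CMOV (cmovle)
MMX (movq)
MODE64 (call)
PCLMUL (pclmulqdq)
SSE1 (movups)
SSE2 (movdqu)
"""

SSE42_REPORT = """\
CPU Generation: Unknown
CMOV (cmovs)
MODE64 (ret)
SSE1 (movss)
SSE2 (pxor)
SSE3 (lddqu)
SSE41 (roundss)
SSSE3 (pshufb)
"""

current = parse_report(CURRENT_REPORT)
sse42 = parse_report(SSE42_REPORT)

only_in_sse42 = sorted(sse42 - current)
only_in_current = sorted(current - sse42)

print("Only in the SSE4.2 build:", only_in_sse42)
print("Only in the current build:", only_in_current)
```

With the listings as pasted, this reports SSE3, SSE41 and SSSE3 as new in the SSE4.2 build, and AES, BMI, BMI2, MMX and PCLMUL as present only in the current build.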

Binary sizes are almost identical: once both binaries are stripped, the SSE4.2-enabled export template is 4 KB smaller.

Benchmark

The testing project instances 500 RigidDynamicBody3D nodes and quits as soon as possible: test_sse4.2.zip

❯ hyperfine -iw1 "bin/godot.linuxbsd.opt.64.stripped --path ~/Documents/Godot/test_sse4.2 --quit" "bin/godot.linuxbsd.opt.64.sse4.2.stripped --path ~/Documents/Godot/test_sse4.2 --quit"
Benchmark #1: bin/godot.linuxbsd.opt.64.stripped --path ~/Documents/Godot/test_sse4.2 --quit
  Time (mean ± σ):      2.394 s ±  0.282 s    [User: 1.508 s, System: 0.165 s]
  Range (min … max):    1.605 s …  2.546 s    10 runs

Benchmark #2: bin/godot.linuxbsd.opt.64.sse4.2.stripped --path ~/Documents/Godot/test_sse4.2 --quit
  Time (mean ± σ):      2.199 s ±  0.429 s    [User: 1.499 s, System: 0.169 s]
  Range (min … max):    1.578 s …  2.544 s    10 runs

Summary
  'bin/godot.linuxbsd.opt.64.sse4.2.stripped --path ~/Documents/Godot/test_sse4.2 --quit' ran
    1.09 ± 0.25 times faster than 'bin/godot.linuxbsd.opt.64.stripped --path ~/Documents/Godot/test_sse4.2 --quit'
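For reference, the ratio in the Summary line can be recomputed from the two means and standard deviations. The sketch below assumes standard first-order error propagation for a quotient, which reproduces the reported figures; the function name `relative_speed` is hypothetical.

```python
import math


def relative_speed(mean_slow: float, sd_slow: float,
                   mean_fast: float, sd_fast: float) -> tuple[float, float]:
    """Return (ratio, uncertainty) of mean_slow / mean_fast.

    The uncertainty combines the two relative standard deviations in quadrature.
    """
    ratio = mean_slow / mean_fast
    rel_err = math.sqrt((sd_slow / mean_slow) ** 2 + (sd_fast / mean_fast) ** 2)
    return ratio, ratio * rel_err


# Values taken from the benchmark output above (seconds).
ratio, err = relative_speed(2.394, 0.282, 2.199, 0.429)
print(f"{ratio:.2f} ± {err:.2f} times faster")
```

The uncertainty (about 0.25) is larger than the measured gain (about 0.09), so this run alone does not rule out that both binaries perform the same on this test.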

Calinou, Mar 27 '22 18:03