media-autobuild_suite

Request: [clang64] LTO

Open esator opened this issue 9 months ago • 4 comments

After recent clang changes, LTO is now possible for clang with -flto=thin via custom_build_options and --enable-lto=thin for ffmpeg. x264 also needs --enable-lto; without it there are linking errors, because x264 forces -mstack-alignment=64 by default, while ffmpeg and other libs use -mstack-alignment=16.

It would be nice to have an option to enable LTO for clang. LTO is quite common and compatible nowadays, and -flto=thin is only a bit slower than normal compilation. Some libs and tools may also require individual flags for LTO (like -DSVT_AV1_LTO=ON for svt-av1, -DENABLE_LTO for x265, etc.). It could at least be an experimental and unsupported option, because it might require more maintenance and have less compatibility.

esator avatar May 06 '24 12:05 esator
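As a rough illustration of the per-library flags listed above, here is a minimal shell sketch. The function name `lto_flag_for` is hypothetical and is not part of media-suite_compile.sh; the flags are the ones quoted in the request, and the `=ON` value for x265 is an assumption, since the request only gives -DENABLE_LTO.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of media-suite_compile.sh.
# Prints the LTO-related build flag quoted in this thread for a given library.
# Returns 1 (and prints nothing) for libraries the thread does not mention.
lto_flag_for() {
    case "$1" in
        ffmpeg)  echo "--enable-lto=thin" ;;  # configure option named in the request
        x264)    echo "--enable-lto" ;;       # configure option named in the request
        svt-av1) echo "-DSVT_AV1_LTO=ON" ;;   # CMake option named in the request
        x265)    echo "-DENABLE_LTO=ON" ;;    # "=ON" is an assumption
        *)       return 1 ;;
    esac
}

# List the flag for each library mentioned in the request.
for lib in ffmpeg x264 svt-av1 x265; do
    echo "$lib: $(lto_flag_for "$lib")"
done
```

How such a flag would actually be passed to each build inside the suite is not shown here; the request only mentions custom_build_options for -flto=thin.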

LTO is still quite slow on Windows because malloc is just too slow. Replacing the malloc implementation speeds it up by an order of magnitude, but this doesn't apply to MinGW. On Linux, ThinLTO is even faster than non-LTO. The biggest problem is that most libraries always build a bunch of useless shared libraries and test executables at the same time, which is a huge waste of resources, and things like ffmpeg run a bunch of pointless build tests during configure, which is also extremely wasteful. If these factors were eliminated, ThinLTO wouldn't be that much slower.

Andarwinux avatar May 06 '24 15:05 Andarwinux

> Also x264 needs --enable-lto since it has linking errors because by default forces -mstack-alignment=64, but for ffmpeg and other libs it's -mstack-alignment=16

LTO sounds great. For example, I've found that Visual Studio svt-av1 binaries on the web are faster than the 'optimized' -march= binaries I've compiled with llvm.

It would be helpful if you'd post a list of libs needing -mstack-alignment, or provide a patch for media-suite_compile.sh. Otherwise everyone has to resort to trial and error.

> The biggest problem is that most libraries always build a bunch of useless shared libraries and tests executables at the same time, which is a huge waste of resources, and things like ffmpeg do a bunch of pointless build tests during configure, which is also extremely wasteful.

Compiling a full mediasuite isn't exactly fast anyway, so it could be left to users to decide whether to enable LTO. I don't know how much effect it would have with llvm though.

gitoss avatar Jun 17 '24 09:06 gitoss
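Until such a list exists, one way to cut down the trial and error is to compare the -mstack-alignment value in each library's flags. The sketch below is only illustrative: `stack_alignment_of` is a hypothetical helper, and the two flag strings are made-up examples that reuse the values 64 and 16 quoted in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of media-suite_compile.sh.
# Prints the value of the last -mstack-alignment=N found in a flag string,
# or nothing if the flag is absent.
stack_alignment_of() {
    local flag value=""
    # Intentionally unquoted so the string is split into individual flags.
    for flag in $1; do
        case "$flag" in
            -mstack-alignment=*) value="${flag#-mstack-alignment=}" ;;
        esac
    done
    printf '%s\n' "$value"
}

# Example flag strings (assumed for illustration only).
x264_flags="-O2 -mstack-alignment=64"
ffmpeg_flags="-O2 -mstack-alignment=16"

if [ "$(stack_alignment_of "$x264_flags")" != "$(stack_alignment_of "$ffmpeg_flags")" ]; then
    echo "stack alignment mismatch between x264 and ffmpeg flags"
fi
```

This only detects differing values in flag strings you already have; it does not discover which flags a given library's build system injects on its own.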

> I don't know how much effect it would have with llvm though.

see https://github.com/llvm/llvm-project/pull/91862

Andarwinux avatar Jun 17 '24 11:06 Andarwinux

> > I don't know how much effect it would have with llvm though.
>
> see llvm/llvm-project#91862

Right, so it's probably good to limit lto to core encoder libs/binaries that would gain speed.

I just compiled x265 and svt-av1 with LTO by adding the -C and -D args to the .sh. It seems to have worked fine and didn't take ages. Luckily I'm not using a CPU with a very high core count, so the llvm malloc issue probably doesn't affect me that much.

Btw, here's a speed comparison for aom: https://www.reddit.com/r/AV1/comments/jmwepw/how_to_build_libaomav1_to_be_as_fast_as_possible/

gitoss avatar Jun 17 '24 12:06 gitoss