enable LTO for chromium
It has been enabled for official builds upstream since 2016, but Gentoo does not use it, even if -flto is specified.
I've been building successfully with USE custom-cflags, which should disable the flag stripping, although I haven't verified that LTO is being enabled
Anyone willing to do a PR? I don't use Chromium myself and I remember it took forever to compile.
@InBetweenNames is there a way to confirm a binary has been LTO'd or do you look at the build logs?
I just look at the build logs, however if there are any static libraries, you can objdump them to see if there are any GIMPLE symbols in the output. Example from sys-devel/flex:
>qfile /usr/lib64/libfl.a
...
libyywrap.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .gnu.lto_.profile.2caf6e470d364625 0000000000000000 .gnu.lto_.profile.2caf6e470d364625
0000000000000000 l d .gnu.lto_.icf.2caf6e470d364625 0000000000000000 .gnu.lto_.icf.2caf6e470d364625
0000000000000000 l d .gnu.lto_.jmpfuncs.2caf6e470d364625 0000000000000000 .gnu.lto_.jmpfuncs.2caf6e470d364625
...
...
The .gnu.lto_. part is the GIMPLE from GCC. However, shared objects won't have any GIMPLE associated with them, so executables and .so files won't look any different.
but if they're unstripped and you're lucky you can find LTO function fragments.
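The static-library check described above can be scripted. This is only a minimal sketch, assuming binutils' objdump is installed; the helper names `count_lto_sections` and `check_archive_for_lto` are made up for illustration. It counts the lines of a section-header dump that mention `.gnu.lto_`.

```shell
# Count lines on stdin that mention a GCC LTO section (.gnu.lto_*).
# grep -c still prints 0 when nothing matches; "|| true" hides its exit status.
count_lto_sections() {
    grep -c -F '.gnu.lto_' || true
}

# Hypothetical helper: dump the section headers of an archive or object
# file and report whether any GCC LTO sections show up.
check_archive_for_lto() {
    n=$(objdump -h "$1" 2>/dev/null | count_lto_sections)
    if [ "$n" -gt 0 ]; then
        echo "$1: $n LTO section lines found"
    else
        echo "$1: no LTO sections found"
    fi
}
```

Usage would be something like `check_archive_for_lto /usr/lib64/libfl.a`. As noted above, this only says something about static libraries and object files, not about executables or .so files.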
- install sys-devel/lld
- set EXTRA_GN="thin_lto_enable_optimizations=true use_lld=true use_thin_lto=true"
alternatively, simply:
- set www-client/chromium USE=custom-cflags
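For the first option, one way to wire this up is a per-package env file. This is a sketch only: the file name `chromium-thinlto.conf` is an arbitrary choice, and the GN arguments are simply copied from the list above.

```shell
# /etc/portage/env/chromium-thinlto.conf  (file name is an arbitrary example)
# GN arguments taken from the steps above; sys-devel/lld must be installed.
EXTRA_GN="thin_lto_enable_optimizations=true use_lld=true use_thin_lto=true"

# Then reference the file from /etc/portage/package.env with a line such as:
#   www-client/chromium chromium-thinlto.conf
```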
OK, I finally built it successfully. this does work. note that it significantly increases CPU, RAM, and disk usage during build. it (very very approximately) doubles each of these.
@Hello71, were you able to just use custom-cflags or did it require more in-depth workarounds?
I didn't use custom-cflags, I used the gn options. I assume this is better supported upstream.
Yeah, I opened a bug report about that several months ago because I noticed that in benchmarks Chromium compiled on Gentoo was slower than binary Chrome or Arch Linux Chromium. The problems are that you should use the latest clang and LLD, and either do an official_build or enable the LTO flags. Also, with LTO it takes much longer and you'll probably need a very strong/server-class machine with at least 16GB of RAM.
I also tried to compile Chromium with GCC using fedora's patches but it was even slower in benchmarks.
@funghetto How much slower, and what was slower, JS? I only compile Chromium with GCC and have had no issue with performance with custom cflags.
I've used mainly Speedometer (https://browserbench.org/Speedometer2.0/) and there was like a 15%-20% performance drop there.
USE=custom-cflags compilation works for me by changing the compiler to clang, linker to lld and changing -flto=n (which isn't a valid option in clang) to -flto. Is that something to make a PR for, or do we avoid replacing gcc with clang (since I haven't seen this as something suggested anywhere nor noticed such workarounds)? @InBetweenNames
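The flag change mentioned here could be done mechanically. A minimal sketch, assuming the flags live in an ordinary shell string; the function name `clangify_lto_flags` is hypothetical and not part of Portage or any overlay.

```shell
# Rewrite GCC-style "-flto=<number>" into plain "-flto".
# Other flags, including "-flto=thin", pass through unchanged.
clangify_lto_flags() {
    printf '%s\n' "$1" | sed -E 's/-flto=[0-9]+/-flto/g'
}
```

For example, `CFLAGS=$(clangify_lto_flags "$CFLAGS")`. Note that this needs a file that is actually sourced by bash; make.conf-style files do not run command substitution.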
While it is not about Chromium but Firefox, there is a great blog post of a GCC developer with some interesting data points on the Clang vs. GCC debate: http://hubicka.blogspot.com/2018/12/firefox-64-built-with-gcc-and-clang.html
For Firefox at this point I've given up and switched to clang+lto+pgo mostly because it's more tested (being default). Firefox source code has a decent amount of ambiguous code that easily breaks with many optimizations from one version to the next and it's often subtle issues rather than a segfault, and most are long-standing known issues that aren't getting fixed anytime soon, only gcc 6 is officially supported too.
Shame the ebuild doesn't have a flag for pgo right now though, but it's just about setting MOZ_PGO=1.
Even if gcc+lto+pgo performs better, I'm not sure I want to deal with this anymore unless it gets some official support effort.
In the short term, you can use his binary instead. :)
I have uploaded a binary build with GCC 8, with link-time optimization and profile feedback. If your curiosity exceeds the fear of running random binaries from the net, you are welcome to try it out. It is built from Firefox 64 release. You can compare it to the official build and build provided by your favourite distro.
I've managed to build Chromium with LTO.
www-client/chromium-76.0.3809.100::gentoo was built with the following:
USE="cups custom-cflags jumbo-build (pic) proprietary-codecs pulseaudio suid system-ffmpeg system-icu system-libvpx tcmalloc -closure-compile -component-build -gnome-keyring -hangouts -kerberos (-selinux) -widevine" L10N="ru -am -ar -bg -bn -ca -cs -da -de -el -en-GB -es -es-419 -et -fa -fi -fil -fr -gu -he -hi -hr -hu -id -it -ja -kn -ko -lt -lv -ml -mr -ms -nb -nl -pl -pt-BR -pt-PT -ro -sk -sl -sr -sv -sw -ta -te -th -tr -uk -vi -zh-CN -zh-TW"
CFLAGS="-march=native -mtune=native -O2 -pipe -fomit-frame-pointer -fno-plt -fno-stack-protector -ftree-vectorize -s"
CXXFLAGS="-march=native -mtune=native -O2 -pipe -fomit-frame-pointer -fno-plt -fno-stack-protector -ftree-vectorize -s -Wno-pedantic -Wno-unused-result -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-deprecated-declarations -Wno-return-type -Wno-parentheses -Wno-misleading-indentation -Wno-attributes -Wno-subobject-linkage -Wno-ignored-attributes -Wno-ignored-attributes -Wno-address -Wno-dangling-else -Wno-class-memaccess -Wno-invalid-offsetof -Wno-packed-not-aligned"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs ccache config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch parallel-install pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
LDFLAGS="-Wl,-O2 -Wl,--as-needed -Wl,--sort-common -Wl,--strip-debug -flto"
with clang? or gcc ? @perfect7gentleman
GCC. I can't build properly working Chromium with Clang.
good thanks @perfect7gentleman for the info
I've used some Debian and openSUSE patches
EXTRA_GN="\
is_debug=false \
use_goma=false \
use_ozone=false \
use_sysroot=false \
use_openh264=false \
use_libjpeg_turbo=true \
use_custom_libcxx=false \
use_gnome_keyring=false \
use_unofficial_version_number=false \
enable_vr=false \
enable_nacl=false \
enable_nacl_nonsfi=false \
enable_swiftshader=false \
enable_reading_list=false \
enable_one_click_signin=false \
enable_iterator_debugging=false \
enable_hangout_services_extension=false \
optimize_webui=true \
treat_warnings_as_errors=false \
linux_use_bundled_binutils=false \
\
use_gio=false \
link_pulseaudio=true \
enable_widevine=false \
v8_enable_backtrace=true \
use_system_zlib=true \
use_system_lcms2=true \
use_system_libjpeg=true \
use_system_freetype=true \
use_system_harfbuzz=true \
use_system_libopenjpeg2=true \
use_jumbo_build=true \
proprietary_codecs=true \
ffmpeg_branding=\"Chrome\" \
fieldtrial_testing_like_official_build=true \
\
\
use_aura=true \
symbol_level=0 \
use_kerberos=false \
fatal_linker_warnings=false \
use_gnome_keyring=false \
use_vaapi=true \
use_dbus=true \
enable_hevc_demuxing=true \
enable_mus=true \
gcc_lto=true"
CCACHE_SLOPPINESS="time_macros"
~ $ ls /etc/portage/patches/www-client/chromium
001-libcxx.patch 013-inspector._patch_ 021-widevine-locations.patch 029-openh264.patch 036-google-api-warning.patch 044-deprecated.patch 051-null-destination.patch 058-nspr.patch 102-chromium-dma-buf.patch 125-chromium-vaapi.patch
002-parallel.patch 014-gpu-timeout.patch 023-connection-message.patch 030-chromeos.patch 037-third-party-cookies.patch 045-bool-compare.patch 052-int-in-bool-context.patch 059-zlib.patch 103-chromium-buildname.patch
003-gcc_skcms_ice.patch 015-empty-array.patch 024-unrar.patch 031-perfetto.patch 038-device-notifications.patch 046-enum-compare.patch 053-vpx.patch 060-event._patch_ 104-chromium-drm.patch
004-pffffft-buildfix.patch 016-safebrowsing.patch 025-signin.patch 032-installer.patch 040-friend.patch 047-sign-compare.patch 054-icu.patch 061-ffmpeg.patch 105-chromium-sandbox-pie.patch
009-mojo.patch 017-sequence-point.patch 026-android.patch 033-font-tests.patch 041-printf.patch 048-initialization.patch 055-gtk2.patch 062-jsoncpp._patch_ 107-chromium-system-libusb.patch
011-ps-print.patch 018-jumbo-namespace.patch 027-fuzzers.patch 034-swiftshader.patch 042-attribute.patch 049-unused-typedefs.patch 056-jpeg.patch 063-openjpeg.patch 109-gcc-lto-rsp-clobber.patch
012-as-needed.patch 019-template-export.patch 028-tracing.patch 035-welcome-page.patch 043-multichar.patch 050-unused-functions.patch 057-lcms.patch 064-convertutf.patch 110-gcc-enable-lto.patch
On the other hand, I can't compile chromium with gcc 9 and lto:
[4259/19949] rm -f obj/third_party/blink/public/mojom/libweb_feature_mojo_bindings_mojom.a && "x86_64-pc-linux-gnu-ar" -T -r -c -s -D obj/third_party/blink/public/mojom/libweb_feature_mojo_bindings_mojom.a @"obj/third_party/blink/public/mojom/libweb_feature_mojo_bindings_mojom.a.rsp"
[4260/19949] x86_64-pc-linux-gnu-g++ -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -rdynamic -pie -Wl,--disable-new-dtags -Wl,-O1 -Wl,--as-needed -O2 -march=native -falign-functions=32 -O3 -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fipa-pta -fno-semantic-interposition -flto=1 -fuse-linker-plugin -o "./character_data_generator" -Wl,--start-group @"./character_data_generator.rsp" -Wl,--end-group -latomic -ldl -lpthread -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -licui18n -licuuc -licudata
FAILED: character_data_generator
x86_64-pc-linux-gnu-g++ -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -rdynamic -pie -Wl,--disable-new-dtags -Wl,-O1 -Wl,--as-needed -O2 -march=native -falign-functions=32 -O3 -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fipa-pta -fno-semantic-interposition -flto=1 -fuse-linker-plugin -o "./character_data_generator" -Wl,--start-group @"./character_data_generator.rsp" -Wl,--end-group -latomic -ldl -lpthread -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -licui18n -licuuc -licudata
during IPA pass: pta
lto1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.gentoo.org/> for instructions.
lto-wrapper: fatal error: x86_64-pc-linux-gnu-g++ returned 1 exit status
compilation terminated.
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
* ERROR: www-client/chromium-79.0.3945.29::gentoo failed (compile phase):
* ninja -v -j1 -l1 -C out/Release v8_context_snapshot_generator failed
The problem was actually -fipa-pta, so I disabled it and tried to compile again. That produced another ICE (maybe on the same file too), this time in the Graphite stage. I'm now trying to build it without Graphite...
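Dropping individual flags like this for a single package can be sketched as a small filter. The function name `strip_flags` is made up for illustration; the flag names in the usage line are the ones visible in the failing link command above.

```shell
# Print the flag string $1 with every flag named in the remaining
# arguments removed. Relies on word splitting, so individual flags
# must not contain spaces or glob characters.
strip_flags() {
    flags=$1
    shift
    out=""
    for f in $flags; do
        keep=1
        for drop in "$@"; do
            if [ "$f" = "$drop" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            out="${out:+$out }$f"
        fi
    done
    printf '%s\n' "$out"
}
```

For example, `strip_flags "$CXXFLAGS" -fipa-pta -fgraphite-identity -floop-nest-optimize` prints the remaining flags, which makes it easy to bisect which optimization triggers the ICE.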
Hmm... That's interesting. I have never built Chrome(ium) though, since LibreOffice and QtWebengine take long enough already and my primary browser is Firefox Nightly.
The thing is, tests are more difficult because this package is so big. A single test can take 1-2 days if you compile on a low-powered machine. Anyway, disabling Graphite allowed me to get further, but then I got this error:
[7522/19949] python ../../tools/protoc_wrapper/protoc_wrapper.py crx3.proto --protoc ./protoc --proto-in-dir ../../components/crx_file --cc-out-dir gen/components/crx_file --py-out-dir pyproto/components/crx_file
FAILED: pyproto/components/crx_file/crx3_pb2.py gen/components/crx_file/crx3.pb.h gen/components/crx_file/crx3.pb.cc
python ../../tools/protoc_wrapper/protoc_wrapper.py crx3.proto --protoc ./protoc --proto-in-dir ../../components/crx_file --cc-out-dir gen/components/crx_file --py-out-dir pyproto/components/crx_file
terminate called after throwing an instance of 'std::system_error'
what(): Unknown error -1
Protoc has returned non-zero status: -6
Since this error does not give any clue on what is going wrong, I'm now trying to compile with USE="-custom-cflags".
I could test it for you; it takes an hour or so to build, but I gave up on compiling with gcc. Chromium uses clang specific optimizations/patches which result in significant performance differences [ 20%~ ]
Chromium uses clang specific optimizations/patches which result in significant performance differences [ 20%~ ]
Can you demonstrate that? If that's true, why don't we force chromium to use clang as the compiler?
Chrome/ium uses clang's -fwhole-program-vtables for a performance boost, amongst other things. "Skia contains SSE and AVX optimized rendering routines which are written using Clang only vector extensions." From Honza Hubička's blog [ GCC dev ]. If you need more stuff I'll link it later.
As to why it isn't used? Nobody cares about compiling chromium from source, with the exception of niche Gentoo users. Gentoo isn't particularly bleeding edge at its core, nor does it care that much about chromium or clang. There are talks, but it moves at a snail's pace.
I've seen some ebuilds in overlays with clang as an option but most of them disappeared recently, chromium is a fast moving target and stuff around it adapts slowly
Firefox upstream is also compiled with clang, just FYI, and clang version had double rendering performance for some time [ which was fixed/patched ]
Chrome/ium uses clangs -fwhole-program-vtables for performance boost amongst other things, "Skia contains SSE and AVX optimized rendering routines which are written using Clang only vector extensions." From Honza Hubickas blog [ GCC dev ]
After reading about this I decided to give clang another try. I am able to build chromium fine with GCC 9.2 using LTO, but when I attempt to use clang, it fails very early in the build process, and I haven't been able to figure out why. Here is the error I get:
[80/808] clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -Wl,--color-diagnostics -flto=thin -Wl,--thinlto-jobs=8 -Wl,--thinlto-cache-dir=thinlto-cache -Wl,--thinlto-cache-policy,cache_size=10\%:cache_size_bytes=10g:cache_size_files=100000 -Wl,--lto-O2 -fwhole-program-vtables -rdynamic -pie -Wl,--disable-new-dtags -Wl,-O1 -Wl,--as-needed -Wl,--hash-style=gnu -Wl,-z,relro -Wl,-z,now -flto=thin -O3 -pipe -march=bdver4 -fstack-check -o "./bytecode_builtins_list_generator" -Wl,--start-group @"./bytecode_builtins_list_generator.rsp" -Wl,--end-group -latomic -ldl -lpthread -lrt
[81/808] clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -Wl,--color-diagnostics -flto=thin -Wl,--thinlto-jobs=8 -Wl,--thinlto-cache-dir=thinlto-cache -Wl,--thinlto-cache-policy,cache_size=10\%:cache_size_bytes=10g:cache_size_files=100000 -Wl,--lto-O2 -fwhole-program-vtables -rdynamic -pie -Wl,--disable-new-dtags -Wl,-O1 -Wl,--as-needed -Wl,--hash-style=gnu -Wl,-z,relro -Wl,-z,now -flto=thin -O3 -pipe -march=bdver4 -fstack-check -o "./gen-regexp-special-case" -Wl,--start-group @"./gen-regexp-special-case.rsp" -Wl,--end-group -latomic -ldl -lpthread -lrt -licui18n -licuuc -licudata
[82/808] python ../../v8/tools/run.py ./gen-regexp-special-case gen/v8/src/regexp/special-case.cc
[83/808] touch obj/v8/run_gen-regexp-special-case.stamp
[84/808] python ../../v8/tools/run.py ./bytecode_builtins_list_generator gen/v8/builtins-generated/bytecodes-builtins-list.h
FAILED: gen/v8/builtins-generated/bytecodes-builtins-list.h
python ../../v8/tools/run.py ./bytecode_builtins_list_generator gen/v8/builtins-generated/bytecodes-builtins-list.h
ninja: build stopped: subcommand failed.
I tried enabling FEATURES=-fail-clean and then going into the build directory and running that command manually (python ../../v8/tools/run.py ./bytecode_builtins_list_generator gen/v8/builtins-generated/bytecodes-builtins-list.h), and it worked fine, or at least returned exit status 0 and produced a file with the expected name in the expected location. So, something is different somehow in the portage environment that is making it fail, and the error only affects running bytecode_builtins_list_generator, not building it. The step that fails doesn't actually invoke either gcc or clang, so it seems strange, since even though the binary that fails to run was compiled with clang and not gcc, it runs fine outside of the portage environment, and the only changes I made to the portage environment in between were related to the switch from gcc to clang.
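One way to narrow down what differs between the two contexts is to dump the environment in each and compare the dumps. A minimal sketch, assuming both dumps are plain `NAME=value` lines (for example from `env`); the function name `diff_env_dumps` and the file names in the usage note are hypothetical, and how the dump is captured from inside the Portage build is left open.

```shell
# Print the lines that differ between two environment dumps.
# "<" marks lines only in the first dump, ">" lines only in the second.
# Multi-line variable values are not handled.
diff_env_dumps() {
    a=$(mktemp)
    b=$(mktemp)
    sort "$1" > "$a"
    sort "$2" > "$b"
    # diff exits non-zero when the files differ, so ignore its status.
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}
```

Usage would look like `diff_env_dumps /tmp/portage.env /tmp/shell.env`, where the second file comes from running `env > /tmp/shell.env` in the manual shell.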
The changes I made to the portage environment since successfully building with gcc are as follows:
- Enable clang, using a file in /etc/portage/env that looks like this:
CC="clang"
CXX="clang++"
LD="ld.lld"
AR="llvm-ar"
NM="llvm-nm"
RANLIB="llvm-ranlib"
CFLAGS="${CFLAGS} -flto=thin"
CXXFLAGS="${CXXFLAGS} -flto=thin"
LDFLAGS="${LDFLAGS} -flto=thin"
- Created /etc/portage/env/www-client/chromium as follows:
EXTRA_GN="thin_lto_enable_optimizations=true use_lld=true use_thin_lto=true"
- Enabled the custom-cflags USE flag, although the only difference it seems to make is that it prevents -O3 from being replaced by -O2. Without this USE flag, the build fails in the exact same way.