Consider doing bootstrap (PGO bootstrap)

Open marxin opened this issue 4 years ago • 21 comments

Similar to GCC, mold can easily bootstrap (link mold using an already built mold). On top of that, you can squeeze out some extra performance with PGO (-fprofile-generate and -fprofile-use), where linking mold itself can serve as the training run. Note that PGO plays very well with LTO. What do you think?
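
A rough sketch of the flow described above, using GCC's `-fprofile-generate` and `-fprofile-use` with the link of mold itself as the training run. Several things here are assumptions rather than anything stated in this thread: the `pgo-gcc` directory, the `run` dry-run helper, the `CXX=g++` override, the `EXTRA_CXXFLAGS`/`EXTRA_LDFLAGS` Makefile variables, and pointing GCC at the stage-1 linker through `-B` and an `ld` symlink. Because mold is heavily multi-threaded, the sketch also adds `-fprofile-update=atomic` and `-fprofile-correction`. By default it only prints the commands; set `DRY_RUN=0` inside a mold checkout to execute them.

```shell
#!/bin/bash
# Sketch only: GCC-based PGO bootstrap of mold. Directory names and the
# CXX override are assumptions; adjust them to your checkout.
set -eu

DRY_RUN=${DRY_RUN:-1}          # 1 = print commands, 0 = execute them
PROF=${PROF:-$PWD/pgo-gcc}     # hypothetical directory for profiles and stage 1

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

pgo_bootstrap_gcc() {
  # Stage 1: build an instrumented mold that writes .gcda files under $PROF.
  # Atomic counter updates avoid lost increments in multi-threaded code.
  run make clean
  run make -j CXX=g++ \
    EXTRA_CXXFLAGS="-O2 -fprofile-generate=$PROF -fprofile-update=atomic" \
    EXTRA_LDFLAGS="-fprofile-generate=$PROF"
  run mkdir -p "$PROF/bin"
  run mv mold "$PROF/bin/mold-stage1"
  # GCC looks for a linker named "ld" in directories given with -B.
  run ln -sf mold-stage1 "$PROF/bin/ld"

  # Training run: compile normally, then relink mold with the stage-1 linker.
  run make clean
  run make -j CXX=g++
  run rm mold
  run make -j CXX=g++ EXTRA_LDFLAGS="-B$PROF/bin -Wl,-no-quick-exit"

  # Stage 2: rebuild with the collected profile and LTO.
  run make clean
  run make -j CXX=g++ \
    EXTRA_CXXFLAGS="-O2 -flto -fprofile-use=$PROF -fprofile-correction" \
    EXTRA_LDFLAGS="-O2 -flto -fprofile-use=$PROF -fprofile-correction"
}

pgo_bootstrap_gcc
```

The `-Wl,-no-quick-exit` flag is presumably needed for the same reason as with Clang: the instrumented linker has to exit normally so that the profile is flushed.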

marxin avatar Jan 05 '22 10:01 marxin

That's an interesting idea. I'm also genuinely interested in how much PGO can improve our linker's performance. I'll experiment with it a bit and update this bug later. Thanks!

rui314 avatar Jan 05 '22 12:01 rui314

I wrote this shell script to link mold with PGO, using mold itself as training data. For some reason, the resulting PGO-enabled mold is ~10% slower than the non-PGO build when building Chrome. This is odd...

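#!/bin/bash
set -e

mkdir -p pgo

# Stage 1: build an instrumented mold (-fprofile-instr-generate is a Clang flag).
make clean
make -j EXTRA_CXXFLAGS='-fprofile-instr-generate -O2'
mv mold pgo/mold-stage1

# Build a regular mold, then delete the binary so that only the link step reruns.
make clean
make -j
rm mold

# Training run: relink mold with the instrumented stage-1 linker.
LLVM_PROFILE_FILE=pgo/a.profraw \
  make -j EXTRA_LDFLAGS='-fuse-ld=`pwd`/pgo/mold-stage1 -Wl,-no-quick-exit'

# Convert the raw profile into the indexed format that Clang consumes.
llvm-profdata merge -output=pgo/a.profdata pgo/a.profraw
make clean

# Stage 2: rebuild mold with LTO and the collected profile.
make -j EXTRA_CXXFLAGS='-flto -O2 -fprofile-instr-use=pgo/a.profdata' \
  EXTRA_LDFLAGS='-flto -O2 -fprofile-instr-use=pgo/a.profdata'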
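
One thing that may be worth ruling out when a Clang profile looks off: the script above has every linker process write to the same `pgo/a.profraw`. A hedged variant is to let the LLVM profile runtime expand `%p` to the process ID in `LLVM_PROFILE_FILE`, merge whatever files result, and then inspect the hottest functions. Whether this changes the outcome here is untested; the `pgo/mold-%p.profraw` name and the `run` dry-run helper are assumptions.

```shell
#!/bin/bash
# Sketch only: collect one raw profile per linker process, then merge.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

collect_and_merge() {
  # %p is expanded to the process ID by the LLVM profile runtime.
  run env LLVM_PROFILE_FILE="$PWD/pgo/mold-%p.profraw" \
    make -j EXTRA_LDFLAGS="-fuse-ld=$PWD/pgo/mold-stage1 -Wl,-no-quick-exit"

  # Merge every raw profile that was written.
  # The glob stays unexpanded in dry-run mode if no files exist yet.
  run llvm-profdata merge -output=pgo/a.profdata pgo/mold-*.profraw

  # Sanity check: list the hottest functions recorded in the profile.
  run llvm-profdata show --topn=10 pgo/a.profdata
}

collect_and_merge
```

If the top functions are not the ones expected from a link (for example, if the counts are close to zero), the training run probably did not exercise the instrumented binary.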

rui314 avatar Jan 05 '22 12:01 rui314

Can you please also test a GCC-built mold?

marxin avatar Jan 05 '22 13:01 marxin

> Can you please also test a GCC-built mold?

I did build it, on Arch. But in general, most gcc-git builds seem completely broken for me after installation. Maybe it's an upstream bug or a bug in the PKGBUILD I used. I have already tried around 3-5 different PKGBUILDs for gcc-git, with the same error every time. Even a kernel build fails within the first few seconds.

I will build it as an external compiler, not as the host compiler, and see.

Would it be possible to provide backport patches?

Regards.

ptr1337 avatar Jan 05 '22 14:01 ptr1337

> I did build it, on Arch. But in general, most gcc-git builds seem completely broken for me after installation. Maybe it's an upstream bug or a bug in the PKGBUILD I used. I have already tried around 3-5 different PKGBUILDs for gcc-git, with the same error every time. Even a kernel build fails within the first few seconds.

You can use the latest stable release, 11.2.0. Yes, the Linux kernel build is broken with the current master due to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101941

But that should not block building mold.

marxin avatar Jan 05 '22 14:01 marxin

@marxin

I mean "backporting" the patches that were implemented in the https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/mold-lto-plugin tree. Or are only your two commits related to mold?

Currently I'm running GCC with the mold patch from 12.0. I just mean the new implementations you provided for gcc-git.

--- Got your last patch into GCC 11, everything is good!

ptr1337 avatar Jan 05 '22 14:01 ptr1337

> I mean "backporting" the patches that were implemented in the https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/mold-lto-plugin tree. Or are only your two commits related to mold?

It's only a single patch. Well, I would recommend building GCC from the source for LTO plug-in integration testing.

marxin avatar Jan 05 '22 14:01 marxin

> > I mean "backporting" the patches that were implemented in the https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/mold-lto-plugin tree. Or are only your two commits related to mold?

> It's only a single patch. Well, I would recommend building GCC from the source for LTO plug-in integration testing.

Yes, I'm just going to quickly compile my host compiler with the two patches, and then build the one from the mold LTO plugin tree as an external compiler.

ptr1337 avatar Jan 05 '22 14:01 ptr1337

I just came across this thread and thought about giving BOLT a try to optimize mold. I don't know how much this could help performance, but it could be worth a try. What do you think?

ptr1337 avatar May 11 '22 03:05 ptr1337

I don't know if we can observe a noticeable difference, but it's worth a try.

rui314 avatar May 11 '22 06:05 rui314

I tried to use the instrumentation mode of llvm-bolt, without success so far, since it seems the "mold" binary that I instrumented is not called directly.

So I currently get no profiles at all for it. I will test it on an Intel machine. But how do I check which binary is faster? Do you have a benchmark or something similar?

ptr1337 avatar Jun 15 '22 12:06 ptr1337

I think if you are going to use the profile of linking mold itself as training data, the obvious benchmark to test its profile-guided optimization is to link mold itself. It may overfit though.

If mold is too small to use as training data, you should use something larger (e.g. LLVM), and you can link the same program again with a profile-guided-optimized mold to see whether PGO works or not.
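
A minimal sketch of such a comparison: run the same link command several times with each linker build and keep the best wall-clock time. The `best_of_ms` helper is hypothetical, it relies on GNU `date` for nanosecond timestamps, and the commented `clang++` invocation with its `@link.rsp` response file is a placeholder rather than a command from this thread.

```shell
#!/bin/bash
# Sketch only: best-of-N wall-clock timing for an arbitrary command.
set -eu

# Usage: best_of_ms RUNS COMMAND [ARGS...]
# Prints the fastest run in milliseconds (requires GNU date for %N).
best_of_ms() {
  runs=$1; shift
  best=
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    ms=$(( (end - start) / 1000000 ))
    # Keep the smallest elapsed time seen so far.
    if [ -z "$best" ] || [ "$ms" -lt "$best" ]; then best=$ms; fi
    i=$((i + 1))
  done
  echo "$best"
}

# Self-check with a command of known duration.
echo "sleep 0.2 took at least $(best_of_ms 3 sleep 0.2) ms"

# Hypothetical usage: relink the same program with each linker build.
# base=$(best_of_ms 5 clang++ -fuse-ld="$PWD/mold-base" @link.rsp -o /tmp/out)
# pgo=$(best_of_ms 5 clang++ -fuse-ld="$PWD/mold-pgo" @link.rsp -o /tmp/out)
# echo "base: ${base} ms, pgo: ${pgo} ms"
```

Taking the minimum rather than the mean is a deliberate choice here: it filters out runs disturbed by other system activity, although for a multi-threaded linker it is still worth repeating the measurement on an otherwise idle machine.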

rui314 avatar Jun 15 '22 12:06 rui314

No, llvm-bolt has a mode that instruments the binary (which makes it grow a lot in size). The binary needs to be built with relocations.

If you then run a workload with the instrumented binary, you'll get a profile that llvm-bolt can use to optimize the binary.
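
The instrumentation workflow described here roughly corresponds to the commands below. This is a sketch based on llvm-bolt's documented options, not something verified against mold: the file names, the `run` dry-run helper, the `EXTRA_LDFLAGS` Makefile variable, Clang as the compiler driver (for `-fuse-ld=` with a path), and the optimization flags in the last step are all assumptions. By default it only prints the commands.

```shell
#!/bin/bash
# Sketch only: BOLT instrumentation workflow for a mold binary.
set -eu

DRY_RUN=${DRY_RUN:-1}                   # 1 = print commands, 0 = execute them
FDATA=${FDATA:-$PWD/bolt/mold.fdata}    # hypothetical profile location

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

bolt_instrument_flow() {
  # 1. Build mold with relocations preserved and keep a copy of that binary.
  run make clean
  run make -j EXTRA_LDFLAGS="-Wl,--emit-relocs"
  run mkdir -p bolt
  run cp mold bolt/mold-relocs

  # 2. Produce an instrumented copy that writes its profile to $FDATA.
  run llvm-bolt bolt/mold-relocs -instrument -o bolt/mold-instrumented \
    --instrumentation-file="$FDATA"

  # 3. Training run: relink mold with the instrumented linker. Pointing
  #    -fuse-ld at it ensures the instrumented binary is the one invoked.
  run rm mold
  run make -j \
    EXTRA_LDFLAGS="-fuse-ld=$PWD/bolt/mold-instrumented -Wl,-no-quick-exit"

  # 4. Rewrite the relocatable binary using the collected profile.
  run llvm-bolt bolt/mold-relocs -o bolt/mold-bolted -data="$FDATA" \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -split-all-cold -dyno-stats
}

bolt_instrument_flow
```

If no profile file appears after step 3, the first thing to check is whether the compiler driver really executed `bolt/mold-instrumented` and not another linker found earlier on its search path.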

ptr1337 avatar Jun 15 '22 12:06 ptr1337

Sorry, what was your problem? I have no prior experience with PGO or BOLT, so I may not be able to help you much.

rui314 avatar Jun 15 '22 12:06 rui314

> Sorry, what was your problem? I have no prior experience with PGO or BOLT, so I may not be able to help you much.

Everything is good. I will check whether it works with a sampled profile on an Intel machine when I have access to one. That should probably work.
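
For the sampling route, a sketch of the usual perf-based flow. It assumes Linux `perf` with branch-stack (LBR) sampling, which is the likely reason for wanting an Intel machine, plus BOLT's `perf2bolt` converter. The `bolt/mold-relocs` binary, the `run` dry-run helper, and the workload passed at the bottom are placeholders, not commands taken from this thread.

```shell
#!/bin/bash
# Sketch only: sampling-based BOLT profile using perf with branch stacks.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands, 0 = execute them
# Hypothetical copy of mold that was linked with --emit-relocs.
MOLD_BIN=${MOLD_BIN:-$PWD/bolt/mold-relocs}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Usage: bolt_sampling_flow WORKLOAD [ARGS...]
# WORKLOAD is any command that ends up running $MOLD_BIN as its linker.
bolt_sampling_flow() {
  run mkdir -p bolt
  # 1. Sample user-space cycles with branch records while the workload runs.
  run perf record -e cycles:u -j any,u -o bolt/perf.data -- "$@"
  # 2. Convert the samples that belong to $MOLD_BIN into BOLT's format.
  run perf2bolt -p bolt/perf.data -o bolt/perf.fdata "$MOLD_BIN"
  # 3. Optimize the binary with the sampled profile.
  run llvm-bolt "$MOLD_BIN" -o bolt/mold-bolted -data=bolt/perf.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -split-all-cold -dyno-stats
}

# Placeholder workload: a build whose link step uses the binary under test.
bolt_sampling_flow make -j EXTRA_LDFLAGS="-fuse-ld=$MOLD_BIN"
```

Unlike instrumentation, sampling profiles the unmodified binary, so it avoids the problem of getting the build to invoke a specially prepared linker.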

ptr1337 avatar Jun 15 '22 12:06 ptr1337

> I wrote this shell script to link mold with PGO, using mold itself as training data. For some reason, the resulting PGO-enabled mold is ~10% slower than the non-PGO build when building Chrome. This is odd...

@rui314 were you able to find out why mold with PGO was slower than mold without PGO?

zamazan4ik avatar Dec 26 '22 22:12 zamazan4ik

I have no idea. Can you reproduce the result? If so, we want to ask PGO developers why.

rui314 avatar Dec 26 '22 23:12 rui314

> I have no idea. Can you reproduce the result? If so, we want to ask PGO developers why.

I can try. The only question is: could I reproduce it on my local Apple MacBook M1 (ARM-based), since you removed macOS support and moved it to sold?

zamazan4ik avatar Dec 27 '22 00:12 zamazan4ik

macOS support is experimental anyway, so testing PGO with it doesn't make much sense at this moment.

rui314 avatar Dec 27 '22 01:12 rui314

I tested PGO vs. non-PGO mold on linking Clang with ThinLTO: no difference between the versions. I also tested BOLTed mold vs. regular mold: still no measurable effect on linking Clang. Maybe there will be a measurable result on larger projects like Chromium or ClickHouse...

zamazan4ik avatar Jun 27 '23 15:06 zamazan4ik