Optimise the build
Using the Shake profiling I generated the following plot when building on an 8-core Windows machine with `-j8`.

The two single-threaded bits of the build are the initial boot/`configure` (not sure there is much that can be done about that) and `ghc-cabal` (which is pretty awful). Anything we can do to either kick things off faster once `ghc-cabal` completes (why not immediately start compiling each library that we just configured?) or make `ghc-cabal` go in parallel?

To regenerate, just run with `--profile` and look at the resulting `profile.html`.
I confirm a similar performance profile on my machine. Some comments:

- I still have some hopes that it would be possible to get rid of the initial `configure` run, or at least make it optional/parallel. If you look into `system.config.in` you'll see that we can compute a large part of the settings ourselves, e.g. by using `Oracles.LookupInPath`. If we have paths to builders then we may be able to start building immediately, without waiting for `configure` to complete, so it could be run in parallel. Maybe in future we'll be able to compute all settings inside the build system. Note, I'm not talking about other configure scripts here, e.g. the one in `integer-gmp` seems more difficult to replace.
- According to https://mail.haskell.org/pipermail/ghc-commits/2013-May/001712.html, `ghc-cabal` was 'simplified' some time ago and can no longer run in parallel, so I use Shake's `Resource` feature to limit parallelism (a minimal sketch follows this list). Without this I experienced crashes.
- The profile chart also demonstrates bottlenecks due to coupled `.o` and `.hi` generation, see #174.
- In many cases when we rebuild things we don't need to rerun configure scripts (under the assumption that the environment is the same). Unfortunately we don't have control over this. It would be possible to improve rebuild times after we let go of `ghc-cabal`, see #18.
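(For reference, here is a minimal sketch of the `Resource` idea mentioned above, using Shake's `newResource`/`withResource` API. The file pattern, the single-token limit and the `ghc-cabal` invocation are illustrative, not the actual Hadrian rule.)

```haskell
import Development.Shake

ghcCabalRules :: Rules ()
ghcCabalRules = do
    -- Allow at most one ghc-cabal invocation at any given time.
    ghcCabal <- newResource "ghc-cabal" 1
    "//package-data.mk" %> \_ ->
        withResource ghcCabal 1 $
            -- Illustrative invocation; the real rule passes many more arguments.
            cmd "inplace/bin/ghc-cabal" ["configure"]
```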
> why not immediately start compiling each library that we just configured?

I think this is what should happen already; at least I did verify this at some point in the past. Can you check? If this is no longer the case, we should be able to fix this.
If we do start compiling each library after it's been configured, then we aren't tracing that, and it isn't showing up. I'll investigate that tomorrow.
As you say, configure isn't such an issue, since it changes a lot more rarely, so not as important.
So, we have two problematic executions of `configure`. The first one should happen only once, so it is not really a big issue. The second one, attributed to `ghc-cabal` in the chart, is the `configure` of the `base` library. It takes a huge amount of time and (almost) everything needs to wait for it.

Here is the chain of `ghc-cabal` invocations at the start of `Stage1`:
- `Run GhcCabal (stage = Stage1, package = ghc-prim)` -- this is quick.
- `Run GhcCabal (stage = Stage1, package = integer-gmp)` -- this is also quick.
- `Run GhcCabal (stage = Stage1, package = base)` -- this takes a long time.
- `Run GhcCabal (stage = Stage1, package = binary)` -- this is quick.
- ...
- `Run GhcCabal (stage = Stage1, package = hpc)` -- this is quick.
- `Run GhcCabal (stage = Stage1, package = process)` -- this is quick.
- Compilation finally starts with `ghc-prim` sources.
- ...
- `Run GhcCabal (stage = Stage1, package = haskeline)` -- we keep running some `ghc-cabal` jobs.
So, you are right: for some reason there is no compilation going on until most `ghc-cabal` invocations have finished. I'll try to investigate why this happens.
So, the following experiment confirms that `ghc-prim` actually does not need anything else!

- Run a full build.
- Remove `.build/stage1/libraries`.
- Run `./build.bat .build/stage1/libraries/ghc-prim/build/libHSghc-prim-0.5.0.0.a`.

This executes `ghc-cabal` for `ghc-prim`, followed by compilation, without involving any other packages.
Conjecture: when there are many targets in the queue, Shake might get stuck waiting for the `ghc-cabal` resource instead of switching to other targets that do not actually require the resource.
Your conjecture would explain it - I'll triple check tomorrow, although I note that there are tests that check resources release exactly like they should. It's always possible you've stumbled into a corner case though.
I disproved my conjecture :) I removed resources altogether yet `ghc-prim` still couldn't be compiled early.
Another experiment:

```
$ ./build.bat .build/stage1/libraries/integer-gmp/build/libHSinteger-gmp-1.0.0.0.a .build/stage1/libraries/ghc-prim/build/libHSghc-prim-0.5.0.0.a -j4
```

This starts to compile `ghc-prim` as soon as its `ghc-cabal` is finished, without waiting for `integer-gmp`. If I swap the order of the two targets on the command line the result is the same: correct parallelisation.

With a bigger target, e.g. `inplace/bin/ghc-stage2.exe`, things start to go wrong again.
New conjecture: this can't possibly be a fault of the build system. The above experiments show that `ghc-prim` does not depend on any other library and its sources can be compiled as soon as we build its `package-data.mk` file. The only conclusion is that Shake fails to schedule the compilation as soon as possible and instead postpones it until most `package-data.mk` files are built.

@ndmitchell Please prove me wrong or find a scheduling bug in Shake.
I think I understand what's going on. My apologies to Shake, I was wrong in my suspicions.

My previous experiments were not entirely faithful: it is not that easy to `need` a target like `.build/stage1/libraries/ghc-prim/build/libHSghc-prim-0.5.0.0.a`. We must know the `0.5.0.0` bit, which can only be obtained after parsing `package-data.mk`. So, we never actually bothered to explicitly `need` library targets and they were discovered automatically during the build process. That's why the compilation process didn't start earlier.

I tried to add a top-level `need` for libraries and came across the following problem. If I do roughly this:

```haskell
libTargets <- fmap concat . forM libPkgs $ \pkg -> do
    let target = PartialTarget stage pkg
    compId <- interpretPartial target $ getPkgData ComponentId
    libs   <- traverse (pkgLibraryFile stage pkg compId) ways
    return libs
need libTargets
```

But then we don't get to the `need` until we parse all `package-data.mk` files, and we are back to the problem: we can't start the build process until all `ghc-cabal` invocations are complete. We can't simply move the `need` into the `forM` because it is blocking and we will then build all libraries sequentially.
One possible way forward is not to rely on `package-data.mk` for getting component IDs but to get them from `.cabal` files directly.
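(As a rough illustration of that idea, here is a hedged sketch of reading a package's version directly from its `.cabal` file with the Cabal library. The module and function names are those of relatively recent Cabal releases; older releases expose the same information via `Distribution.PackageDescription.Parse.readPackageDescription` and `Distribution.Text.display`.)

```haskell
import Distribution.Package                   (pkgVersion)
import Distribution.PackageDescription        (package, packageDescription)
import Distribution.PackageDescription.Parsec (readGenericPackageDescription)
import Distribution.Pretty                    (prettyShow)
import Distribution.Verbosity                 (silent)

-- Read the version component of a library file name (e.g. the "0.5.0.0" in
-- libHSghc-prim-0.5.0.0.a) straight from the package's .cabal file, without
-- waiting for ghc-cabal to produce package-data.mk.
cabalVersion :: FilePath -> IO String
cabalVersion cabalFile = do
    gpd <- readGenericPackageDescription silent cabalFile
    return $ prettyShow $ pkgVersion $ package $ packageDescription gpd
```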
This is a top-level target? In which case, can you do:

```haskell
forM libPkgs $ \pkg -> action $ do
    ....
    need libs
```

Each `action` will run in parallel, so you should get full parallelism.

Note that the next version of Shake has a direct `parallel` combinator, which lets you express that inside an `Action`, but it sounds like it's not essential for you.
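(For completeness, a rough sketch of how that `parallel` combinator might be used here, assuming a Shake version that exports `parallel :: [Action a] -> Action [a]`, and reusing the project helpers from the snippet above, which are not part of Shake itself.)

```haskell
import Control.Monad    (void)
import Development.Shake

-- Compute and need each package's library targets concurrently, so that no
-- single slow package-data.mk parse blocks the others.
needLibraries :: Stage -> [Package] -> Action ()
needLibraries stage pkgs = void $ parallel (map needPkgLibs pkgs)
  where
    needPkgLibs pkg = do
        let target = PartialTarget stage pkg
        compId <- interpretPartial target $ getPkgData ComponentId
        libs   <- traverse (pkgLibraryFile stage pkg compId) ways
        need libs
```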
Yes, this is a top-level target; however, we can't compute `libPkgs` outside the `Action` monad.

Your suggestion might still work though. We can do `forM knownPackages` instead, and once we are inside the `Action` monad we can check whether a particular package (e.g., `win32`) needs to be built.
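(A sketch of what that could look like, reusing the names from the snippets above (`knownPackages`, `interpretPartial`, `getPackages`, `getPkgData`, `pkgLibraryFile`, `ways`); the exact predicate for deciding whether a package is part of the build is an assumption.)

```haskell
import Control.Monad    (forM_, when)
import Development.Shake

libraryTargets :: Stage -> Rules ()
libraryTargets stage = forM_ knownPackages $ \pkg -> action $ do
    let target = PartialTarget stage pkg
    activePkgs <- interpretPartial target getPackages
    -- Skip packages that are not part of this build (e.g. win32 on Linux).
    when (pkg `elem` activePkgs) $ do
        compId <- interpretPartial target $ getPkgData ComponentId
        libs   <- traverse (pkgLibraryFile stage pkg compId) ways
        need libs
```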
Makes a lot of sense. If that doesn't work out, let me know and I'll see what we can do about getting you a version of `parallel`, which definitely will do it.
:+1: Looks like a nice use-case for `parallel`.
It worked. With `./build.bat --configure -j4` I now get the following chart (`-j8` experiment is coming). I'll push the changes once I get rid of a lint error concerning the `settings` file.
Nice! Can you check the second profile with `-j8` into https://github.com/ndmitchell/shake-paper/tree/master/profiles, since I suspect it will want to go in the paper.
@ndmitchell I can do that, but I guess it would be better if you produce one on the same machine as before? Otherwise it may be difficult to compare.
@snowleopard - True, let me know when it's in.
With `-j8` I got to an average parallelism of 3.93.
@ndmitchell I've committed the changes, let's see how your chart will improve.
On a completely fresh setup I get:

```
Error when running Shake build system:
* .build/stage1/rts/build/libHSrts_thr.a
* OracleQ (PackageDataKey (".build/stage1/rts/package-data.mk",".build_stage1_rts_C_SRCS"))
* .build/stage1/rts/package-data.mk
* includes/ghcautoconf.h
* mk/config.h
Error, file does not exist and no rule available:
  mk/config.h
```

About 10 seconds in.
@ndmitchell Aha, we also need to track `mk/config.h` as a file produced by `./configure`. Fixed.
```
Error when running Shake build system:
* OracleQ (ConfigKey "host-os")
* shake-build/cfg/system.config
* shake-build/cfg/system.config settings mk/config.h
* mk/config.h.in
Error, file does not exist and no rule available:
  mk/config.h.in
```

Also `config.h.in`?
Wow, apparently `config.h.in` is produced at the same time as `configure`, so I added it as another output for the rule which runs `boot`. Please give it a try. (There may be other unaccounted outputs.)
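(For illustration, a rule with several declared outputs can be written with Shake's `&%>` operator. The sketch below is an assumption about how such a rule might look; the tracked input and the way `boot` is invoked are illustrative, not Hadrian's actual rule.)

```haskell
import Development.Shake

bootRules :: Rules ()
bootRules =
    -- Both files are produced by the same boot run, so declare them together.
    ["configure", "mk/config.h.in"] &%> \_ -> do
        need ["configure.ac"]   -- illustrative input tracking
        cmd "perl" ["boot"]     -- however boot is actually invoked
```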
It now works, but I only see a slight improvement.
My guess is that if base goes off first then we can't parallelise the things that come after because of the resource. Maybe we should `orderOnly` depend from `base` to all the other pieces, to ensure that `base` goes last?
> Maybe we should `orderOnly` depend from `base` to all the other pieces, to ensure that `base` goes last?

I'm not sure we can do that. We run `ghc-cabal configure` in the order of package dependencies. This is what the `make` build system does and I believe I had to do it too to avoid errors. See:

```haskell
-- We configure packages in the order of their dependencies
deps <- packageDeps pkg
pkgs <- interpretPartial target getPackages
let depPkgs = matchPackageNames (sort pkgs) deps
orderOnly $ map (pkgDataFile stage) depPkgs
```

One idea comes to mind though: maybe if we decouple `ghc-cabal` from `ghc-pkg` then we will only need to respect the ordering of the latter? This could give us a nice boost.

Note: I execute both `ghc-cabal` and `ghc-pkg` in one rule simply because this is what `make` does.
Sounds good. I'm still skeptical about all this repeated configure. I would have hoped to have one central configure that did the minimal possible stuff (guessing the code of EPIPE seems slow and pointless), and then got reused everywhere thereafter. I don't know anywhere near enough about configure though. Black scary magic.
OK, I'll decouple `ghc-cabal` from `ghc-pkg` and we'll see what happens.

In future, I hope, we will be able to shove all `configure` calls into the initial `--configure` rule, so we will not have to run them when rebuilding, but this requires #18.
I've decoupled `ghc-cabal` from `ghc-pkg`; however, I don't believe we can run `ghc-cabal configure` on a package before we register all its dependencies, so the above idea won't work. We need to complete both `ghc-cabal` and `ghc-pkg` for the `base` library before other packages depending on it can be configured.

The good news is that after the decoupling we apparently no longer need to limit `ghc-cabal` parallelism, so I dropped the resource. The `base` library bottleneck is still there though, as other packages are waiting for it to be registered.
@ndmitchell By the way, your recent chart is quite different from mine. Are you sure there is no mistake?
@snowleopard - I'll regen from scratch tonight. Agreed, they have little in common, and if it was a mistake, that would be ideal :)
Any chance base is caching the configure output? I am running mine from a fresh directory, not just after a clean, which I observe doesn't spend nearly as much time doing the initial configure (if at all).
> Any chance base is caching the configure output?

I never noticed any differences in the time it takes to run `base` configure. It always was unbearably long. My guess would be that configure scripts are not smart enough to do caching. But then they are black scary magic, so who knows :)
I consistently get the same flat bit in the middle, which I assume is configure on base, and that it is doing caching we can't see.
@ndmitchell I would like to test this on various boxes here, but `./shake-build/build.sh -j12 --configure="--with-gmp-includes=/usr/include/gmp --with-gmp-libraries=/usr/lib --with-ar=/usr/bin/gar" --profile` does not produce any `profile.html` anywhere. Any idea what I'm doing wrong? Thanks!
Can you try `./shake-build/build.sh --no-build --profile`? If I do that, I see:

```
$ shake-build/build.bat --no-build --profile
Warning: No want/action statements, nothing to do
Writing report to report.html ******
shakeArgsWith 0.000s 0%
Function shake 0.037s 2%
Database read 0.206s 15% ====
With database 0.017s 1%
Running rules 0.001s 0%
Pool finished (1 threads, 1 max) 0.001s 0%
Lint checking 0.001s 0%
Profile report 1.090s 80% ======================== ******
Total 1.351s 100%
Build completed in 0:02m
```

I've put `******` on the two lines that indicate the profile is being created.
@ndmitchell Interesting, from the previous run where I can't find the report I see:

```
shakeArgsWith 0.001s 0%
Function shake 0.053s 0%
Database read 0.001s 0%
With database 0.000s 0%
Running rules 1204.340s 99% =========================
Pool finished (26224 threads, 12 max) 0.000s 0%
Lint checking 0.107s 0%
Profile report 5.732s 0%
Total 1210.235s 100%
Build completed in 20:11m
```

so 5.7 sec was consumed on its production...
Then it's definitely gone somewhere! Search for report.html in the current directory and the GHC directory, and then just scan the disk...
@ndmitchell Ah, I've found the issue! In the first report you write to look for `profile.html` but this should be `report.html` -- so the confusion was on my side. I will rerun the build, as I got rid of the original report with the `./shake-build/build.sh --no-build --profile` run just now.
@ndmitchell Cool! I was able to get `report.html`. It looks like for `-j12` we get an average parallelism of 3.74. I don't know how to push the graph here.
@kgardas Simply drag and drop in the comment window!
With `-j32` I get an average parallelism of 3.99. The question is whether this is real parallelism computed based on CPU usage or the parallelism of Shake-invoked commands. If the latter, it's not that good IMHO. If the former, well, the testing is done on a 6-core/12-thread Xeon. Next week I'll probably fire up a Niagara with its 32-thread UltraSPARC, so I'll see how this runs...
It's the parallelism of the Shake-invoked commands - e.g. Shake usually has 4 commands in flight. I just rebuilt with `-j8` and got 4.01. I entirely agree that isn't great - for info, on my 20-core machine, with an entirely different build system, I get 16.25, and the loss of parallelism is all in the long tail. Given I only see 3 peaks that touch 8 threads for a fraction of a second, I don't expect much benefit over 8 threads.
It would be interesting to know what the average number of commands active is in the make based build system.
@ndmitchell 4.4 on GNU make build in the past. Read https://mail.haskell.org/pipermail/ghc-devs/2015-April/008752.html -- and scroll to the bottom for my original report.
If we get rid of the configure of base being a bottleneck we end up with a parallelism of 4.4, so there still seem to be a lot of choke points.
@olsner investigated parallelism of the build before. Here are some results http://lpaste.net/909516301964148736 (critical path has 245 dependencies), but maybe he can chime in himself.
@ndmitchell IMHO a big source of possible parallelisation is more clever generation of interface files. Just have a look at the C code in the RTS: its interface files (headers) are already written by hand, so you get the maximum available parallelism when building it. On `-j32`, you get 32 GNU C compilers running, IIRC. That's why I opened #174.
BTW, here is the report from my `-j64` experiment cited above, using the GNU make build in the past. One of the 64-command spikes is the RTS compilation.

An interesting fact is that I don't see a 32-thread spike on my Shake-based build with `-j32`...
Add `name("rts/build") && command("gcc")` to the filter on the command plot page and it will show you the RTS building. I see 174 executions of gcc taking a total of 40s. They do form a spike, but it only reaches up to 4.3 parallelism, and takes only 10s start to finish. I think that's mostly a display artefact though - if I opt for 1000 buckets instead of 100 on the plot, I see:

You can increase the buckets by finding the line:

```javascript
var xs = commandPlot(prepared, report.query, 100);
```

and changing 100 to 1000 (or more, if you so desire).
I'm not sure this is strictly necessary for the `ghc-quake` milestone.
Is this `--profile` flag still working?

@izgzhen I think it should be working, yes. Doesn't it work for you?

> @izgzhen I think it should be working, yes. Doesn't it work for you?

Not working when I tried... no html is generated. I'll try again.
Ah, sorry, I found it in the top-level of GHC library.
BTW, this is the command plot I got