
Optimise the build

Open ndmitchell opened this issue 9 years ago • 54 comments

Using the Shake profiling I generated the following plot when building on an 8 Core windows machine with -j8.

image

The two single threaded bits of the build are the initial boot/configure (not sure there is much that can be done about that) and ghc-cabal (which is pretty awful). Anything we can do to either kick things off faster once ghc-cabal completes (why not immediately start compiling each library that we just configured?) or make ghc-cabal go in parallel?

To regenerate, just run --profile and look at the resulting profile.html.

ndmitchell avatar Feb 01 '16 21:02 ndmitchell

I confirm similar performance profile on my machine.

Some comments:

  • I still have some hopes that it would be possible to get rid of the initial configure run, or at least make it optional/parallel. If you look into system.config.in you'll see that we can compute a large part of settings by ourselves, e.g. by using Oracles.LookupInPath. If we have paths to builders then we may be able to start building immediately, without waiting for configure to complete, so it could be run in parallel. Maybe in future we'll be able to compute all settings inside the build system. Note, I'm not talking about other configure scripts here, e.g. the one in integer-gmp seems more difficult to replace.
  • According to https://mail.haskell.org/pipermail/ghc-commits/2013-May/001712.html, ghc-cabal was 'simplified' some time ago and can no longer run in parallel, so I use Shake's Resource feature to limit its parallelism (a minimal sketch is shown after this list). Without this I experienced crashes.
  • The profile chart also demonstrates bottlenecks due to coupled .o and .hi generation, see #174.
  • In many cases when we rebuild things we don't need to rerun configure scripts (under the assumption that the environment is the same). Unfortunately we don't have control over this. It would be possible to improve rebuild times after we let go of ghc-cabal, see #18.
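
For reference, here is a minimal sketch of what such a Resource looks like in Shake; the rule pattern and the ghc-cabal invocation are illustrative only, not Hadrian's actual code:

import Development.Shake

-- Illustrative sketch: serialise every ghc-cabal invocation with a Resource of
-- capacity 1, since ghc-cabal is no longer safe to run in parallel.
ghcCabalRules :: Rules ()
ghcCabalRules = do
    ghcCabal <- newResource "ghc-cabal" 1
    "//package-data.mk" %> \_out ->
        withResource ghcCabal 1 $
            cmd "inplace/bin/ghc-cabal" ["configure"]  -- hypothetical arguments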

why not immediately start compiling each library that we just configured?

I think this is what should happen already, at least I did verify this at some point in the past. Can you check? If this is no longer the case, we should be able to fix this.

snowleopard avatar Feb 01 '16 21:02 snowleopard

If we do start compiling each library after it's been configured, then we aren't tracing that, and it isn't showing up. I'll investigate that tomorrow.

As you say, configure isn't such an issue, since it changes far more rarely.

ndmitchell avatar Feb 01 '16 21:02 ndmitchell

So, we have two problematic executions of configure. The first one should happen only once, so not really a big issue.

The second one, attributed to ghc-cabal in the chart, is the configure of the base library. It takes a huge time and (almost) everything needs to wait for it.

Here is the chain of ghc-cabal invocations at the start of Stage1:

  • Run GhcCabal (stage = Stage1, package = ghc-prim) -- this is quick.
  • Run GhcCabal (stage = Stage1, package = integer-gmp) -- this is also quick.
  • Run GhcCabal (stage = Stage1, package = base) -- this takes a long time.
  • Run GhcCabal (stage = Stage1, package = binary) -- this is quick.
  • ...
  • Run GhcCabal (stage = Stage1, package = hpc) -- this is quick.
  • Run GhcCabal (stage = Stage1, package = process) -- this is quick.
  • Compilation finally starts with ghc-prim sources.
  • ...
  • Run GhcCabal (stage = Stage1, package = haskeline) -- we keep running some ghc-cabal jobs.

So, you are right: for some reason there is no compilation going on until most ghc-cabal invocations have finished. I'll try to investigate why this happens.

snowleopard avatar Feb 01 '16 22:02 snowleopard

So, the following experiment confirms that ghc-prim actually does not need anything else!

  • Run full build
  • Remove .build/stage1/libraries
  • Run ./build.bat .build/stage1/libraries/ghc-prim/build/libHSghc-prim-0.5.0.0.a

This executes ghc-cabal for ghc-prim, followed by compilation without involving any other packages.

Conjecture: When there are many targets in the queue, Shake might get stuck waiting for the ghc-cabal resource instead of switching to other targets that do not actually require the resource.

snowleopard avatar Feb 01 '16 22:02 snowleopard

Your conjecture would explain it - I'll triple-check tomorrow, although I note there are tests checking that resources are released exactly as they should be. It's always possible you've stumbled into a corner case though.

ndmitchell avatar Feb 01 '16 22:02 ndmitchell

I disproved my conjecture :) I removed resources altogether yet ghc-prim still couldn't be compiled early.

snowleopard avatar Feb 01 '16 23:02 snowleopard

Another experiment:

$ ./build.bat .build/stage1/libraries/integer-gmp/build/libHSinteger-gmp-1.0.0.0.a .build/stage1/libraries/ghc-prim/build/libHSghc-prim-0.5.0.0.a -j4

This starts to compile ghc-prim as soon as its ghc-cabal is finished, without waiting for integer-gmp.

If I swap the order of the two targets in the command line the result is the same: correct parallelisation.

With a bigger target, e.g. inplace/bin/ghc-stage2.exe things start to go wrong again.

snowleopard avatar Feb 01 '16 23:02 snowleopard

New conjecture: This can't possibly be a fault of the build system.

The above experiments show that ghc-prim does not depend on any other library and its sources can be compiled as soon as we build its package-data.mk file. The only conclusion is that Shake fails to schedule the compilation as soon as possible and instead postpones it until most package-data.mk files are built.

@ndmitchell Please prove me wrong or find a scheduling bug in Shake.

snowleopard avatar Feb 01 '16 23:02 snowleopard

I think I understand what's going on. My apologies to Shake, I was wrong in my suspicions.

My previous experiments were not entirely faithful: it is not that easy to need a target like .build/stage1/libraries/ghc-prim/build/libHSghc-prim-0.5.0.0.a. We must know the 0.5.0.0 bit, which can only be obtained after parsing package-data.mk. So, we never actually bothered to explicitly need library targets and they were discovered automatically during the build process. That's why the compilation process didn't start earlier.

I tried to add a top-level need for libraries and came across the following problem. If I do roughly this:

libTargets <- fmap concat . forM libPkgs $ \pkg -> do
    let target = PartialTarget stage pkg
    compId <- interpretPartial target $ getPkgData ComponentId
    libs   <- traverse (pkgLibraryFile stage pkg compId) ways
    return libs

need libTargets

But then we don't get to the need until we have parsed all package-data.mk files, and we are back to the same problem: we can't start the build process until all ghc-cabal invocations are complete. We can't simply move the need into the forM either, because need is blocking and we would then build all libraries sequentially.

One possible way forward is not to rely on package-data.mk for getting component IDs but get them from .cabal files directly.
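
For illustration, a rough sketch of that idea (a hypothetical helper, not what Hadrian actually uses): read the version field straight from the .cabal file, so the library file name can be computed without waiting for ghc-cabal to produce package-data.mk.

import Data.Char (isSpace, toLower)
import Data.List (stripPrefix)
import Development.Shake

-- Hypothetical helper: extract the "version:" field directly from a .cabal file.
pkgVersionFromCabal :: FilePath -> Action String
pkgVersionFromCabal cabalFile = do
    need [cabalFile]  -- track the .cabal file so that edits to it trigger a rebuild
    contents <- liftIO $ readFile cabalFile
    case [ dropWhile isSpace rest
         | line <- lines contents
         , Just rest <- [stripPrefix "version:" (map toLower line)] ] of
        (version : _) -> return version
        []            -> error $ "No version field found in " ++ cabalFile

With something like this, the 0.5.0.0 bit for ghc-prim would come from libraries/ghc-prim/ghc-prim.cabal rather than from its package-data.mk.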

snowleopard avatar Feb 02 '16 11:02 snowleopard

This is a top-level target? In which case, can you do:

forM libPkgs $ \pkg -> action $ do
    ....
    need libs

Each action will run in parallel, so you should get full parallelism.

ndmitchell avatar Feb 02 '16 11:02 ndmitchell

Note that the next version of Shake has a direct parallel combinator, which lets you express that inside an Action, but it sounds like it's not essential for you.
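
Once parallel is available, the shape would roughly be the following; Package is the build system's package type and libraryTargets is a hypothetical helper standing in for whatever computes a package's library paths:

import Control.Monad (void)
import Development.Shake

-- Sketch only: need each package's libraries in its own concurrent branch,
-- inside a single Action, using Shake's 'parallel' combinator.
needLibrariesInParallel :: [Package] -> Action ()
needLibrariesInParallel pkgs = void $ parallel
    [ libraryTargets pkg >>= need | pkg <- pkgs ]  -- libraryTargets is hypothetical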

ndmitchell avatar Feb 02 '16 11:02 ndmitchell

Yes, this is a top-level target; however, we can't compute libPkgs outside the Action monad.

Your suggestion might still work though. We can do forM knownPackages instead and once we are inside the Action monad we can check whether a particular package (e.g., win32) needs to be built.
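
A sketch of that plan, reusing the names from the snippet above (Stage1 is hard-coded purely for illustration):

import Control.Monad (forM_, when)
import Development.Shake

-- Sketch: one top-level 'action' per known package, so each package's 'need'
-- blocks only its own branch and ghc-prim can start compiling as soon as its
-- own package-data.mk is ready.
libraryTargetRules :: Rules ()
libraryTargetRules =
    forM_ knownPackages $ \pkg -> action $ do
        let target = PartialTarget Stage1 pkg
        pkgs <- interpretPartial target getPackages
        when (pkg `elem` pkgs) $ do          -- is this package built at all?
            compId <- interpretPartial target $ getPkgData ComponentId
            libs   <- traverse (pkgLibraryFile Stage1 pkg compId) ways
            need libs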

snowleopard avatar Feb 02 '16 11:02 snowleopard

Makes a lot of sense. If that doesn't work out, let me know and I'll see what we can do about getting you a version of parallel, which definitely will do it.

ndmitchell avatar Feb 02 '16 12:02 ndmitchell

:+1: Looks like a nice use-case for parallel.

snowleopard avatar Feb 02 '16 12:02 snowleopard

It worked. With ./build.bat --configure -j4 I now get the following chart (-j8 experiment is coming). I'll push the changes once I get rid of a lint error concerning the settings file.

build-j4

snowleopard avatar Feb 02 '16 14:02 snowleopard

Nice! Can you check the second profile with -j8 into https://github.com/ndmitchell/shake-paper/tree/master/profiles? I suspect it will want to go in the paper.

ndmitchell avatar Feb 02 '16 14:02 ndmitchell

@ndmitchell I can do that, but I guess it would be better if you produce one on the same machine as before? Otherwise it may be difficult to compare.

snowleopard avatar Feb 02 '16 14:02 snowleopard

@snowleopard - True, let me know when it's in.

ndmitchell avatar Feb 02 '16 14:02 ndmitchell

With -j8 I got an average parallelism of 3.93.

build-j8

snowleopard avatar Feb 02 '16 15:02 snowleopard

@ndmitchell I've committed the changes, let's see how your chart will improve.

snowleopard avatar Feb 02 '16 15:02 snowleopard

On a completely fresh setup I get:

Error when running Shake build system:
* .build/stage1/rts/build/libHSrts_thr.a
* OracleQ (PackageDataKey (".build/stage1/rts/package-data.mk",".build_stage1_rts_C_SRCS"))
* .build/stage1/rts/package-data.mk
* includes/ghcautoconf.h
* mk/config.h
Error, file does not exist and no rule available:
  mk/config.h

About 10 seconds in.

ndmitchell avatar Feb 02 '16 15:02 ndmitchell

@ndmitchell Aha, we also need to track mk/config.h as a file produced by ./configure. Fixed.

snowleopard avatar Feb 02 '16 15:02 snowleopard

Error when running Shake build system:
* OracleQ (ConfigKey "host-os")
* shake-build/cfg/system.config
* shake-build/cfg/system.config settings mk/config.h
* mk/config.h.in
Error, file does not exist and no rule available:
  mk/config.h.in

Also config.h.in?

ndmitchell avatar Feb 02 '16 15:02 ndmitchell

Wow, apparently config.h.in is produced at the same time as configure, so I added it as another output for the rule which runs boot. Please give it a try. (There may be other unaccounted outputs.)
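
For reference, the shape of a rule that declares several boot outputs at once, using Shake's multi-output operator -- a minimal sketch, not Hadrian's actual rule:

import Development.Shake

-- Sketch only: one rule producing both files that ./boot generates, so Shake
-- knows how to bring configure and mk/config.h.in into existence when needed.
bootRules :: Rules ()
bootRules =
    [ "configure", "mk/config.h.in" ] &%> \_outs ->
        cmd "./boot"   -- hypothetical invocation; the real rule does more than this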

snowleopard avatar Feb 02 '16 15:02 snowleopard

It now works, but I only see a slight improvement.

image

My guess is that if base goes off first then we can't parallelise the things that come after because of the resource. Maybe we should orderOnly depend from base to all the other pieces, to ensure that base goes last?

ndmitchell avatar Feb 02 '16 16:02 ndmitchell

Maybe we should orderOnly depend from base to all the other pieces, to ensure that base goes last?

I'm not sure we can do that. We run ghc-cabal configure in the order of package dependencies. This is what the make build system does, and I believe I had to do the same to avoid errors. See:

-- We configure packages in the order of their dependencies
deps <- packageDeps pkg
pkgs <- interpretPartial target getPackages
let depPkgs = matchPackageNames (sort pkgs) deps
orderOnly $ map (pkgDataFile stage) depPkgs

One idea comes to mind though: maybe if we decouple ghc-cabal from ghc-pkg then we will only need to respect the ordering of the latter? This could give us a nice boost.

Note: I execute both ghc-cabal and ghc-pkg in one rule simply because this is what make does.

snowleopard avatar Feb 02 '16 16:02 snowleopard

Sounds good. I'm still skeptical about all this repeated configure. I would have hoped to have one central configure that did the minimal possible stuff (guessing the code of EPIPE seems slow and pointless), and then got reused everywhere thereafter. I don't know anywhere near enough about configure though. Black scary magic.

ndmitchell avatar Feb 02 '16 16:02 ndmitchell

OK, I'll decouple ghc-cabal from ghc-pkg and we'll see what happens.

In future, I hope, we will be able to shove all configure calls into the initial --configure rule, so we will not have to run them when rebuilding, but this requires #18.

snowleopard avatar Feb 02 '16 16:02 snowleopard

I've decoupled ghc-cabal from ghc-pkg; however, I don't believe we can run ghc-cabal configure on a package before we register all its dependencies, so the above idea won't work.

We need to complete both ghc-cabal and ghc-pkg for the base library before other packages that depend on it can be configured.

snowleopard avatar Feb 03 '16 00:02 snowleopard

The good news is: apparently, after the decoupling, we no longer need to limit ghc-cabal parallelism, so I dropped the resource. The base library bottleneck is still there though, as other packages are waiting for it to be registered.

@ndmitchell By the way, your recent chart is quite different from mine. Are you sure there is no mistake?

snowleopard avatar Feb 03 '16 01:02 snowleopard

@snowleopard - I'll regen from scratch tonight. Agreed, they have little in common, and if it was a mistake, that would be ideal :)

Any chance base is caching the configure output? I am running mine from a fresh directory, not just after a clean, which I observe doesn't spend nearly as much time doing the initial configure (if at all).

ndmitchell avatar Feb 03 '16 06:02 ndmitchell

Any chance base is caching the configure output?

I never noticed any differences in the time it takes to run base configure. It always was unbearably long. My guess would be that configure scripts are not smart enough to do caching. But then they are black scary magic, so who knows :)

snowleopard avatar Feb 03 '16 17:02 snowleopard

I consistently get the same flat bit in the middle, which I assume is configure on base, and that it is doing caching we can't see.

ndmitchell avatar Feb 04 '16 10:02 ndmitchell

@ndmitchell I would like to test this on various boxes here, but ./shake-build/build.sh -j12 --configure="--with-gmp-includes=/usr/include/gmp --with-gmp-libraries=/usr/lib --with-ar=/usr/bin/gar" --profile does not produce any profile.html anywhere. Any idea what I'm doing wrong? Thanks!

kgardas avatar Feb 04 '16 11:02 kgardas

Can you try: ./shake-build/build.sh --no-build --profile. If I do that, I see:

$ shake-build/build.bat --no-build --profile
Warning: No want/action statements, nothing to do
Writing report to report.html     ******
shakeArgsWith                     0.000s    0%
Function shake                    0.037s    2%
Database read                     0.206s   15%  ====
With database                     0.017s    1%
Running rules                     0.001s    0%
Pool finished (1 threads, 1 max)  0.001s    0%
Lint checking                     0.001s    0%
Profile report                    1.090s   80%  ========================   ******
Total                             1.351s  100%
Build completed in 0:02m

I've put ****** on the two lines that indicate the profile is being created.

ndmitchell avatar Feb 04 '16 11:02 ndmitchell

@ndmitchell Interesting; from the previous run, where I can't find the report, I see:

shakeArgsWith                          0.001s    0%                           
Function shake                         0.053s    0%                           
Database read                          0.001s    0%                           
With database                          0.000s    0%                           
Running rules                       1204.340s   99%  =========================
Pool finished (26224 threads, 12 max)  0.000s    0%                           
Lint checking                          0.107s    0%                           
Profile report                         5.732s    0%                           
Total                               1210.235s  100%                           
Build completed in 20:11m

so 5.7 sec were spent producing it...

kgardas avatar Feb 04 '16 11:02 kgardas

Then it's definitely gone somewhere! Search for report.html in the current directory and the GHC directory, and then just scan the disk...

ndmitchell avatar Feb 04 '16 11:02 ndmitchell

@ndmitchell Ah, I've found the issue! In your first comment you write to look for profile.html, but it should be report.html -- so the confusion was on my side. I'll rerun the build, since the ./shake-build/build.sh --no-build --profile run has now replaced the original report.

kgardas avatar Feb 04 '16 11:02 kgardas

@ndmitchell Cool! I was able to get report.html. It looks like for -j12 we get an average parallelism of 3.74. I don't know how to post the graph here.

kgardas avatar Feb 04 '16 12:02 kgardas

@kgardas Simply drag and drop in the comment window!

snowleopard avatar Feb 04 '16 12:02 snowleopard

With -j32 I get an average parallelism of 3.99. The question is whether this is real parallelism computed from CPU usage, or the parallelism of Shake-invoked commands. If the latter, it's not that good IMHO. If the former, well, the testing is done on a 6-core/12-thread Xeon. Next week I'll probably fire up a Niagara with its 32-thread UltraSPARC, so I'll see how this runs... report-j32

kgardas avatar Feb 04 '16 13:02 kgardas

It's the parallelism of the Shake-invoked commands - e.g. Shake usually has 4 commands in flight. I just rebuilt with -j8 and got 4.01. I entirely agree that isn't great - for comparison, on my 20-core machine, with an entirely different build system, I get 16.25, and the loss of parallelism is all in the long tail. Given I only see 3 peaks that touch 8 threads, each for a fraction of a second, I don't expect much benefit over 8 threads.

It would be interesting to know what the average number of commands active is in the make based build system.

ndmitchell avatar Feb 04 '16 13:02 ndmitchell

@ndmitchell 4.4 on the GNU make build in the past. Read https://mail.haskell.org/pipermail/ghc-devs/2015-April/008752.html and scroll to the bottom for my original report.

kgardas avatar Feb 04 '16 13:02 kgardas

If we get rid of the base configure bottleneck we end up with a parallelism of 4.4, so there still seem to be a lot of choke points.

ndmitchell avatar Feb 04 '16 14:02 ndmitchell

@olsner investigated parallelism of the build before. Here are some results http://lpaste.net/909516301964148736 (critical path has 245 dependencies), but maybe he can chime in himself.

thomie avatar Feb 04 '16 14:02 thomie

@ndmitchell IMHO a big source of possible parallelisation is more clever generation of interface files. Just look at the C code in the RTS: its interface files (headers) are already written by hand, so you get the maximum available parallelism when building it. On -j32 you get 32 GNU C compilers running, IIRC. That's why I filed #174.

kgardas avatar Feb 04 '16 14:02 kgardas

BTW, here is the report from my -j64 experiment cited above, using the GNU make build in the past. One of the spikes to 64 commands is the RTS compilation.

ghc-build

An interesting fact is that I don't see a 32-thread spike on my Shake-based build with -j32...

kgardas avatar Feb 04 '16 14:02 kgardas

Add name("rts/build") && command("gcc") to the filter on the command plot page and it will show you the RTS building. I see 174 executions of gcc taking a total of 40s. They do form a spike, but it only reaches up to 4.3 parallelism, and takes only 10s start to finish. I think that's mostly a display artefact though - if I opt for 1000 buckets instead of 100 on the plot, I see:

image

You can increase the buckets by finding the line:

var xs = commandPlot(prepared, report.query, 100);

And changing 100 to 1000 (or more, if you so desire).

ndmitchell avatar Feb 04 '16 19:02 ndmitchell

I'm not sure this is strictly necessary for the ghc-quake milestone.

bgamari avatar May 10 '17 23:05 bgamari

Is this --profile flag still working?

izgzhen avatar May 05 '18 12:05 izgzhen

@izgzhen I think it should be working, yes. Doesn't it work for you?

snowleopard avatar May 05 '18 14:05 snowleopard

@izgzhen I think it should be working, yes. Doesn't it work for you?

It wasn't working when I tried ... no HTML was generated. I'll try again.

izgzhen avatar May 07 '18 01:05 izgzhen

Ah, sorry, I found it at the top level of the GHC tree.

izgzhen avatar May 07 '18 01:05 izgzhen

screenshot_2018-05-07_09-33-27

BTW, this is the command plot I got

izgzhen avatar May 07 '18 01:05 izgzhen