haskell.nix icon indicating copy to clipboard operation
haskell.nix copied to clipboard

Building with nix and haskell.nix is 3-4x slower than with stack

Open purefn opened this issue 4 years ago • 15 comments

I am working with a rather large Haskell codebase. I've been working on getting it to build with nix for a few weeks. At first I tried with the Haskell infrastructure built into nixpkgs, but the build times were so bad that I had to abandon it. I decided to try out haskell.nix because it addressed some of the problems with nixpkgs, mainly the serialized library, exe, test problem. That did significantly reduce the amount of time it took to build our project, but it seems it's not enough.

With stack we're seeing approximately 20 minute build times from a clean state. With nix and haskell.nix we're seeing around 60 minute build times. That's down from what it was because I forked haskell.nix and switched over to using ld.gold for linking instead of ld.. I also made sure we're using a nixpkgs revision that includes the recent speed up improvements from avoiding subshells.

We have about 452 packages in total that get built on a clean build. Thanks to nix caching, only about half of them get built during many builds. But those builds still take around 30 minutes to complete. Is there any other low hanging fruit (other than refactoring our package structure so fewer things need to be built in an "average" build)? I've noticed that ghc-pkg runs frequently and seems to take more time than during a stack build (though I lack the numbers to back that up), is there any way of improving that?

purefn avatar Nov 15 '19 16:11 purefn

The excessive ghc-pkg invocations would be another item to look into. Ideally we’d cache the unit-ids in nix and wouldn't have to call ghc-pkg that often.

It would certainly be great if we could produce some flame graph from a nix build. And maybe aggregate the time spent of actual ghc invocations and the time in between. Better numbers would certainly help. Being three times as slow would indicate that we spend 2/3rds if the time not actually compiling anything.

angerman avatar Nov 16 '19 03:11 angerman

I've been looking at the builder/make-config-files.nix file and wondering about speeding it up a bit. If you can elaborate on some ideas I can do a bit more research into it and see if I can find an approach that works. One thing I was thinking was that rather than using ghc-pkg register for dependencies we could just link to the libraries package configurations and then do a single ghc-pkg recache, but I'm not sure if 1) it would help, or 2) it would even work properly.

I'd be happy to take a look into producing flame graphs. Any ideas on what tools to use to do that?

purefn avatar Nov 16 '19 17:11 purefn

@purefn I'm afraid I don't know how about profiling tools :-/

We basically call ghc-pkg a lot in this:

https://github.com/input-output-hk/haskell.nix/blob/5fa379ebc75c6f960d9f03aa857c0e0350e4dbe1/builder/make-config-files.nix#L9-L36

thus if we could stick the id and name somehow into the pkg's derivation, we wouldn't need to look it up each time.

On the other hand, if we patch cabal to just accept a package database, and not run the solver without requireing to pass --exact-configuration and all the other constraints, that would make it much easier as well.

Thus an optimization in nix, might be to stick the id and name somehow into the compuited derivations (e.g. passthru?). But an even better solution (that would also allow us to throw out large chunks of nix code) would be to teach cabal to not run the solver and accept a package db that contains exactly one package for each dependency.

Then again, I don't know how much time we spend in each phase. So maybe a good first step would be to have some trace marks in the shell output. I guess we could use time on the setup build invocations and tally the time outputs from the nix-build log up, comparing them to the total time we spend on nix-build. That should at least give us some idea of how long we spent on just compiling.

angerman avatar Nov 18 '19 02:11 angerman

I had a similar thought about using time. To start with I wrapped the runCommand script that builds those config file with time and then timed a build, after having set my nix max jobs configuration value to 1. For my project, a complete build of exes and tests took 149m50.027s.

I tallied up the times that the make config files script ran. It took a total of 87m636s of that build time. Eliminating most of that time would bring the build time down to 2x what it takes with stack, so I definitely think this is a low hanging fruit that should be explored.

I'll see if I can get some time to look into caching more of those id and name values.

purefn avatar Nov 18 '19 16:11 purefn

So we know that we spend about 90/150 or 3/5 in https://github.com/input-output-hk/haskell.nix/blob/5fa379ebc75c6f960d9f03aa857c0e0350e4dbe1/builder/make-config-files.nix#L46

That seems quite extreme! I believe it would be quite helpful to know how much time we spend in the first half: https://github.com/input-output-hk/haskell.nix/blob/5fa379ebc75c6f960d9f03aa857c0e0350e4dbe1/builder/make-config-files.nix#L47-L77

and in the second half: https://github.com/input-output-hk/haskell.nix/blob/5fa379ebc75c6f960d9f03aa857c0e0350e4dbe1/builder/make-config-files.nix#L78-L130

If the second half really dominates this, then making the exactDep logic faster should help a lot. After talking to @phadej about this, he recommended using a haskell script instead.

We've essentially put all the libraries we want into $out/package.conf.d register. So we could probably just load that package database and dump out all the information in the format we need it to for configure-flags file.

The script @phadej proposed, and which just dumps the information, but doesn't do th enecessary formatting yet is:

module Main (main) where

import Distribution.InstalledPackageInfo (InstalledPackageInfo (..))
import Distribution.Pretty (prettyShow)
import Distribution.Simple.Compiler (PackageDB (..))
import Distribution.Simple.Program.HcPkg (dump, HcPkgInfo (..))
import Distribution.Simple.Program.Types (ConfiguredProgram, simpleConfiguredProgram, ProgramLocation (..))
import Distribution.Verbosity (silent)
import System.Environment (getArgs)

main :: IO ()
main = do
    (ghcpkg : arg : _) <- getArgs

    let cp :: ConfiguredProgram
        cp = simpleConfiguredProgram "ghc-pkg" (FoundOnSystem ghcpkg)

    let hcPkgInfo = HcPkgInfo cp
           -- this boolean flags are irrelevant for dump
           False False False False False False False False
                       -- ^ -- package-conf toggler for very old GHCs
    ipis <- dump hcPkgInfo silent (SpecificPackageDB arg)
    mapM_ (putStrLn . prettyShow . installedUnitId) ipis

to be invoked like

runghc GhcPkgScript.hs ghc-pkg /path/to/package.db

angerman avatar Nov 19 '19 05:11 angerman

That being said, if we instead spend a massive amount of time in the first half, we may need to think about constructing the package's package-db more intelligently :-/

angerman avatar Nov 19 '19 05:11 angerman

I have some local changes that seem promising. I first worked on the bit generating the cabal.config, configure-flags, and ghc-environment. The change I made was to basically create those files for each package built as part of the install phase, then in packages that depend on those just cat ${p}/nix/configure-flags >> $out/configure-flags, and so on.

I didn't see much improvement from doing that. There is a little, but it isn't hugely significant. So I moved onto improving how the package db was built. I ended up just doing the same thing the builtin nixpkgs Haskell infrastructure does and copying the ${p}/package.conf.d/*.conf files of dependencies, then doing a single ghc-pkg recache at the end. I also removed the ghc-pkg check at the end.

The combination of these things seems to have improved the times quite a bit. I'm still in the middle of a build, but I can say that were I was seeing some packages that took more than a minute in the make-config-files part, they now only take a few seconds. Most packages spend around half a second in this part, which I think is more reasonable.

I'll post back later with numbers for a better comparison.

purefn avatar Nov 19 '19 16:11 purefn

ghc-pkg recache is how cabal-install registers stuff into its store. I won't be surprised if performance is one of the motivations (another one, is that it allows registering packages with same pkg-id but different unit-ids there).

phadej avatar Nov 19 '19 17:11 phadej

With the changes I mentioned above the total build time was reduced to 65m40.419s and the time that the make-config-files takes was reduced to 4m49.644s.

These numbers are much more reasonable! I want to clean a few things up and then I'll open up a PR.

purefn avatar Nov 19 '19 19:11 purefn

@purefn so we went from ~150min to ~65min for single core compilation? Impressive! How much faster is stack with single core compilation?

angerman avatar Nov 19 '19 23:11 angerman

It looks like many of the changes in here have been merged?

Should this issue be closed then maybe?

If not, I'll add in a data point of stack taking 4 minutes and haskell.nix taking 12 minutes with a recent commit.

I'd be happy to debug, profile, and hack at this performance issue further but I'd need some direction on how to do that.

codygman avatar Jul 29 '20 00:07 codygman

@codygman is this with a fresh stack build or with a warm cache? Taking 3x as much time would indicate to me that haskell.nix is building something that stack isn't? I somehow doubt it's some parallelism that stack manages to exploit?

angerman avatar Jul 31 '20 15:07 angerman

@angerman I did a stack clean before-hand, I can do a stack clean --full when I'm back at the appropriate computer to be sure.

codygman avatar Aug 04 '20 18:08 codygman

@codygman that would be helpful. I'm not that familiar with stack; nix will however start from scratch each time to force reproducibility, except for cached libraries. If we build the same and haskell.nix manages to consume 3x more time something seems really off, and we should investigate this!

angerman avatar Aug 05 '20 04:08 angerman

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 28 '22 21:09 stale[bot]