Add option to limit the number of concurrent calls to the linker when building with -j
On my eight-core machine, running cabal install -j8 renders the machine virtually unusable, presumably because it tries to run many invocations of the linker in parallel (?). It would be nice to be able to do something like cabal install -j8 --max-linkers=3, so I could compile, download, etc. up to 8 packages in parallel but have at most 3 linking phases running at a time.
However, it's also possible that I have misdiagnosed the problem. The real issue, of course, is that I don't want cabal install -j8 to make my machine grind to a halt.
I'd like to first make sure that the linking is actually the problem. Why wouldn't running 8 linkers in parallel on an 8-core machine work?
Can't you just use -j7, for example?
I got this idea from the shake documentation here: http://hackage.haskell.org/package/shake-0.10.7/docs/Development-Shake.html#v:newResource . To quote, "calls to compilers are usually CPU bound but calls to linkers are usually disk bound. Running 8 linkers will often cause an 8 CPU system to grind to a halt." Though I can't say that I fully understand why running 8 disk-bound processes would cause the system to grind to a halt.
Re: just using -j7: empirically, anything above -j3 or -j4 is just as bad as -j8. The difference between 7 and 8 would not be that bad, but only getting to use 3 of my 8 cores when building packages makes me a sad panda.
OK, it should be possible to implement --max-linkers using a semaphore.
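Roughly something like this (a minimal sketch using Control.Concurrent.QSem from base; withLinkerSlot and the constant 3 are just for illustration, not an actual cabal API):

```haskell
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)

-- Hypothetical helper: run an action while holding one of the
-- --max-linkers slots; releases the slot even if the action throws.
withLinkerSlot :: QSem -> IO a -> IO a
withLinkerSlot sem = bracket_ (waitQSem sem) (signalQSem sem)

main :: IO ()
main = do
  linkSem <- newQSem 3  -- e.g. --max-linkers=3
  -- each parallel build job would wrap only its linking phase;
  -- compiling and downloading would not touch this semaphore
  withLinkerSlot linkSem (putStrLn "linking...")
```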
I will take a shot at implementing it (just wanted to get some feedback before I attempted it), and see if it helps. Is there a canonical semaphore library/abstraction that we should use nowadays?
@byorgey I have an initial implementation of a minimal semaphore module on this branch: https://github.com/23Skidoo/cabal/commits/ghc-parmake. It doesn't work on Windows yet.
Ah, cool. So I should wait for that to get merged in?
You can use System.Posix.Semaphore in the meantime.
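For example (a sketch against the unix package; the semaphore name and the count of 3 are arbitrary):

```haskell
import Control.Exception (bracket_)
import System.Posix.Files (stdFileMode)
import System.Posix.Semaphore

main :: IO ()
main = do
  -- open (creating if needed) a named POSIX semaphore with 3 slots;
  -- "/cabal-linkers" is an arbitrary name for this sketch
  sem <- semOpen "/cabal-linkers" (OpenSemFlags True False) stdFileMode 3
  bracket_ (semWait sem) (semPost sem) $
    putStrLn "linking under the semaphore"
```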
Maybe the reason that running 8 linkers in parallel grinds the system to a halt is that linking with GHC often requires a lot of RAM? If that's the case (out of memory plus lots of swapping?), it might be better to add an option to not start the linker if there is not much free RAM left.
Yes, that could certainly be the case; I will watch the memory usage next time to see if that's what is happening. However, I am not sure your suggestion would work very well: it seems one could easily get into a situation where a bunch of linkers fire up all at once (because RAM usage is not too high at that moment) but then exhaust the RAM once they are running.
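For concreteness, I imagine the check would look roughly like this (a Linux-only sketch reading MemAvailable from /proc/meminfo; all the names and the threshold idea are mine), and nothing stops several linkers from all passing it before any of them has actually allocated anything:

```haskell
import Data.Maybe (listToMaybe)

-- Extract "MemAvailable: <n> kB" from /proc/meminfo (Linux only).
availableKb :: IO (Maybe Integer)
availableKb = do
  ls <- lines <$> readFile "/proc/meminfo"
  pure $ listToMaybe
    [ read kb | l <- ls, ("MemAvailable:" : kb : _) <- [words l] ]

-- Racy admission check: between this test and the linker actually
-- allocating, other linkers can pass the same test and start too.
okToStartLinker :: Integer -> IO Bool
okToStartLinker neededKb = maybe True (>= neededKb) <$> availableKb
```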
Incidentally, after more experience and investigation, I'm pretty sure the underlying problem is not the linkers per se but running out of memory, which causes my system to start swapping. But if running the linker uses a lot of memory (?), this could still help.
@byorgey Have you tried my patches (#1572)?
@23Skidoo not yet. I'll give them a try soon.
I've got a question about the semaphore strategy:
When a worker grabs the semaphore to go into a linking phase and no linker resources are available, does the worker block (and thus go idle)? When blocking on one scarce resource, it would be nice to keep working on other available tasks: compiling, building docs, etc.
But of course we don't want to accomplish that through oversubscription (e.g. -N(4*P)), so it's nice to keep the number of workers at one per core but replace them when they go down, as in the sketch below. (This is what we did in meta-par for blocking on GPU tasks to complete.)
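To sketch the replacement idea (hypothetical names; exception handling elided for brevity):

```haskell
import Control.Concurrent.QSem (QSem, signalQSem, waitQSem)

-- A worker about to block on the scarce linker semaphore first hands
-- back its per-core job slot, so another runnable worker can use the
-- core; it takes a slot again before doing the actual link.
linkStep :: QSem -> QSem -> IO a -> IO a
linkStep jobSem linkSem link = do
  signalQSem jobSem   -- give up our core while we wait
  waitQSem linkSem    -- block until a linker slot is free
  waitQSem jobSem     -- reacquire a core before doing real work
  result <- link
  signalQSem linkSem  -- free the linker slot for the next worker
  pure result
```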
NB: We already have plans (see https://github.com/haskell/cabal/issues/976#issuecomment-298847896) to extend the syntax of -j to something like -j n[:m], where n and m denote the cabal process parallelism and GHC's internal parallelism, respectively. So we should figure out a syntax that allows us to incorporate the linker parallelism limit as well.
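For instance, if the linker limit became a third component (say -j n[:m[:l]], which is purely illustrative and not an agreed syntax), parsing could look like:

```haskell
import Text.Read (readMaybe)

-- Split on ':' without extra dependencies.
splitColons :: String -> [String]
splitColons s = case break (== ':') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitColons rest

-- Parse "n[:m[:l]]" into (cabal jobs, ghc -j, linker limit).
parseJ :: String -> Maybe (Int, Maybe Int, Maybe Int)
parseJ s = case splitColons s of
  [n] -> do
    n' <- readMaybe n
    pure (n', Nothing, Nothing)
  [n, m] -> do
    n' <- readMaybe n
    m' <- readMaybe m
    pure (n', Just m', Nothing)
  [n, m, l] -> do
    n' <- readMaybe n
    m' <- readMaybe m
    l' <- readMaybe l
    pure (n', Just m', Just l')
  _ -> Nothing
```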
To add to the motivation: when a project has more than one test suite and one changes a non-inlined library function and rebuilds all the test suites, cabal does, in sequence: 1) recompile the changed library parts, 2) start relinking all the test suites at the same instant. In this scenario I can deterministically observe a rather annoying burst of memory usage.
The steps of a build form a DAG, right? Is this DAG currently explicit in the cabal implementation? I had a look at #1572 and it seems like the "execution" aspects of this DAG are mixed in with its construction. But perhaps the PR is outdated anyway (?)
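To make the distinction concrete, here is the shape I have in mind (illustrative types, not cabal's actual internals): the DAG is built as pure data first, and a separate executor walks it, which is also where a linker semaphore would naturally plug in.

```haskell
import qualified Data.Map as Map

data StepKind = Configure | Compile | Link deriving (Eq, Show)

data Step = Step
  { stepId   :: String    -- e.g. a package or component name
  , stepKind :: StepKind
  , stepDeps :: [String]  -- ids of steps that must finish first
  } deriving Show

-- Pure DAG construction: nothing here knows about IO or scheduling.
type BuildGraph = Map.Map String Step

-- One executor concern: which steps are runnable given what is done?
-- A parallel executor would run these, taking a linker-semaphore
-- slot only for steps whose kind is Link.
readySteps :: BuildGraph -> [String] -> [Step]
readySteps g done =
  [ s | s <- Map.elems g
      , stepId s `notElem` done
      , all (`elem` done) (stepDeps s) ]
```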