Should Cabal build foreign source files in parallel as well?

Open TerrorJack opened this issue 4 years ago • 9 comments

Currently, Cabal does not build foreign source files (C++/C/ASM/CMM) in parallel when building with -j. This can be a problem when building a Haskell package that bundles a non-trivial C/C++ project. Would the Cabal folks review a PR that adds parallel building for foreign sources, or is the extra complexity not worth the benefit?

TerrorJack avatar Oct 18 '20 17:10 TerrorJack

Cabal, i.e. the Setup.hs interface, doesn't have -j. That's the first obstacle. Recall that -j controls how many packages cabal-install builds in parallel. But when we build a single package (with the ./Setup interface) we are essentially single-threaded.

This also applies to GHC: we never invoke GHC with its -j. (Users can set it via ghc-options, but cabal doesn't do that itself.)

phadej avatar Oct 18 '20 19:10 phadej

Cabal, i.e. the Setup.hs interface, doesn't have -j.

iirc Setup build accepts a -j argument, and it eventually propagates to ghc -j, as in here.

IMO the real problem is cabal-install never does Setup build -j. But still, adding parallelism support here can be useful in certain cases where the high-level build manager is not cabal-install and we can specify Setup build flags to speed up builds (e.g. setupBuildFlags in haskell.nix).

TerrorJack avatar Oct 19 '20 00:10 TerrorJack

iirc Setup build accepts -j

So it seems it does.

IMO the real problem is cabal-install never does Setup build -j

Yes.

But still, adding parallelism support here can be useful in certain cases where the high-level build manager is not cabal-install

I don't disagree


Add a new flag, --c-compiler-parallelism or something in that spirit. Don't reuse -j: it's confusing, and I'd rather either remove it or make Cabal a truly parallel builder where -j controls everything it does (preprocessors, C sources, etc.).


TL;DR: if the patch isn't messy, it will be fine.

phadej avatar Oct 19 '20 00:10 phadej

For the record, I have to ask:

If you use some other high-level build manager, other than cabal-install, why don't you build C sources with something better than Cabal, something which is made for it? Cabal/GHC would only need to link that object in.

phadej avatar Oct 19 '20 00:10 phadej

If you use some other high-level build manager, other than cabal-install, why don't you build C sources with something better than Cabal, something which is made for it? Cabal/GHC would only need to link that object in.

Eh..yes, you're right! The only argument I can come up with (an admittedly weak one) is: I would need to implement a Cabal flag for that package to switch between in-tree/external C stuff, so it adds a bit of extra complexity for package authors.

Thanks for all the discussion anyway, feel free to close this issue :) If I were still to implement it, the extra parallelism would really just be a work-stealing loop replacing the original sequence_ calls, and making Cabal a truly parallel beast is way too much work and a distraction to you Cabal folks.
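To make the "loop replacing sequence_" idea concrete, here is a minimal sketch using only base. It is a shared-queue worker pool, which is simpler than true work stealing (there are no per-worker deques), and the names pooledSequence_ and countRuns are hypothetical, not anything that exists in Cabal. It also does not propagate exceptions: a failing action only stops the worker that ran it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import Control.Monad (replicateM)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- | Hypothetical stand-in for 'sequence_': @n@ workers repeatedly take the
-- next pending action from a shared queue until the queue is empty.
-- Assumption: exceptions are not rethrown; a failing action only ends the
-- worker that was running it.
pooledSequence_ :: Int -> [IO ()] -> IO ()
pooledSequence_ n actions = do
  queue <- newIORef actions
  let worker = do
        -- Atomically pop the head of the queue, if any.
        next <- atomicModifyIORef' queue $ \q -> case q of
          []         -> ([], Nothing)
          (a : rest) -> (rest, Just a)
        case next of
          Nothing -> return ()
          Just a  -> a >> worker
  -- Start the workers and wait until every one of them has finished.
  dones <- replicateM (max 1 n) $ do
    done <- newEmptyMVar
    _ <- forkIO (worker `finally` putMVar done ())
    return done
  mapM_ takeMVar dones

-- | Usage example: run @jobs@ increments on @n@ workers and return the total.
countRuns :: Int -> Int -> IO Int
countRuns n jobs = do
  counter <- newIORef (0 :: Int)
  pooledSequence_ n
    (replicate jobs (atomicModifyIORef' counter (\k -> (k + 1, ()))))
  readIORef counter
```

In a real patch the queued actions would be the per-file compiler invocations that sequence_ currently runs one after another.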

TerrorJack avatar Oct 19 '20 00:10 TerrorJack

I briefly looked into this (because I found myself needing to compile a few dozen C files within a cabal package, what are the chances!)

diff --git a/Cabal/src/Distribution/Simple/GHC.hs b/Cabal/src/Distribution/Simple/GHC.hs
index ce4240d70..fa0943dde 100644
--- a/Cabal/src/Distribution/Simple/GHC.hs
+++ b/Cabal/src/Distribution/Simple/GHC.hs
@@ -522,6 +522,9 @@ buildOrReplLib mReplFlags verbosity numJobs pkg_descr lbi lib clbi = do
       platform@(Platform _hostArch hostOS) = hostPlatform lbi
       has_code = not (componentIsIndefinite clbi)
 
+  -- when flag is there but value is Nothing we should use numCpus...
+  let jobsSequence_ :: [IO a] -> IO ()
+      jobsSequence_ = maybe sequence_ (forConcurrentlyBounded_ . max 1) (join (flagToMaybe numJobs))
+
   (ghcProg, _) <- requireProgram verbosity ghcProgram (withPrograms lbi)
   let runGhcProg = runGHC verbosity ghcProg comp platform
 
@@ -679,7 +682,8 @@ buildOrReplLib mReplFlags verbosity numJobs pkg_descr lbi lib clbi = do
   -- build any C sources
   unless (not has_code || null (cSources libBi)) $ do
     info verbosity "Building C Sources..."
-    sequence_
+    putStrLn $ "NUMJOBS: " ++ show numJobs
+    jobsSequence_
       [ do let baseCcOpts    = Internal.componentCcGhcOptions verbosity implInfo
                                lbi libBi clbi libTargetDir filename
                vanillaCcOpts = if isGhcDynamic

is not a big change (given forConcurrentlyBounded).
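The diff assumes a forConcurrentlyBounded_ helper that is not shown in this thread. A minimal sketch of what it might look like, using only base and a QSem to cap concurrency, is below. The signature is inferred from how the diff applies it (a job count, then a list of actions); the error handling (wait for everything, then rethrow the first failure) and the peakConcurrency demo are assumptions made for illustration.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, throwIO, try)
import Control.Monad (forM)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- | Hypothetical helper: run all actions with at most @n@ of them in flight,
-- wait for all of them, then rethrow the first exception (if any).
forConcurrentlyBounded_ :: Int -> [IO a] -> IO ()
forConcurrentlyBounded_ n actions = do
  sem <- newQSem (max 1 n)
  dones <- forM actions $ \act -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      -- Hold one semaphore unit for the duration of the action.
      r <- try (bracket_ (waitQSem sem) (signalQSem sem) act)
      putMVar done (either Left (const (Right ())) r)
    return done
  results <- mapM takeMVar dones
  case [e | Left e <- results] of
    (e : _) -> throwIO (e :: SomeException)
    []      -> return ()

-- | Usage example: report the highest number of jobs observed running at once.
peakConcurrency :: Int -> Int -> IO Int
peakConcurrency bound jobs = do
  running <- newIORef (0 :: Int)
  peak    <- newIORef (0 :: Int)
  let job = do
        now <- atomicModifyIORef' running (\k -> (k + 1, k + 1))
        atomicModifyIORef' peak (\p -> (max p now, ()))
        threadDelay 10000
        atomicModifyIORef' running (\k -> (k - 1, ()))
  forConcurrentlyBounded_ bound (replicate jobs job)
  readIORef peak
```

This sketch forks one lightweight thread per action up front and lets the semaphore decide which ones run, which keeps the code short at the cost of some idle threads.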

One problem is that when we invoke several GHCs at the same time, their output is intermixed, which is very bad. Yet we have that problem elsewhere too (e.g. if we want to run multiple tests and stream their output), so I'll try to make a small library for it.

(Not seeing test output with cabal v2-test is a bummer in CI; looking at log files is hard.)
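One common way around interleaved output is to buffer each job's lines and emit them in one piece while holding a lock. The sketch below shows that idea with base only. It is not the library mentioned above, and all names (Job, runBuffered, runAllBuffered) are hypothetical. It also assumes each job reports through a logging callback; capturing the output of a real GHC subprocess would need pipe handling that is not shown here.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Exception (finally)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Assumption: a job is handed a logging function instead of writing to
-- stdout directly.
type Job = (String -> IO ()) -> IO ()

-- | Run one job, collect its lines, then emit them as a single chunk while
-- holding the lock, so chunks from different jobs never interleave.
runBuffered :: MVar () -> (String -> IO ()) -> Job -> IO ()
runBuffered lock emit job = do
  buf <- newIORef []
  job (\line -> modifyIORef' buf (line :))
  ls <- readIORef buf
  withMVar lock (\_ -> emit (unlines (reverse ls)))

-- | Usage example: run all jobs concurrently and return the emitted chunks
-- in the order in which they were flushed.
runAllBuffered :: [Job] -> IO [String]
runAllBuffered jobs = do
  lock   <- newMVar ()
  chunks <- newIORef []
  dones  <- mapM (spawn lock chunks) jobs
  mapM_ takeMVar dones
  reverse <$> readIORef chunks
  where
    spawn lock chunks job = do
      done <- newEmptyMVar
      _ <- forkIO (runBuffered lock (\c -> modifyIORef' chunks (c :)) job
                     `finally` putMVar done ())
      return done
```

Passing putStr as the emit function would print each job's output contiguously; the trade-off is that nothing from a job is visible until that job finishes.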

phadej avatar Nov 02 '20 18:11 phadej

That looks nice, thank you, hope we can reap the fruits soon enough.

Here's another use case to support OP's argument: I have a somewhat big XML file for the GB-18030 charset, which is used to autogenerate a Haskell module. The design choice and the converter aren't mine. The approach is to invoke hookedPreProcessors in a custom Setup.hs. While, on a given VM, other XML files take at most 10 seconds, that one takes over 10 minutes to generate the module. I'm looking into a way to optimize the processing code. The overhead is unreasonable and will be incurred once for module generation in the dependency graph. And Nix will do this on every pure build.

I believe processing data artifacts at compile-time by a custom Setup.hs using Haskell code is a legitimate concern, even though module generation might be a rather niche application. I would greatly appreciate a way to process them in parallel.

demming avatar Jun 10 '21 15:06 demming

I'm investigating doing this specifically with -jsem. My obstacle is that extra sources are compiled using -c instead of --make, and -c doesn't seem to care about the semaphore from -jsem. I'm looking into ways to call GHC that will fix this; maybe this will require a GHC version bump.

edmundnoble avatar Apr 07 '24 15:04 edmundnoble

This may be addressed by a combination of https://github.com/haskell/cabal/pull/9872 and https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12388. Corresponding GHC issue is at https://gitlab.haskell.org/ghc/ghc/-/issues/24642.

edmundnoble avatar Apr 10 '24 21:04 edmundnoble