Should Cabal build foreign source files in parallel as well?
Currently, Cabal does not build foreign source files (C++/C/ASM/CMM) in parallel when building with `-j`. This can be a problem when we try to build a Haskell package which bundles a non-trivial C/C++ project. Would the Cabal folks review a PR which adds parallel building for foreign sources, or is the extra complexity not worth the benefits?
Cabal, i.e. the `Setup.hs` interface, doesn't have `-j`. That's the first obstacle. Recall that `-j` controls how many packages `cabal-install` builds in parallel. But when we build a package (with the `./Setup` interface) we are essentially single-threaded.

This also applies to GHC: we never invoke GHC with its `-j`. (The user can configure `ghc-options`, but cabal doesn't do that.)
> Cabal, i.e. the `Setup.hs` interface, doesn't have `-j`.
IIRC `Setup build` accepts a `-j` argument, and it eventually propagates to `ghc -j` as in here.

IMO the real problem is that `cabal-install` never does `Setup build -j`. But still, adding parallelism support here can be useful in certain cases where the high-level build manager is not `cabal-install` and we can specify `Setup build` flags to speed up builds (e.g. `setupBuildFlags` in haskell.nix).
> IIRC `Setup build` accepts a `-j` argument

So it seems it does.
> IMO the real problem is `cabal-install` never does `Setup build -j`

Yes.
> But still, adding parallelism support here can be useful in certain cases where the high-level build manager is not `cabal-install`

I don't disagree.
Add a new flag, `--c-compiler-parallelism` or something in that spirit; don't reuse `-j`, that's confusing. I'd rather remove `-j` OR make Cabal a truly parallel builder, where `-j` would control everything one does (preprocessors, C sources, etc.).

TL;DR: if the patch isn't messy, it will be fine.
For the record, I have to ask: if you use some other high-level build manager, other than `cabal-install`, why don't you build the C sources with something better than Cabal, something which is made for it? Cabal/GHC would only need to link that object in.
> If you use some other high-level build manager, other than `cabal-install`, why don't you build the C sources with something better than Cabal, something which is made for it? Cabal/GHC would only need to link that object in.
Eh... yes, you're right! The only argument I can come up with (an admittedly weak one) is: I would need to implement a Cabal flag for that package to switch between the in-tree and external C code, which adds a bit of extra complexity for package authors.
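For illustration, such a switch could look like the following `.cabal` sketch. This is only an assumption about how one might wire it up: the `external-clib` flag name, the `foo` library name, and the `cbits/` paths are all hypothetical.

```cabal
-- Hypothetical flag; all names here are illustrative only.
flag external-clib
  description: Link against an externally built copy of the bundled C library
  default:     False
  manual:      True

library
  if flag(external-clib)
    -- Assume the objects were built by a dedicated C build system
    -- and are visible to the linker as libfoo.
    extra-libraries: foo
  else
    c-sources: cbits/a.c
               cbits/b.c
```

Package authors (and distro/Nix packagers) would then pick the variant with `-f external-clib` at configure time.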
Thanks for all the discussion anyway; feel free to close this issue :) If I were still to implement it, the extra parallelism would really just be a work-stealing loop replacing the original `sequence_` calls, and making Cabal a truly parallel beast is way too much work and a distraction for you Cabal folks.
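As a sketch of that work-stealing idea (using only `base`; `parSequence_` is a name I made up, not anything in Cabal): a fixed number of workers repeatedly steal the next job from a shared queue until it is empty.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Run the given jobs with at most n workers. Each worker repeatedly
-- takes the next job from the shared queue until the queue is drained.
parSequence_ :: Int -> [IO ()] -> IO ()
parSequence_ n jobs = do
  queue <- newMVar jobs
  dones <- mapM (const newEmptyMVar) [1 .. max 1 n]
  let worker done = do
        next <- modifyMVar queue $ \js -> case js of
          []        -> pure ([], Nothing)
          (j : js') -> pure (js', Just j)
        case next of
          Nothing -> putMVar done ()  -- queue empty: worker exits
          Just j  -> j >> worker done -- run the job, then steal another
  mapM_ (forkIO . worker) dones
  mapM_ takeMVar dones -- wait for every worker to finish
```

(A production version would also need to propagate exceptions from jobs back to the caller; this sketch omits that.)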
I looked into this briefly (because I found myself needing to compile a few dozen C files within a cabal package, what a coincidence!)
```diff
diff --git a/Cabal/src/Distribution/Simple/GHC.hs b/Cabal/src/Distribution/Simple/GHC.hs
index ce4240d70..fa0943dde 100644
--- a/Cabal/src/Distribution/Simple/GHC.hs
+++ b/Cabal/src/Distribution/Simple/GHC.hs
@@ -522,6 +522,9 @@ buildOrReplLib mReplFlags verbosity numJobs pkg_descr lbi lib clbi = do
       platform@(Platform _hostArch hostOS) = hostPlatform lbi
       has_code = not (componentIsIndefinite clbi)

+  -- when flag is there but value is Nothing we should use numCpus...
+  let jobsSequence_ :: [IO a] -> IO ()
+      jobsSequence_ = maybe sequence_ (forConcurrentlyBounded_ . max 1) (join (flagToMaybe numJobs))
+
   (ghcProg, _) <- requireProgram verbosity ghcProgram (withPrograms lbi)
   let runGhcProg = runGHC verbosity ghcProg comp platform

@@ -679,7 +682,8 @@ buildOrReplLib mReplFlags verbosity numJobs pkg_descr lbi lib clbi = do
       -- build any C sources
       unless (not has_code || null (cSources libBi)) $ do
         info verbosity "Building C Sources..."
-        sequence_
+        putStrLn $ "NUMJOBS: " ++ show numJobs
+        jobsSequence_
           [ do let baseCcOpts = Internal.componentCcGhcOptions verbosity implInfo
                      lbi libBi clbi libTargetDir filename
                vanillaCcOpts = if isGhcDynamic
```
It's not a big change (given `forConcurrentlyBounded_`).
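`forConcurrentlyBounded_` is not something Cabal ships today; a minimal sketch of such a combinator, using a quantity semaphore (`QSem`) to cap concurrency, might look like this. The name, signature, and the `QSem` approach are my assumptions, not the actual patch.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.QSem
import Control.Exception (SomeException, bracket_, throwIO, try)
import Data.Functor (void)

-- Run all actions, allowing at most n of them to run at the same time.
forConcurrentlyBounded_ :: Int -> [IO a] -> IO ()
forConcurrentlyBounded_ n actions = do
  sem   <- newQSem (max 1 n)
  dones <- mapM (const newEmptyMVar) actions
  sequence_
    [ forkIO $ do
        -- Acquire a semaphore slot around the action, releasing it even
        -- if the action throws; record success or failure for the caller.
        r <- try (bracket_ (waitQSem sem) (signalQSem sem) (void act))
        putMVar done (r :: Either SomeException ())
    | (act, done) <- zip actions dones ]
  results <- mapM takeMVar dones     -- wait for all actions to finish
  mapM_ (either throwIO pure) results -- rethrow the first failure, if any
```

This keeps the `sequence_`-style shape of the call site while bounding the number of in-flight compiler invocations.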
One problem is that we then invoke several GHCs at the same time, and their output gets intermixed, which is very bad. Yet we have that problem elsewhere too (e.g. if we want to run multiple tests and stream their output), so I'll try to make a small library for it. (Not seeing test output with `cabal v2-test` is a bummer in CI; looking at log files is hard.)
That looks nice, thank you, hope we can reap the fruits soon enough.
Here's another use case to support the OP's argument: I have a somewhat big XML file for the GB 18030 charset, which is used to autogenerate a Haskell module. The design choice and the converter aren't mine. The approach is to invoke `hookedPreProcessors` in a custom `Setup.hs`. While, on a given VM, the other XML files take at most 10 seconds each, that one takes over 10 minutes to generate the module. I'm looking into ways to optimize the processing code; the overhead is unreasonable, even though it is incurred only once, for module generation, in the dependency graph. And Nix will redo it on every pure build.
I believe processing data artifacts at compile time with Haskell code in a custom `Setup.hs` is a legitimate concern, even though module generation might be a rather niche application. I would greatly appreciate a way to process them in parallel.
I'm investigating doing this specifically with `-jsem`. My obstacle is that extra sources are compiled using `-c` instead of `--make`, and `-c` doesn't seem to care about the semaphore from `-jsem`. I'm looking into ways of calling GHC that would fix this; it may require a GHC version bump.
This may be addressed by a combination of https://github.com/haskell/cabal/pull/9872 and https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12388. Corresponding GHC issue is at https://gitlab.haskell.org/ghc/ghc/-/issues/24642.