hadrian
hadrian copied to clipboard
Automate `need`ing of generated files
We have a bunch of files that are generated and then included through CPP (both .c
and .hs
files). With #279 we have basic support for discovering them automatically (through GCC's -MM -MG
flags). This only works for .c
files for now.
Ideally need
ing them would be fully automatic (i.e., the build system would discover when it actually needs them). I'm opening the ticket to have a central place for discussions/ideas about this.
So I started looking at making this also work for .hs
files (since many generated files are included in Haskell sources). So far I've encountered the following problems:
-
Cabal macros are not defined (this can be hacked around by the same trick we use in
Builders/Ghc.hs
, i.e., includeautogen/cabal_macros.h
. -
MIN_VERSION_GLASGOW_HASKELL
macro is not defined. (Unless I'm missing something, this seems to be defined by GHC itself when running with-XCPP
?) -
When there's a
#
as the first non-whitespace character of a line, GCC fails (this can happen in, e.g., RULES pragma which puts#-}
on a separate file). I only found one example of this (compiler/utils/Pretty.hs
), so maybe this could be simply worked around by avoiding this. -
HsVersions.h
definesGLOBAL_VAR
macro that uses#
in its definition causing GCC complain (used for NOINLINE pragma).
@michalt Note, MIN_VERSION_GLASGOW_HASKELL
is also set by Hadrian in Rules.Generators.GhcVersionH
, so one option is to include the generated includes/ghcversion.h
in addition to the Cabal macros headers. But simply passing it via -D
is probably a cleaner way.
Regarding #
in Haskell code, I agree it looks like we can workaround this -- perhaps, just fix the file in question in GHC?
Not sure what's the best solution for GLOBAL_VAR
macros in HsVersions.h
. Can the definitions be rewritten to make GCC happy?
@snowleopard I didn't know about GhcVersionH
, thanks! I'll try include
ing it.
Found one more issue. If we run gcc -E -MM -MG
on .hs
file it doesn't treat it correctly and just reports linker input file unused because linking not done
. It seems that the extension is misleading. So I tried using -x c
(to force C) and almost everything seems to work. There seems to be a problem for things that have already been preprocessed:
user error (Development.Shake.cmd, system command failed
Command: /usr/bin/gcc -E -x c -MM -MG -fno-stack-protector -I_build/stage0/utils/genprimopcode -I_build/stage0/utils/genprimopcode/autogen -I/usr/lib64/ghc-8.0.1/base-4.9.0.0/include -I/usr/lib64/ghc-8.0.1/integer-gmp-1.0.0.1/include -I/usr/lib64/ghc-8.0.1/include -includeincludes/ghcversion.h -include_build/stage0/utils/genprimopcode/autogen/cabal_macros.h -MF /tmp/extra-file-65366212839233-5594129241884167637 _build/stage0/utils/genprimopcode/Lexer.hs
Exit code: 1
Stderr:
In file included from _build/stage0/utils/genprimopcode/Lexer.hs:126:0:
/usr/include/stdc-predef.h:256:16: error: '#' is not followed by a macro parameter
/usr/include/stdc-predef.h:257:15: error: '#' is not followed by a macro parameter
)
Looking at the Lexer.hs
, line 126:
alex_action_42 = mkTv TNoBraces
alex_action_43 = mkTv TNoBraces
{-# LINE 1 "templates/GenericTemplate.hs" #-}
{-# LINE 1 "templates/GenericTemplate.hs" #-}
{-# LINE 1 "<built-in>" #-}
{-# LINE 1 "<command-line>" #-}
{-# LINE 10 "<command-line>" #-}
# 1 "/usr/include/stdc-predef.h" 1 3 4 <--- This is line 126
# 17 "/usr/include/stdc-predef.h" 3 4
Not sure yet how to work that around.
As for the GLOBAL_VAR
I found a workaround (a bit ugly, but maybe for this case would be acceptable):
http://stackoverflow.com/questions/1386597/how-to-print-a-pound-hash-via-c-preprocessor
That being said, I think we should be careful here so that our automation with -MM -MG
doesn't introduce too many hacks or make things too fragile. After all, we don't have that many generated files and can always have a manual solution.
Looking at the Lexer.hs, line 126:
@michalt Stange. My Lexer.hs
is different, this is line 126:
{-# LINE 1 "D:\\msys\\mingw64\\lib/include\\ghcversion.h" #-}
Could it be that you've rewritten the file during experiments?
As for the GLOBAL_VAR I found a workaround
Wow, what a hack :-) It looks cryptic, but it's only one place, so maybe it's not too big deal.
By the way, does this work by any chance? I just moved the offending line up.
#define GLOBAL_VAR(name,value,ty) {-# NOINLINE name #-}; \
name :: IORef (ty); \
name = Util.global (value);
@michalt Any further updates on this?
I've just pushed quite a big refactoring/simplification, which could have messed up your uncommitted work (apologies if this is the case!). Let me know if you have any questions about the changes.
@snowleopard I haven't actually looked at this recently, sorry! I'll try to find some time (maybe this weekend) to rebase and see if we can make some progress.
@snowleopard So I have some experiments in:
- https://github.com/snowleopard/hadrian/compare/master...michalt:generated-dependencies/2
- https://github.com/ghc/ghc/compare/master...michalt:hadrian/cpp-over-hs
Notes:
- I'm only running the
needDependencies
on things that start withcompiler
(usingpackage == compiler
didn't quite work; will look into it). - For now I've introduced
FindCDependenciesOfHs
to separate the changes I need. The only real ones are 2 additional-include
parameters. - I've been able to remove a lot of stuff from
compilerDependencies
and everything seems to work (to detect what is generated and the paths, I've temporarily introducedallGeneratedDependencies
).
The above also required a few changes to GHC:
- One unsafe change of removing the
NOINLINE
pragma forGLOBAL_VAR
(maybe we could convince GHC HQ to simply inline those definitions?) - A few benign changes to comments in GHC, since
#
cannot be the first non-whitespace character in a line.
If you have a few minutes, please let me know what you think about this and if this is a reasonable direction. Thanks!
@michalt Thank you for pushing this forward. I think you are on the right track!
Your GHC changes seem fine, hopefully GLOBAL_VAR
will not stand in the way.
I'm only running the needDependencies on things that start with compiler (using package == compiler didn't quite work; will look into it).
The only difference between "compiler" isPrefixOf src
and package == compiler
is that the former doesn't include generated sources that live in _build
. Perhaps there are still some #
lurking in them?
For now I've introduced FindCDependenciesOfHs to separate the changes I need. The only real ones are 2 additional -include parameters.
:+1:
I've been able to remove a lot of stuff from compilerDependencies and everything seems to work (to detect what is generated and the paths, I've temporarily introduced allGeneratedDependencies).
If I understand correctly, allGeneratedDependencies
are only needed to implement fullPathIfGenerated
? If yes, this is great, because then allGeneratedDependencies
doesn't have to be an Expr
, and a lot of further simplifications are possible. Feel free to leave this as is, I have some ideas how to deal with this.
Your GHC changes seem fine, hopefully GLOBAL_VAR will not stand in the way.
Cool, should we ping some GHC dev about this? (to see what they think about it ;-)
If I understand correctly, allGeneratedDependencies are only needed to implement fullPathIfGenerated? If yes, this is great, because then allGeneratedDependencies doesn't have to be an Expr, and a lot of further simplifications are possible. Feel free to leave this as is, I have some ideas how to deal with this.
Yes, that's exactly right - I wanted to experiment with removing things from generatedDependencies
but I still needed fullPathIfGenerated
to work for everything.
Wrt. allGeneratedDependencies
not being an Expr
- sounds like a good idea! I'll definitely try to clean this up. :-) (maybe in a separate PR to keep things smaller, depending on how the patch looks in the end)
Cool, should we ping some GHC dev about this? (to see what they think about it ;-)
Let me try :)
@bgamari Have you got a minute to look at this diff? It's very short.
https://github.com/ghc/ghc/compare/master...michalt:hadrian/cpp-over-hs
Briefly, @michalt is trying to move around some #
s in Haskell sources because they confuse CPP and don't allow us to automatically detect all #include
s and get rid of some hard-coded dependencies.
Does this look sensible?