hadrian icon indicating copy to clipboard operation
hadrian copied to clipboard

Automate `need`ing of generated files

Open michalt opened this issue 8 years ago • 10 comments

We have a bunch of files that are generated and then included through CPP (both .c and .hs files). With #279 we have basic support for discovering them automatically (through GCC's -MM -MG flags). This only works for .c files for now. Ideally needing them would be fully automatic (i.e., the build system would discover when it actually needs them). I'm opening the ticket to have a central place for discussions/ideas about this.

michalt avatar Aug 21 '16 13:08 michalt

So I started looking at making this also work for .hs files (since many generated files are included in Haskell sources). So far I've encountered the following problems:

  1. Cabal macros are not defined (this can be hacked around by the same trick we use in Builders/Ghc.hs, i.e., include autogen/cabal_macros.h.

  2. MIN_VERSION_GLASGOW_HASKELL macro is not defined. (Unless I'm missing something, this seems to be defined by GHC itself when running with -XCPP?)

  3. When there's a # as the first non-whitespace character of a line, GCC fails (this can happen in, e.g., RULES pragma which puts #-} on a separate file). I only found one example of this (compiler/utils/Pretty.hs), so maybe this could be simply worked around by avoiding this.

  4. HsVersions.h defines GLOBAL_VAR macro that uses # in its definition causing GCC complain (used for NOINLINE pragma).

michalt avatar Aug 21 '16 14:08 michalt

@michalt Note, MIN_VERSION_GLASGOW_HASKELL is also set by Hadrian in Rules.Generators.GhcVersionH, so one option is to include the generated includes/ghcversion.h in addition to the Cabal macros headers. But simply passing it via -D is probably a cleaner way.

Regarding # in Haskell code, I agree it looks like we can workaround this -- perhaps, just fix the file in question in GHC?

Not sure what's the best solution for GLOBAL_VAR macros in HsVersions.h. Can the definitions be rewritten to make GCC happy?

snowleopard avatar Aug 21 '16 16:08 snowleopard

@snowleopard I didn't know about GhcVersionH, thanks! I'll try includeing it.

Found one more issue. If we run gcc -E -MM -MG on .hs file it doesn't treat it correctly and just reports linker input file unused because linking not done. It seems that the extension is misleading. So I tried using -x c (to force C) and almost everything seems to work. There seems to be a problem for things that have already been preprocessed:

user error (Development.Shake.cmd, system command failed
Command: /usr/bin/gcc -E -x c -MM -MG -fno-stack-protector -I_build/stage0/utils/genprimopcode -I_build/stage0/utils/genprimopcode/autogen -I/usr/lib64/ghc-8.0.1/base-4.9.0.0/include -I/usr/lib64/ghc-8.0.1/integer-gmp-1.0.0.1/include -I/usr/lib64/ghc-8.0.1/include -includeincludes/ghcversion.h -include_build/stage0/utils/genprimopcode/autogen/cabal_macros.h -MF /tmp/extra-file-65366212839233-5594129241884167637 _build/stage0/utils/genprimopcode/Lexer.hs
Exit code: 1
Stderr:
In file included from _build/stage0/utils/genprimopcode/Lexer.hs:126:0:
/usr/include/stdc-predef.h:256:16: error: '#' is not followed by a macro parameter
/usr/include/stdc-predef.h:257:15: error: '#' is not followed by a macro parameter
)

Looking at the Lexer.hs, line 126:

alex_action_42 =  mkTv TNoBraces 
alex_action_43 =  mkTv TNoBraces 
{-# LINE 1 "templates/GenericTemplate.hs" #-}
{-# LINE 1 "templates/GenericTemplate.hs" #-}
{-# LINE 1 "<built-in>" #-}
{-# LINE 1 "<command-line>" #-}
{-# LINE 10 "<command-line>" #-}
# 1 "/usr/include/stdc-predef.h" 1 3 4                    <--- This is line 126

# 17 "/usr/include/stdc-predef.h" 3 4

Not sure yet how to work that around.

As for the GLOBAL_VAR I found a workaround (a bit ugly, but maybe for this case would be acceptable): http://stackoverflow.com/questions/1386597/how-to-print-a-pound-hash-via-c-preprocessor

That being said, I think we should be careful here so that our automation with -MM -MG doesn't introduce too many hacks or make things too fragile. After all, we don't have that many generated files and can always have a manual solution.

michalt avatar Aug 24 '16 18:08 michalt

Looking at the Lexer.hs, line 126:

@michalt Stange. My Lexer.hs is different, this is line 126:

{-# LINE 1 "D:\\msys\\mingw64\\lib/include\\ghcversion.h" #-}

Could it be that you've rewritten the file during experiments?

As for the GLOBAL_VAR I found a workaround

Wow, what a hack :-) It looks cryptic, but it's only one place, so maybe it's not too big deal.

By the way, does this work by any chance? I just moved the offending line up.

#define GLOBAL_VAR(name,value,ty)  {-# NOINLINE name #-};             \
name :: IORef (ty);                \
name = Util.global (value);

snowleopard avatar Aug 25 '16 16:08 snowleopard

@michalt Any further updates on this?

I've just pushed quite a big refactoring/simplification, which could have messed up your uncommitted work (apologies if this is the case!). Let me know if you have any questions about the changes.

snowleopard avatar Oct 21 '16 00:10 snowleopard

@snowleopard I haven't actually looked at this recently, sorry! I'll try to find some time (maybe this weekend) to rebase and see if we can make some progress.

michalt avatar Oct 31 '16 18:10 michalt

@snowleopard So I have some experiments in:

  • https://github.com/snowleopard/hadrian/compare/master...michalt:generated-dependencies/2
  • https://github.com/ghc/ghc/compare/master...michalt:hadrian/cpp-over-hs

Notes:

  • I'm only running the needDependencies on things that start with compiler (using package == compiler didn't quite work; will look into it).
  • For now I've introduced FindCDependenciesOfHs to separate the changes I need. The only real ones are 2 additional -include parameters.
  • I've been able to remove a lot of stuff from compilerDependencies and everything seems to work (to detect what is generated and the paths, I've temporarily introduced allGeneratedDependencies).

The above also required a few changes to GHC:

  • One unsafe change of removing the NOINLINE pragma for GLOBAL_VAR (maybe we could convince GHC HQ to simply inline those definitions?)
  • A few benign changes to comments in GHC, since # cannot be the first non-whitespace character in a line.

If you have a few minutes, please let me know what you think about this and if this is a reasonable direction. Thanks!

michalt avatar Nov 06 '16 17:11 michalt

@michalt Thank you for pushing this forward. I think you are on the right track!

Your GHC changes seem fine, hopefully GLOBAL_VAR will not stand in the way.

I'm only running the needDependencies on things that start with compiler (using package == compiler didn't quite work; will look into it).

The only difference between "compiler" isPrefixOf src and package == compiler is that the former doesn't include generated sources that live in _build. Perhaps there are still some # lurking in them?

For now I've introduced FindCDependenciesOfHs to separate the changes I need. The only real ones are 2 additional -include parameters.

:+1:

I've been able to remove a lot of stuff from compilerDependencies and everything seems to work (to detect what is generated and the paths, I've temporarily introduced allGeneratedDependencies).

If I understand correctly, allGeneratedDependencies are only needed to implement fullPathIfGenerated? If yes, this is great, because then allGeneratedDependencies doesn't have to be an Expr, and a lot of further simplifications are possible. Feel free to leave this as is, I have some ideas how to deal with this.

snowleopard avatar Nov 06 '16 18:11 snowleopard

Your GHC changes seem fine, hopefully GLOBAL_VAR will not stand in the way.

Cool, should we ping some GHC dev about this? (to see what they think about it ;-)

If I understand correctly, allGeneratedDependencies are only needed to implement fullPathIfGenerated? If yes, this is great, because then allGeneratedDependencies doesn't have to be an Expr, and a lot of further simplifications are possible. Feel free to leave this as is, I have some ideas how to deal with this.

Yes, that's exactly right - I wanted to experiment with removing things from generatedDependencies but I still needed fullPathIfGenerated to work for everything. Wrt. allGeneratedDependencies not being an Expr - sounds like a good idea! I'll definitely try to clean this up. :-) (maybe in a separate PR to keep things smaller, depending on how the patch looks in the end)

michalt avatar Nov 06 '16 19:11 michalt

Cool, should we ping some GHC dev about this? (to see what they think about it ;-)

Let me try :)

@bgamari Have you got a minute to look at this diff? It's very short.

https://github.com/ghc/ghc/compare/master...michalt:hadrian/cpp-over-hs

Briefly, @michalt is trying to move around some #s in Haskell sources because they confuse CPP and don't allow us to automatically detect all #includes and get rid of some hard-coded dependencies.

Does this look sensible?

snowleopard avatar Nov 06 '16 19:11 snowleopard