cabal-helper
Re-compilation when querying `UnitInfo`
The Haddock documentation of UnitInfo says:
The information extracted from a 'Unit''s on-disk configuration cache.
But when I do a runQuery (allUnits id), the whole project is recompiled. Either the documentation is inaccurate or runQuery does not do what it's supposed to do (I hope for the latter).
I found that reconfigureUnit runs buildProjectTarget with OnlyCfg (which turns on --only-configure flag). Isn't it possible to check if a project is already configured?
But it does not seem that cabal-helper only configures:
Configuring library for ghc-tags-core-0.2.0.0..
Preprocessing library for ghc-tags-core-0.2.0.0..
Building library for ghc-tags-core-0.2.0.0..
[ 1 of 13] Compiling GhcTags.CTag.Header ( lib/GhcTags/CTag/Header.hs, /home/coot/repos/ghc-tags-plugin/d
...
But when I do a runQuery (allUnits id), the whole project is recompiled. Either the documentation is inaccurate or runQuery does not do what it's supposed to do (I hope for the latter).
The documentation is correct, though arguably it omits some of the details; c-h will re-generate the mentioned cache, Cabal's setup-config file, if it deems it necessary. This is done using a simple mtime check, see getUnitModTimes. Though the on-disk part you mention is on the UnitInfo type, not on runQuery, so there is simply no documented guarantee of how a Query will get executed ATM.
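To make the mtime idea concrete, here is a minimal sketch of that kind of staleness check. This is an illustration only, not cabal-helper's actual getUnitModTimes implementation; the function names and the choice of input files are assumptions.

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- The cache is stale if any input was modified after it.
-- Kept polymorphic over the timestamp type so it is easy to test.
isStale :: Ord t => t -> [t] -> Bool
isStale cacheMTime inputMTimes = any (> cacheMTime) inputMTimes

-- Compare a cache file (e.g. Cabal's setup-config) against its
-- inputs (e.g. the .cabal file). A missing cache counts as stale.
cacheUpToDate :: FilePath -> [FilePath] -> IO Bool
cacheUpToDate cache inputs = do
  exists <- doesFileExist cache
  if not exists
    then pure False
    else do
      c  <- getModificationTime cache
      is <- mapM getModificationTime inputs
      pure (not (isStale c is))
```

A check like this only re-runs configuration when an input is newer than the cache, which is why a correctly cached project should not reconfigure on every query.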
I found that reconfigureUnit runs buildProjectTarget with OnlyCfg (which turns on --only-configure flag). Isn't it possible to check if a project is already configured?
Yes and we do that as I explained above. However the build-tool, in your case cabal, should also have another layer of caching that actually runs re-configuration only if necessary.
What you might be observing is dependencies being built: when, say, an executable depends on a library in the same package we have to build this library before we can configure the executable.
That's just an unfortunate reality of how Cabal functions.
Whenever cabal-helper starts it seems to always re-configure, even right after cabal build all. Is this necessary? Are there any other cache strategies?
Maybe I am using the wrong tool; What I need is just a list of project files and DynFlags needed to parse them (I am writing a ctags like tool for Haskell).
Can you run your test again with CABAL_HELPER_DEBUG=1 and send me a gist?
Maybe I am using the wrong tool; What I need is just a list of project files and DynFlags needed to parse them (I am writing a ctags like tool for Haskell).
Certainly this is what cabal-helper is intended to do :)
However, I think for generating TAGS you wouldn't actually need to know the project structure the way IDEs do (which is the use-case cabal-helper is intended for), in which case it might be a good idea to look into implementing that as a GHC plugin instead. Then all this environment stuff is just handled by the vanilla build-tools and your plugin just runs as a side-effect of a regular build.
Here's the log. Cabal cache is also invalidated after using cabal-helper.
it might be a good idea to look into implementing that as a GHC plugin instead.
That's what I did first, since a plugin is easier to implement: ghc-tags-plugin. But I think it's useful to have a standalone tool for tags: easier to integrate into a project, especially a stack-based one. I found setting up ghc plugins to be difficult at times.
Yeah, so if I'm reading that right the library in that project is being compiled because the examples depend on it and to configure the examples we need to compile the library. I'm not sure what's up with the cabal_macros.h failures after that. We might need some more cabal verbosity to figure that out; cabal -v3 should give us more info.
That's what I did first, since a plugin is easier to implement: ghc-tags-plugin. But I think it's useful to have a standalone tool for tags: easier to integrate into a project, especially a stack-based one. I found setting up ghc plugins to be difficult at times.
Why not just have this stand-alone tool manage all the plugin stuff for the user? Essentially just have it be a simple wrapper around cabal/stack that adds the appropriate options to let the plugin run?
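A minimal sketch of that wrapper idea: build the extra --ghc-options flag and forward everything else to cabal. The plugin name "GhcTags.Plugin" and the entry point are made-up placeholders, not a real tool's interface.

```haskell
import System.Process (callProcess)

-- Build the argument list for the underlying build tool so the
-- plugin gets enabled via --ghc-options. Pure, so it's easy to test.
wrapperArgs :: String -> [String] -> [String]
wrapperArgs plugin userArgs =
  "build" : ("--ghc-options=-fplugin=" ++ plugin) : userArgs

-- Hypothetical entry point: forward to cabal with the extra flag.
runWrapped :: [String] -> IO ()
runWrapped userArgs =
  callProcess "cabal" (wrapperArgs "GhcTags.Plugin" userArgs)
```

The same shape would work for stack by swapping the executable name and flag spelling.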
Another approach I've been toying with is to go with the old school approach of having what amounts to a custom GHC driver, like ghc-mod used to and hie does now. Just a program that links against the GHC library to do your custom thing -- but with a twist. Instead of using cabal-helper to run cabal/stack and collect info you ask cabal-helper to run your custom GHC driver as part of the cabal/stack build process.
This removes the difficulties associated with distributing and managing ghc plugin installation, you get full access to GHC's internals but you don't have to deal with the pain associated with version matching the global GHC install and your tool because in this scenario cabal-helper fully controls what GHC will be used (i.e. yours).
The only thing you need to do to make that approach work is implement GHC's usual commandline conventions so cabal can talk to your tool.
Yeah, so if I'm reading that right the library in that project is being compiled because the examples depend on it and to configure the examples we need to compile the library.
That's right. That's a serious limitation for any larger project which contains multiple libraries in one repo.
Why not just have this stand-alone tool manage all the plugin stuff for the user? Essentially just have it be a simple wrapper around cabal/stack that adds the appropriate options to let the plugin run?
That requires modifying cabal files at least for stack; for cabal one can add config to cabal.project.local files.
I will need to look into hie.
Yeah, so if I'm reading that right the library in that project is being compiled because the examples depend on it and to configure the examples we need to compile the library.
That's right. That's a serious limitation for any larger project which contains multiple libraries in one repo.
I agree but it's just what we have to do currently. I have some ideas on how to lift that limitation; if you're interested in helping we can have a more in-depth chat about that.
Why not just have this stand-alone tool manage all the plugin stuff for the user? Essentially just have it be a simple wrapper around cabal/stack that adds the appropriate options to let the plugin run?
That requires modifying cabal files at least for stack; for cabal one can add config to cabal.project.local files.
What about just adding --ghc-options on the command line? That should work for stack and cabal without any config file changes. Incidentally, you can do that through cabal-helper if you like, using the cabalUnitArgs/stackUnitArgs fields of Programs.
I will need to look into hie.
HIE uses cabal-helper and hie-bios, which has the same limitation. hie-bios essentially runs cabal repl, which will go and compile the units themselves, not just the dependencies, so even more stuff!
I have some ideas on how to lift that limitation; if you're interested in helping we can have a more in-depth chat about that.
Yes, I'd be interested.
What about just adding --ghc-options on the command line?
That's what I basically do in the cabal.project.local file, but for some reason I need to add -package-db so the plugin can be found. Most of the time this works fine; the difficulty is getting it installed & updated. I screwed up my package-db stack a few times with ghc-pkg and I encountered problems with cabal install --lib too.
On Sun, Jun 07, 2020 at 11:02:05PM -0700, Marcin Szamotulski wrote:
What about just adding --ghc-options on the command line?
That's what I basically do in the cabal.project.local file,
Sure, I just don't think it's a good idea to mess around with the user's persistent config. What if a user is already using cabal.project.local for something? It's hardly good manners to mess around with it.
If you absolutely need to do that you could use the --project-file commandline flag and create an out-of-the-way tempfile instead. Though to do things properly you'd probably have to take into account the user's existing config too and copy that over... Seems way easier to just use a cmdline flag haha.
but for some reason I need to add -package-db so the plugin can be found. Most of the time this works fine; the difficulty is getting it installed & updated. I screwed up my package-db stack a few times with ghc-pkg and I encountered problems with cabal install --lib too.
Ah yes, I feel that pain. Distributing this stuff isn't easy with cabal, especially with v2-build :/. Adding better support for installing plugins to cabal/stack would probably be a good idea in the long run but someone has to do it.
TBH distributing a lib:ghc-based tool also has its challenges (version matching, mainly).
I have some ideas on how to lift that limitation; if you're interested in helping we can have a more in-depth chat about that.
Yes, I'd be interested.
Actually before we jump into that can you clarify this for me:
That's right. That's a serious limitation for any larger project which contains multiple libraries in one repo.
What problem do you see exactly with building dependencies?
At the end of the day GHC eventually needs both .hi files and generated code (i.e. .o files), which involves building. Just think about TemplateHaskell. So there is no way, in general, to forgo building dependencies even if all you want to do is parse and rename a source file. Now if your tool is content with just parsing, the situation is different (is it?), but as soon as you want to use the renamer to resolve identifiers you need at least .hi files, so you're going to have to do some amount of building.
Even so you might be able to forgo linking of dependencies, but again only if a unit doesn't happen to use TH and need to actually run any of the code from its dependencies.
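To make the TemplateHaskell point concrete: a splice runs code at compile time, so if that code came from a dependency, the dependency's compiled code (not just its .hi files) would have to exist before this module could even finish compiling. A self-contained toy example using only the template-haskell package that ships with GHC:

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Language.Haskell.TH (Exp (LitE), Lit (IntegerL))

-- This splice is *executed* during compilation. If it called a
-- function from another package, GHC would need that package's
-- object code before this module could build.
answer :: Int
answer = $(pure (LitE (IntegerL 42)))

main :: IO ()
main = print answer
```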
Sure, I just don't think it's a good idea to mess around with the user's persistent config.
I should be more precise: I do that as a user of my plugin.
What problem do you see exactly with building dependencies?
Generating tags should not rely on things type-checking. One might want to update the tags file while one of the dependencies is not compiling. Running the whole compiler pipeline when I just need enough to parse files (so only DynFlags) is not right.
With a future version of ghc plugins it might be possible to generate information about expressions generated by TemplateHaskell, but for a standalone tool I think it is acceptable to skip it. I'd be surprised if I needed to compile dependencies just to parse, but maybe I am wrong.
I don't need renamer. After parsing phase I get original identifiers, renaming happens later.
On Mon, Jun 08, 2020 at 12:22:10AM -0700, Marcin Szamotulski wrote:
I don't need renamer. After parsing phase I get original identifiers, renaming happens later.
Ah, ok. So you are only parsing. I thought you might be using the renamer for something, but thinking about it now, that is obviously unneeded just to generate tags.
With a future version of ghc plugins it might be possible to generate information about expressions generated by TemplateHaskell, but for a standalone tool I think it is acceptable to skip it.
Oh indeed, I hadn't even thought of that. Technically you'd have to run TH splices to get all the identifiers a module contains, and that's obviously going to involve building stuff.
I'd be surprised if I needed to compile dependencies just to parse, but maybe I am wrong.
Well even just to parse you need to run CPP and other preprocessors. I think with build-tool-depends together with {-# OPTIONS_GHC -pgmF some-preproc #-} you would need to build some-preproc before you can parse a source file :)
This isn't as uncommon as it sounds, hspec-discover comes to mind.
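For example, a typical hspec-discover driver module consists of nothing but a preprocessor pragma; there is nothing meaningful to parse until the preprocessor (declared via build-tool-depends) has been built and run:

```haskell
-- Spec.hs: the whole module body is generated by the preprocessor,
-- so a tags tool cannot see the real contents without building
-- and running hspec-discover first.
{-# OPTIONS_GHC -F -pgmF hspec-discover #-}
```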
Honestly, Hackage dependencies are likely to be less of a problem if transient build errors in the current project are all you're trying to avoid. The user is going to have to build such dependencies eventually one way or another, so the one-time build cost shouldn't be a problem either -- right?
Either way, the approach I have in mind should still work; you'd just have to ignore files needing preprocessing if you don't want to build even Hackage dependencies.
Basically you build a program that implements GHC's command line interface well enough for Cabal to be able to talk to it as if it were a real GHC, but it would only pretend to actually build things. Instead you perform the function your tool wants to do, in your case parsing the modules Cabal passes to you and generating tags for them.
There are basically two challenges to this approach:
1. If you're going to mess with build-outputs, in your case by not generating them, you need to get Cabal to give your tool its own namespace in the build directory.
2. You need to fake the package-db config files and/or the ghc-pkg interaction Cabal expects to do with GHC as part of registering dependencies into the package-db.
You can do 1) by just passing a different --builddir arg to Cabal, though I would prefer if it were possible to have Cabal autodetect this via a property in ghc --info. That'd have to be implemented though.
For 2) we'd have to do some investigating. I think if you replace ghc-pkg with a fake program too you shouldn't actually have to do anything there. Cabal will try to call your-ghc-pkg register foo and later expect your-ghc -package foo to work. Since we're faking both it should be fine.
Might need some careful commandline parsing or DynFlags munging though, as setSession will try to read package-db .conf files by default.
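A rough sketch of what such a fake GHC's entry point might look like. The flag handling is illustrative only (Cabal probes more than these two flags), the version string is made up, and the tag-generation part is left as a stub:

```haskell
import Data.List (isSuffixOf)
import System.Environment (getArgs)

-- The two kinds of invocations Cabal makes: version/info-style
-- queries, and "compile" calls carrying the actual source files.
data Invocation = Query String | Compile [FilePath]
  deriving (Eq, Show)

classify :: [String] -> Invocation
classify args
  | "--numeric-version" `elem` args = Query "--numeric-version"
  | "--info"            `elem` args = Query "--info"
  | otherwise =
      Compile [a | a <- args, ".hs" `isSuffixOf` a || ".lhs" `isSuffixOf` a]

main :: IO ()
main = do
  args <- getArgs
  case classify args of
    -- Pretend to be the GHC the project was configured with.
    Query "--numeric-version" -> putStrLn "8.8.3"
    -- ghc --info prints a Haskell-readable key/value list;
    -- an empty list is the minimal well-formed shape.
    Query _                   -> putStrLn "[]"
    -- Stub: this is where you'd parse the modules and emit tags
    -- instead of compiling them.
    Compile srcs              -> mapM_ putStrLn srcs
```

A companion fake ghc-pkg would accept register calls the same way, so that both halves of Cabal's expectations are satisfied.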