Deduplicate hsc2hs args
Synopsis
TL;DR: it’s all too easy to make cabal pass so many command-line arguments to hsc2hs that it stops working altogether.
Description
When extra-lib-dirs and/or extra-include-dirs are specified (either on the command line or in cabal.project or cabal.config) and the project has .hsc files that need processing, cabal instructs hsc2hs to use the specified directories by passing them via --cflag and --lflag arguments.
The problem is this. Suppose N different directories are specified and the current package depends on K packages. Each directory is then passed via --cflag or --lflag K times in total, because cabal passes it once for each package in the graph. In other words, extra-*-dirs are added to all packages in the graph. This happens either by default, which is probably wrong, or through some effort on the user’s side despite cabal’s best judgement.
Therefore hsc2hs receives N * K command-line arguments with lots of duplication and then passes them on to gcc. If hsc2hs itself is recent enough, it receives its arguments via a response file. However, it still passes them to gcc on the command line, so the command-line length limit is hit there.
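The following toy Python model makes the N * K growth concrete. It is not cabal’s code. It models the flags hsc2hs ends up receiving when the same directories are attached to every package in the graph. The function name, package names and directory paths are illustrative assumptions, and so is the exact --cflag=-I / --lflag=-L spelling.

```python
# Toy model (not cabal's implementation): the same extra-include-dirs and
# extra-lib-dirs are emitted once per package in the dependency graph.
def hsc2hs_flags(dep_packages, include_dirs, lib_dirs):
    flags = []
    for _pkg in dep_packages:  # one full copy of the directories per package
        flags += ["--cflag=-I" + d for d in include_dirs]
        flags += ["--lflag=-L" + d for d in lib_dirs]
    return flags


deps = ["pkg%d" % i for i in range(100)]        # K = 100 dependencies (made up)
incs = ["/opt/include%d" % i for i in range(20)]
libs = ["/opt/lib%d" % i for i in range(20)]    # N = 40 directories in total
flags = hsc2hs_flags(deps, incs, libs)
print(len(flags))       # 4000 = N * K
print(len(set(flags)))  # 40 distinct flags
```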
In either case, it seems reasonable to stop passing duplicate arguments, regardless of where they came from. This PR does that.
I have added a somewhat big test. It has to be this big for two reasons. First, it must exercise the N * K multiplication of command-line options. Second, the total number of options must be large enough to exceed the command-line length limit, which is quite generous nowadays.
The fix
The fix first splits the hsc2hs arguments into two groups: those passed via --cflag and those passed via --lflag. It then deduplicates each group separately. The relative order between --cflag and --lflag arguments changes, but that shouldn’t matter since they belong to different "namespaces". Order within each group is preserved. One commit only reorders the groups, and another adds the deduplication.
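Below is a minimal Python sketch of the reorder-then-deduplicate idea. It assumes the arguments are plain strings prefixed with --cflag= or --lflag=. The function name is hypothetical. Placing any other arguments ahead of the two groups is also an assumption of the sketch. The actual change lives in Cabal’s Haskell preprocessing code.

```python
CFLAG = "--cflag="
LFLAG = "--lflag="


def dedup_hsc2hs_args(args):
    """Group hsc2hs args into --cflag and --lflag lists and deduplicate each,
    keeping the first occurrence and the original order within each group.
    Any other arguments are kept unchanged, ahead of both groups (sketch assumption)."""
    cflags = [a for a in args if a.startswith(CFLAG)]
    lflags = [a for a in args if a.startswith(LFLAG)]
    other = [a for a in args if not a.startswith((CFLAG, LFLAG))]
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order
    return other + list(dict.fromkeys(cflags)) + list(dict.fromkeys(lflags))


example = ["--cflag=-I/a", "--lflag=-L/x", "--cflag=-I/b", "--cflag=-I/a", "--lflag=-L/x"]
print(dedup_hsc2hs_args(example))  # ['--cflag=-I/a', '--cflag=-I/b', '--lflag=-L/x']
```

Keeping the first occurrence of each flag preserves the order in which include and library directories are searched. A directory that appears again later in the list adds nothing.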
A smaller patch is possible, e.g. https://github.com/haskell-infra/hackage-doc-builder-config/blob/master/cabal-hsc2hs-args-patch.diff. However, it deduplicates the whole argument list at once, so every time it compares arguments that start with different prefixes against each other, which is extra work.
Real-world significance
NB: this is not a theoretical toy. It has been preventing documentation generation on Hackage, no less. Checking this yourself takes some investigation, but in brief:
- Documentation for the ghcup package, which depends on lots of cabal packages, fails to generate (https://hackage.haskell.org/package/ghcup-0.1.50.2/reports/4) with an error like this:
  Error: cannot execute /nix/store/2agih0y5ns3sgbviw2xhivdgg59b41g9-gcc-14-20241116/libexec/gcc/x86_64-unknown-linux-gnu/14.2.1/cc1: posix_spawn: Argument list too long
- Documentation on Hackage is generated by the hackage-build tool, which uses nix and currently lives at https://github.com/haskell-infra/hackage-doc-builder-config/tree/58dfa4643d74c2de595407d40da7a6f2869d511b
- The problem is caused by hackage-build making up a cabal.config file with foreign dependencies (https://github.com/haskell-infra/hackage-doc-builder-config/blob/58dfa4643d74c2de595407d40da7a6f2869d511b/run-hackage-build.nix#L13). That file specifies extra-lib-dirs and extra-include-dirs for a modest list of dependencies defined at https://github.com/haskell-infra/hackage-doc-builder-config/blob/58dfa4643d74c2de595407d40da7a6f2869d511b/build-depends.nix
- As described above, when documentation generation reaches ghcup, it calls hsc2hs to do the preprocessing, and that step fails
Template Α: This PR modifies behaviour or interface
Include the following checklist in your PR:
- [X] Patches conform to the coding conventions.
- [X] Any changes that could be relevant to users have been recorded in the changelog.
- [ ] Is the change significant? If so, remember to add significance: significant in the changelog file.
- [ ] The documentation has been updated, if necessary.
- [ ] Manual QA notes have been included.
- [X] Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)
You might do better with a test that tells hsc2hs to call a replacement for gcc that checks for duplicated arguments. The limits are hard to hit in the first place. On some platforms it’s not possible to hit them at all (FreeBSD, for instance, not that GitHub currently supports CI there), and there’s always the possibility that Linux eventually adopts FreeBSD’s behavior.
Isn’t this more of an issue in hsc2hs?
Cabal already passes arguments to hsc2hs using response files (#3122), so the problem is the command-line length when gcc is called? Could we fix hsc2hs in the longer term to use response files when calling gcc?
I’m not saying this isn’t a good workaround for now, but it would be good to work out what the correct long-term fix is.
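For the longer-term hsc2hs fix, the idea would be to write the compiler arguments to a file and invoke gcc as gcc @file. Below is a rough Python sketch of that idea. It assumes GCC’s documented response-file syntax: options are separated by whitespace, and any character can be made literal by prefixing it with a backslash. The helper names are hypothetical, and this is not hsc2hs code.

```python
import os
import subprocess
import tempfile


def gcc_rsp_escape(arg):
    """Escape one argument for a GCC @file: backslash-prefix whitespace,
    quotes and backslashes so the argument is read back verbatim."""
    return "".join("\\" + c if c.isspace() or c in "\\'\"" else c for c in arg)


def write_response_file(args):
    """Write args, one per line, to a temporary response file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".rsp")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(gcc_rsp_escape(a) for a in args) + "\n")
    return path


def run_with_response_file(compiler, args):
    """Run `compiler @file` instead of putting every argument on the command line."""
    path = write_response_file(args)
    try:
        return subprocess.run([compiler, "@" + path]).returncode
    finally:
        os.remove(path)
```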
@geekosaur I have introduced a custom gcc that checks for duplicate arguments. The only catch is that doing the same on Windows is a bit too much work, so there a successful compilation serves as the check.
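Here is a sketch of such a duplicate-checking stand-in for gcc, written as a Python script for Unix-like systems. Several parts are assumptions for illustration: the REAL_CC environment variable, and wiring the script in via hsc2hs’s --cc option or cabal’s --with-gcc. This is not the script used in this PR. It only inspects arguments passed directly on the command line, so it does not expand an @file argument.

```python
#!/usr/bin/env python3
"""Hypothetical gcc stand-in: fail if any argument is repeated,
otherwise hand all arguments to the real compiler unchanged."""
import os
import sys
from collections import Counter

# Assumption: the real compiler is taken from REAL_CC, falling back to "gcc".
REAL_CC = os.environ.get("REAL_CC", "gcc")


def find_duplicates(args):
    """Return every argument that occurs more than once, sorted."""
    return sorted(a for a, n in Counter(args).items() if n > 1)


def main(args):
    dups = find_duplicates(args)
    if dups:
        sys.stderr.write("fake-gcc: duplicate arguments: " + " ".join(dups) + "\n")
        return 1
    # No duplicates: replace this process with the real compiler.
    os.execvp(REAL_CC, [REAL_CC] + args)


# Only act as a compiler wrapper when actually given arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```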
@mpickering It does seem like a good idea to fix hsc2hs to use response files. I found that it does use them in some cases, but not when the command line is too long. Still, this change seems desirable because, with enough dependencies, there can be an appalling amount of purely redundant duplication.