cabal icon indicating copy to clipboard operation
cabal copied to clipboard

Make the cabal-install solver multilibs-aware

Open fgaz opened this issue 5 years ago • 35 comments

The Cabal part is done, but cabal-install's solver is not aware of the multiple public libraries feature yet, leading to late errors and issues like #6038

fgaz avatar May 07 '19 10:05 fgaz

/cc @grayjay @kosmikus -- How hard do you think this would be to implement?

23Skidoo avatar May 07 '19 13:05 23Skidoo

I think this would be pretty straightforward, at least for source packages, because it could be implemented similarly to enforcing build tool dependencies. The visibility field could probably be implemented similarly to the current handling of unbuildable components. I have a few questions about how visibility should behave, though.

  • Should the solver enforce visibility now? I wasn't sure if the comments about disabling the visibility field in #5848 applied here.
  • Should the solver just check the visibility fields in Library and InstalledPackageInfo? Can the main library be private?

There may be an issue with enforcing visibility for installed packages, because there is a hack where the solver treats installed internal libraries as separate packages with munged names (https://github.com/haskell/cabal/blob/5f57d4d0ee14592cc488385151c15edef59f9bd6/cabal-install/Distribution/Solver/Modular/IndexConversion.hs#L106-L131). I'm not sure how difficult it will be to distinguish between inter- and intra-package dependencies.

grayjay avatar May 10 '19 07:05 grayjay

Should the solver enforce visibility now? I wasn't sure if the comments about disabling the visibility field in #5848 applied here.

Yes. In #5848 we're talking about disabling it at the syntax level, but internally it's still used

Should the solver just check the visibility fields in Library and InstalledPackageInfo? Can the main library be private?

Yes and no, but I'll have to check the second one.

fgaz avatar May 10 '19 08:05 fgaz

@fgaz Thanks. I made a PR (#6047), but it only checks the visibility of source packages so far. It reads the libVisibility field for all libraries, though I can easily change it to only check sub-libraries if that is the correct behavior.

grayjay avatar May 13 '19 02:05 grayjay

#6836 enforces visibility for source packages, so the only remaining task is to handle installed packages.

grayjay avatar May 26 '20 02:05 grayjay

@grayjay it looks like you did most of the work to resolve this, can you elaborate on what's still missing?

gbaz avatar Sep 02 '21 21:09 gbaz

@gbaz The only thing that is missing in the dependency solver is enforcing visibility restrictions for installed packages, so that one package cannot depend on a private installed sub-library from another package. The change could be difficult due to the hack mentioned in https://github.com/haskell/cabal/issues/6039#issuecomment-491195495.

I also noticed that #7270 may relate to the solver, though I haven't looked into it yet.

grayjay avatar Sep 04 '21 06:09 grayjay

Issue #7270 shows that there is actually more to do here than enforcing visibility. The hack described above means that the solver handles sublibraries differently depending on whether they are source or installed. From reading the code, it looks like dependencies between packages of the same type (source or installed) are handled correctly, but dependencies from source to installed packages are handled incorrectly, due to the mismatch. Here is my current understanding of the code:

Source package -> Source package The solver represents each package as a single PInfo value. The PInfo lists the available components. Information about the libraries, such as their dependencies, is merged together, and intra-package dependencies can be safely ignored.

Installed package -> Installed package The solver represents each library in the package as a separate PInfo. Intra-package and inter-package dependencies are both represented similarly, as dependencies between PInfos. This behavior seems correct, but it is misleading, because PInfo is short for package info and was initially designed to represent a whole package. There is currently no distinction between main libraries and sublibraries or intra-package dependencies and inter-package dependencies.

Source package -> Installed package When a source package depends on an installed library, the dependency is treated as a dependency on the main library, even if the installed library is actually a sublibrary. I think that this is the cause of the solver issue in #7270.

I think that the best way to fix the issue would be to treat installed packages similarly to source packages, i.e., combine all libraries from a single installed package into a single PInfo. This design is compatible with the component based solving described in #4087. It would also require very few changes in the rest of the solver, because the solver could just treat installed sublibraries similarly to source sublibraries when checking existence and visibility. However, I'm not sure whether the design would work with the current structure of the installed package index. I don't know whether we have the information required to group installed libraries by package instance.

grayjay avatar Feb 21 '22 07:02 grayjay

Let's try to help @grayjay with this. I can make tea. :) @gbaz: could you answer the question about whether we have the information (in the package index presumably?) to group installed libraries by package instance? @fgaz: you are the multilib boss, any remarks about this plan?

Mikolaj avatar Feb 21 '22 23:02 Mikolaj

Thanks @Mikolaj!

Here are some more details about my question. Cabal passes the installed package index to the dependency solver using the type InstalledPackageIndex, as used in this function that converts all packages to type PInfo:

https://github.com/haskell/cabal/blob/1f84e56396009d21e91e1cfb95d3c540c35170cc/cabal-install-solver/src/Distribution/Solver/Modular/IndexConversion.hs#L56-L59

InstalledPackageIndex:

https://github.com/haskell/cabal/blob/1f84e56396009d21e91e1cfb95d3c540c35170cc/Cabal/src/Distribution/Simple/PackageIndex.hs#L154-L156

PackageIndex:

https://github.com/haskell/cabal/blob/1f84e56396009d21e91e1cfb95d3c540c35170cc/Cabal/src/Distribution/Simple/PackageIndex.hs#L124-L149

InstalledPackageInfo:

https://github.com/haskell/cabal/blob/1f84e56396009d21e91e1cfb95d3c540c35170cc/Cabal-syntax/src/Distribution/Types/InstalledPackageInfo.hs#L35-L39

I'd like to group libraries of the same package name, version, and instance. The ability to group the libraries is most important for choosing the dependencies of a source package that directly or indirectly depends on more than one library from the same installed package. The solver needs to ensure that all libraries chosen from that installed package are consistent (built with the same flags, dependencies, etc.).

grayjay avatar Feb 22 '22 07:02 grayjay

We discussed the issue with grouping installed libraries by package instance at the Cabal meeting. It sounds like the information that we would need is probably not available within cabal, so we would need to add more information to the installed package database. This information would also be visible to GHC and other tools. One idea is to add an "InstanceUnitId" to every installed library. The InstanceUnitId would be equivalent to the UnitId of the package's main library, regardless of whether the package actually has a main library. Then all libraries that were installed together would have the same InstanceUnitId.

@fgaz I wanted to check with you before proceeding, since I want to make sure that this idea is consistent with the multilibs design and that there isn't already a separate mechanism for determining which libraries were installed together.

grayjay avatar Apr 07 '22 06:04 grayjay

@bgamari: could you confirm the "InstanceUnitId" feature is in GHC? is 9.4? @fgaz seems to have a good understanding of the exact GHC feature we need (and, let's hope, that is already in HEAD).

[Edit: @gbaz, @fgaz, feel free to correct my question or prove it's not needed.]

Mikolaj avatar Apr 07 '22 17:04 Mikolaj

Continuing from https://github.com/haskell/cabal/issues/7270#issuecomment-1092630544:

As opposed to Cabal, cabal-install has to deal with package instances, and that's definitely true for source packages. But is that true for preexisting installed packages too? For example, leaving multiple libraries aside, is having two preinstalled instances of random-1.2.1 in the global package db supported? If it isn't, can we --like in Cabal-- just assume that all libraries with the same (package-name, version) belong to the same instance?

fgaz avatar Apr 08 '22 09:04 fgaz

@grayjay

Relevant comment in Cabal:

https://github.com/haskell/cabal/blob/a5309888fc0a33e511504c105907e8603a572143/Cabal/src/Distribution/Simple/Configure.hs#L1312

Cabal-the-library just picks an arbitrary instance of the package if there are multiple with the same name+version and --exact-configuration/--dependency is not used

fgaz avatar May 19 '22 17:05 fgaz

Hi! What's the status of this one? Is anybody actively working on it or planning to?

Mikolaj avatar Oct 11 '22 11:10 Mikolaj

@Mikolaj The next step is to determine how cabal-install handles multiple installed instances of a package (especially packages with multiple libraries) to answer @fgaz's questions in https://github.com/haskell/cabal/issues/6039#issuecomment-1092642955. If cabal-install does need to handle multiple instances, then we need to add an "InstanceUnitId" field to packages in the installed package database. Then the solver needs to be updated to group installed libraries by package name, version, and possibly instance and enforce dependencies on them similarly to source libraries. I'm not sure when I'll have more time to work on this.

grayjay avatar Oct 20 '22 09:10 grayjay

And is " how cabal-install handles multiple installed instances" some ancient knowledge of the existing codebase that has been lost, or is it something for us to decide and design afresh, or something in-between --- there were plans, but they have not been implemented or not fully enough to assure they make sense?

Mikolaj avatar Oct 29 '22 19:10 Mikolaj

@Mikolaj I was thinking that we could just determine how cabal-install currently handles multiple installed instances of a package by testing. I would like to test two things:

  • Install two instances of a non-multilibs package and see how they appear in the solver's view of the installed package index. If cabal already doesn't handle two installed instances of a non-multilibs package, then there is less reason to fully support multiple instances of multilibs packages.
  • Install two instances of a package with multiple libraries and see if it is possible to make the solver choose one library from each of them. If a mix up is possible, then we need to add the "InstanceUnitId" field in order to prevent it.

grayjay avatar Nov 17 '22 10:11 grayjay

I wonder if a volunteer could easily help with the investigation. How would one inspect the "solver's view of the installed package index"? Run with -v3 or look at dist-newstyle/cache/plan.json or is there some special debug method (e.g., decoding any of the binary dist-newstyle/cache/*-plan files)?

Mikolaj avatar Nov 17 '22 12:11 Mikolaj

The most direct way to see how the solver handles multiple installed instances is to look up the package by name in the Index returned by convPIs and print all returned instances. I think that -v3 would also work, but the test case would need to cause the solver to reject all versions of the installed package (e.g., with an unsatisfiable version constraint) in order to make the solver print them all.

I noticed that cabal has a --shadow-installed-packages flag, so the behavior may actually be configurable.

If the index returned by convPIs contains both instances of the package in the test, then I think it would also be useful to see whether it is possible to depend on either one of them.

grayjay avatar Dec 01 '22 09:12 grayjay

I have done some preparatory work and I am planning to work on this issue this week. @grayjay @Mikolaj Let me know if there's been any progress not reported here.

andreabedini avatar May 15 '23 10:05 andreabedini

Regarding package shadowing:

  • It's off by default https://github.com/haskell/cabal/blob/5f57d4d0ee14592cc488385151c15edef59f9bd6/cabal-install/Distribution/Client/Dependency.hs#L260
  • Enabling it has been left as a TODO (allow me a :facepalm:) https://github.com/haskell/cabal/blob/5f57d4d0ee14592cc488385151c15edef59f9bd6/cabal-install/Distribution/Client/ProjectPlanning.hs#L961-L963
  • Indeed the setting has been commented out everywhere
✦ ❯ rg solverSettingShadowPkgs
cabal-install/src/Distribution/Client/ProjectConfig/Types.hs
422:     --solverSettingShadowPkgs        :: Bool,

cabal-install/src/Distribution/Client/ProjectConfig.hs
258:  --solverSettingShadowPkgs        = fromFlag projectConfigShadowPkgs

cabal-install/src/Distribution/Client/ProjectPlanning.hs
1086:   -- . setShadowPkgs solverSettingShadowPkgs

andreabedini avatar May 15 '23 10:05 andreabedini

I don't know of any unreported progress. There is certainly reported interest in related things on #hackage, in https://discourse.haskell.org/t/new-hackage-server-features/2621/42, in https://github.com/haskell/hackage-server/issues/1119.

Mikolaj avatar May 15 '23 13:05 Mikolaj

@andreabedini Thanks for volunteering to work on this issue! I don't know of any additional progress, but I can give my current thoughts about test cases and the desired behavior.

I mentioned two test cases in https://github.com/haskell/cabal/issues/6039#issuecomment-1318399645. For the first case, I'm assuming that regardless of the value of the flag --shadow-installed-packages, the solver won't choose an install plan that uses two instances of a non-multilibs package as dependencies of the same component. (I don't think we need to test this, due to the single instance restriction mentioned in #6459.) In my opinion, this means that the multilibs feature should enforce similar consistency and should prevent two instances of a multilibs package from being used as dependencies of the same component.

For the second test case in https://github.com/haskell/cabal/issues/6039#issuecomment-1318399645, I wanted to give a more concrete example:

  1. Install package A-1.0 in the global package database. Package A-1.0 only contains one component, sublibrary B.
  2. Modify the source code of package A-1.0 so that it instead has sublibrary C as its only component. Install it in the global package database, too.
  3. Try to build package D-1.0 using "cabal build". Package D-1.0 has build-depends dependencies on both sublibrary B and sublibrary C. The build command should fail, since there is no instance of A-1.0 that contains both components.

Alternatively, the test case could avoid modifying source code by giving package A-1.0 a build flag that makes one sublibrary unbuildable when it is true and the other unbuildable when it is false and then installing both configurations.

grayjay avatar May 16 '23 05:05 grayjay

Here some notes from today. Some of this might be well-know but it wasn't for me :)

First question:

Install two instances of a non-multilibs package and see how they appear in the solver's view of the installed package index. If cabal already doesn't handle two installed instances of a non-multilibs package, then there is less reason to fully support multiple instances of multilibs packages.

❯ cat cabal.project 
active-repositories: :none
package-dbs: clear, global, /home/andrea/.local/state/cabal/store/ghc-9.2.7/package.db
extra-packages: zlib
constraints: zlib <0
❯ cabal build -v3 --dry-run zlib | grep '^\['
...
[__0] rejecting:
zlib-0.6.3.0/installed-6e56dc43f3ec2be12bd5546861b15931b096e743f56e443fc2faae3bceb1327a,
zlib-0.6.3.0/installed-727383b633c94a5c0b336dffa9dc4d80869f0f864348ed507f9050a29cacf41b,
zlib-0.6.3.0/installed-e3964c6e976321c89432925b0aa9fc494f0907ce8b7202161f602167c20533c2
(constraint from project config /home/andrea/tmp/two-cases/prj/cabal.project
requires <0)
...
❯ ghc-pkg --package-db /home/andrea/.local/state/cabal/store/ghc-9.2.7/package.db field zlib-0.6.3.0 id 
id: zlib-0.6.3.0-6e56dc43f3ec2be12bd5546861b15931b096e743f56e443fc2faae3bceb1327a
id: zlib-0.6.3.0-e3964c6e976321c89432925b0aa9fc494f0907ce8b7202161f602167c20533c2
id: zlib-0.6.3.0-727383b633c94a5c0b336dffa9dc4d80869f0f864348ed507f9050a29cacf41b

Multiple instances do not seem to be a problem.

Interlude:

The solver represents each library in the package as a separate PInfo. Intra-package and inter-package dependencies are both represented similarly, as dependencies between PInfos. This behavior seems correct, but it is misleading, because PInfo is short for package info and was initially designed to represent a whole package. There is currently no distinction between main libraries and sublibraries or intra-package dependencies and inter-package dependencies.

Indeed sublibraries are installed in the packagedb with one entry per component, with a mangled name. E.g.

❯ ghc-pkg --package-db /home/andrea/.local/state/cabal/store/ghc-9.2.7/package.db list | grep plutus-core
    plutus-core-1.1.1.0
    (z-plutus-core-z-index-envs-1.1.1.0)

where index-envs is a private sub-library of plutus-core package and the main library depends on the sublibrary (via its unit-id). The two package-db entries look like this:

❯ ghc-pkg --package-db /home/andrea/.local/state/cabal/store/ghc-9.2.7/package.db field plutus-core-1.1.1.0 name,version,id
name: plutus-core
version: 1.1.1.0
id: plutus-core-1.1.1.0-d85fe72bf2ff0728e98538cf83f2240595161f111577c50ccb9d9e9c4f3eff21
❯ ghc-pkg --package-db /home/andrea/.local/state/cabal/store/ghc-9.2.7/package.db field z-plutus-core-z-index-envs-1.1.1.0 name,version,package-name,lib-name,id
name: z-plutus-core-z-index-envs
version: 1.1.1.0
package-name: plutus-core
lib-name: index-envs
id: plutus-core-1.1.1.0-l-index-envs-6b7f40b957785c8930efb43ea13c63e65f84ebf44c61618f8ce55506cdd8acbb

(Note that the sub-library has package-name and lib-name while the main library does not.)

From the solver point of view, all entries in ghc-pkg are separate instances. The index (result of convPIs) indeed shows two entries:

plutus-core-1.1.1.0 Inst (UnitId "plutus-core-1.1.1.0-d85fe72bf2ff0728e98538cf83f2240595161f111577c50ccb9d9e9c4f3eff21")
z-plutus-core-z-index-envs-1.1.1.0 Inst (UnitId "plutus-core-1.1.1.0-l-index-envs-6b7f40b957785c8930efb43ea13c63e65f84ebf44c61618f8ce55506cdd8acbb")

Second question:

For the second test case in https://github.com/haskell/cabal/issues/6039#issuecomment-1318399645, I wanted to give a more concrete example:

Despite the clear instructions this test turned out a bit trickier for me. I made a package with two sub-libraries and installed them in the user packagedb (I had to use cabal act-as-setup -- install --user but that's ok). There's a bit of a problem because Setup.hs install calls ghc-pkg update which overrites entries with the same name, so I could not easily install more instances.

That said. Since each instance contains only one component, it looks meaningless to say "instance of A-1.0 that contains both components".

Plan for tomorrow:

Looking at the solver index it looks like it's not demangling the component names. IIRC Distribution.Client.IndexUtils.getInstalledPackages does the demangling correctly. I need to double check.

The original problem I am running into is that the solver does not seem to be able to re-use public sublibraries in the packagedb, and ends up recompiling anything that depends on them. This is the behaviour I described in https://github.com/input-output-hk/haskell.nix/issues/1662#issuecomment-1529558668. Tomorrow I'll investigate how the solver sees the situation. IIRC the solver was complaining it could not use the installed package because some component was missing. Other thing to double check :)

andreabedini avatar May 16 '23 12:05 andreabedini

Thanks for investigating. I didn't realize that sublibrary names were still mangled in the package db.

Despite the clear instructions this test turned out a bit trickier for me. I made a package with two sub-libraries and installed them in the user packagedb (I had to use cabal act-as-setup -- install --user but that's ok). There's a bit of a problem because Setup.hs install calls ghc-pkg update which overrites entries with the same name, so I could not easily install more instances.

That said. Since each instance contains only one component, it looks meaningless to say "instance of A-1.0 that contains both components".

I'm not sure I understand this part, so I wanted to clarify that the definition of "instance" that I was using is a single installation of a given package and version. Instances can differ by having different build flags, versions of dependencies, etc. If the package has more than one library, then the instance will be split across multiple entries in the package db. A major part of the remaining work for this issue is allowing the solver to group installed libraries into instances.

The reason that package D depends on both sublibrary B and sublibrary C in the test case is that I wanted to ensure that cabal doesn't incorrectly group the two installed sublibraries from Package A-1.0 together as one instance when they are actually from two different instances.

The original problem I am running into is that the solver does not seem to be able to re-use public sublibraries in the packagedb, and ends up recompiling anything that depends on them. This is the behaviour I described in https://github.com/input-output-hk/haskell.nix/issues/1662#issuecomment-1529558668. Tomorrow I'll investigate how the solver sees the situation. IIRC the solver was complaining it could not use the installed package because some component was missing. Other thing to double check :)

If I understand correctly, this is the problem I described in https://github.com/haskell/cabal/issues/6039#issuecomment-1046565510. The solver doesn't currently know how to satisfy dependencies from source packages to installed sublibraries. I think that the best way to solve this issue is to represent each installed package as a single PInfo, similarly to source packages. Then the solver could use the PInfo's map of components to track the availability of specific sublibraries.

grayjay avatar May 17 '23 06:05 grayjay

I'm not sure I understand this part, so I wanted to clarify that the definition of "instance" that I was using is a single installation of a given package and version. Instances can differ by having different build flags, versions of dependencies, etc. If the package has more than one library, then the instance will be split across multiple entries in the package db. A major part of the remaining work for this issue is allowing the solver to group installed libraries into instances.

Thank you for clarifying. I was indeed a bit confused about the meaning of "instance" in this discussion but I ended up adopting the one used in the solver (which identifies instances with packagedb entries). Now I understand you are using the term in a more general way and the difference is exactly what we need to teach to the solver.

This note is also relevant. https://github.com/haskell/cabal/blob/367490076f395c802cf459502056de5745fefc7e/cabal-install-solver/src/Distribution/Solver/Modular/IndexConversion.hs#L111-L143

andreabedini avatar May 18 '23 09:05 andreabedini

The reason that package D depends on both sublibrary B and sublibrary C in the test case is that I wanted to ensure that cabal doesn't incorrectly group the two installed sublibraries from Package A-1.0 together as one instance when they are actually from two different instances.

Thinking about this, are we sure this is a problem? two installed instances of Package A-1.0 will have all their dependencies fixed. Even if they end up being split into multiple libraries in the packagedb, their dependencies will either match or they won't. This only regards dependencies, I am not sure what other degrees of freedom we have and what coherence properties we expect.

andreabedini avatar May 18 '23 09:05 andreabedini

Thank you for clarifying. I was indeed a bit confused about the meaning of "instance" in this discussion but I ended up adopting the one used in the solver (which identifies instances with packagedb entries). Now I understand you are using the term in a more general way and the difference is exactly what we need to teach to the solver.

This note is also relevant.

The solver was originally designed without sublibraries, so it expected there to be one library per installed package. Then that hack was added for internal (private) libraries. It worked because it isn't possible for a source package to depend on an installed library that is private. Now I think we should change the solver's installed package index to act as a map of packages, each with one or more libraries.

Thinking about this, are we sure this is a problem? two installed instances of Package A-1.0 will have all their dependencies fixed. Even if they end up being split into multiple libraries in the packagedb, their dependencies will either match or they won't. This only regards dependencies, I am not sure what other degrees of freedom we have and what coherence properties we expect.

I don't think that we should mix libraries from different installations of a package, since they could differ in more ways than dependencies, such as cabal build flags, compiler flags, or even source code. In my opinion, it would be similar to mixing libraries from different versions of the package.

I also think that recording the instance of an installed library (with the InstanceUnitId idea from above) would simplify forcing the dependencies to match.

grayjay avatar May 19 '23 05:05 grayjay

A large question around this seems to be: "what does compatible" mean? From a purely linking perspective, we shouldn't be mixing GHC ways. You can't have a profiling library work with a vanilla one.

From a dependency perspective, any "consumer" of a package find that package compatible (with any other instance of that package) as long as the exposed API (symbols, and relevant signatures) match; anything else is a blackbox to the consumer.

This ignore the semantic part where the symbols and signatures could stay the same, but the behaviour/meaning is different. That's what we usually have the PVP for?

angerman avatar Nov 28 '23 02:11 angerman