
Private dependencies

ezyang opened this issue 7 years ago • 19 comments

CC @Ericson2314

I realized we didn't have a tracking ticket for private dependencies, so here it is.

Prior art by @kosmikus: https://wiki.haskell.org/HaskellImplementorsWorkshop/2011/Loeh (from when the modular solver was originally introduced; the slides discuss some tricky cases).

Newest WIP proposal from @Ericson2314: https://github.com/Ericson2314/ghc-proposals/blob/private-deps/proposals/0000-private-dependencies.rst

I have some comments for @Ericson2314:

  • Notation definitions would be helpful, esp. IPub* (I assume * means transitive closure), r IPub (cast relation into set)
  • "As today, a valid Cabal build plan must never include multiple versions of the same package in the set of public dependencies." Strictly speaking this is true but we do, today, have qualified goals which, e.g., let setup-depends and build-tools be solved separately from the public dependencies.
  • Does your proposal handle the "encapsulation" problem described in the "Encapsulation example" slide in @kosmikus's slides? The situation is this: A depends on B and C, and C privately depends on B; if C then adds a public dependency on D, which depends on B, the encapsulation must be lost.
  • If freshening is done purely by allocating a new unit id to encapsulated dependencies, that would certainly solve the problem. You might be able to state the requirement for private dependencies as: if a package has a private dependency on some package, that package gets fresh identifiers for everything, and those identifiers don't match any other occurrence anywhere else. So maybe @kosmikus was wrong and we shouldn't stop encapsulating even if there is a public dependency that indirectly depends on the private package: they're just different types (see the sketch after this list).
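To make "they're just different types" concrete, here is a minimal Haskell sketch, assuming hypothetical packages A, B and C where C privately depends on B (so C is built against a freshened copy of B) and, for the sake of the example, exposes a value of that copy's type. B.T and C.exposedT are invented names used only for illustration:

module A where

import qualified B        -- A's own, public copy of B
import qualified C        -- C was built against its private, freshened B

clash :: B.T
clash = C.exposedT
-- Rejected by the type checker: C.exposedT has the T from C's private
-- copy of B, which is a distinct type from the public B.T above, even
-- though both originate from "the same" package B.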

ezyang · Oct 26 '16

1, 2: I'll clarify, thanks.

3, 4: is the encapsulation problem actually a problem? It is true that, via type classes, downstream can rely on types being _un_equal, but an easy solution is just to ban re-exports after all.

I wavered on this front, but I think not doing what you say for 4 is better. Consider if B has a public dep on Q, and the new dep D depends on Q instead of B. Then you have the same problem but one level deeper: Q via C->B->Q was unconstrained until C->D->Q makes Q a public dep. If you recursively always freshen, treating multiple "copies" of a package in scope as fine, you end up with endlessly many copies of base. And that's no good.

Maybe packages get two types of private deps ("this dep should never unify with anything" and "unify with my personal public deps")?

Ericson2314 · Oct 27 '16

Maybe "private dependencies" is the wrong way to go about thinking about this. What we want is a form of sealing, where we declare a dependency p to be sealed in some package, and this says, "If you depend on this package, it will not contribute any global constraints on p."

So let's take @kosmikus's encapsulation example. In this case, it is not that C has a private dependency on B, but that C wants to seal B. If C depends on D, which in turn depends on B, we cannot seal B without also sealing D (since any install plan that mentions D must also mention B). This saves us from having to freshen infinitely, because we only freshen the reverse dependency closure of sealed packages up to our current one.
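For concreteness, here is a sketch of what sealing might look like in C's package description; the sealed field is invented purely for illustration and does not exist in Cabal:

library
  build-depends: B, D
  -- Invented syntax: sealing B means C contributes no global constraints
  -- on B. Since D also reaches B, D has to be sealed along with it; the
  -- whole reverse dependency closure of B up to C gets freshened.
  sealed:        B, D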

The downside of this approach is that if you seal a dependency, you might end up sealing some other dependencies that you wanted to be left unsealed, because those in turn depend on the sealed dependency.

Is this an improvement on your "we want the public dependencies of immediate private dependencies to agree"? I still admit to not fully understanding your formula.

ezyang · Oct 28 '16

> The downside of this approach is that if you seal a dependency, you might end up sealing some other dependencies that you wanted to be left unsealed, because those in turn depend on the sealed dependency.

I don't think this is sound? D->B, C->D, and A->C are all public edges, and thus types can be leaked. C can compatibly use D's types (compatible with when it didn't use D at all) in its public interface, provided they are used only in new functionality.

I do like the sealing language--I'm trying to reconcile it with the above.

Ericson2314 · Oct 28 '16

It's not unsound; I'm simply saying that if you wanted to use D "privately", you MUST also use C privately (if C depends on D). I think this makes sense actually, because in @kosmikus's proposal, if C did depend on D, that resulted in D becoming public again. That seems very unexpected; the private dep is effectively ignored.

ezyang · Oct 28 '16

Hmm I am reading this again 3 years later and I'm not sure what I was saying before, but I definitely agree with @kosmikus as I think does my old WIP proposal as written.

  1. Private deps forcing public ones to be private vs public deps forcing private ones to be public seems completely dual? Neither should be more confusing than the other?

  2. Forcing sealing like this seems to interact oddly with compatibility. Packages are free to take on new dependencies in non-breaking changes, but this can obligate a downstream package to seal more dependencies. Let's say C added a dependency on D in a later version. Now any use of C in B's public interface must be made abstract. B[C] is not compatible with B[fresh-C]. The propagation here seems pathological for solving: you have to track public invalidations and public unification constraints.

  3. Simply banning exports of private items for an MVP is only compatible with public-forces-private. Private-forces-public means we have to freshen things anyway.

  4. One can always augment the solver to ask whether a solution with no private deps forced public is available. This can be done before a library release (think the GHC library) to make sure that the private deps accomplish their intent.

Ericson2314 · Jun 27 '19

@grayjay I wrote down some thoughts about the design.

https://hedgedoc.well-typed.com/PgXXnkI8TSqw2kioBD_FnQ#

I would appreciate it if you could have a look; the part I am stuck on is how to explain this closure property to the solver. It seems quite difficult to me. Perhaps you could easily see how to achieve it in one of the post-processing steps after the solver tree is built.

mpickering · Jan 10 '24

@mpickering Thanks for sharing the design. I have a few thoughts so far:

  1. The document says that private dependencies aren't transitive, but I was wondering if it would be better to make the private scope transitive and put the transitive dependencies into both scopes (the previous scope and the new private scope). Scopes are transitive in the solver (except for base shims), and I think that transitivity is important for enforcing consistent dependencies. Here is an example: If A depends privately on B with scope S, and B depends publicly on C, then A would be in the top-level scope, B would be in the S scope, and C would be in the top-level and S scopes. The transitivity of the S scope prevents C from depending on another instance of B. (The closure property may also force consistent dependencies.)
  2. The closure property mentions intermediate dependencies. Do these dependencies only refer to public dependencies? I'm assuming that nested private dependencies have no effect on the closure.
  3. I noticed that one issue with the design is that it requires developers to think more about their transitive dependencies.
     a. The fact that private dependencies are not transitive means that developers need to think about the structure of the projects that they depend on privately. For example, if someone wanted to depend on cabal-install privately, they would need to know that it also depends on cabal-install-solver and that the two packages are developed together and are expected to have the same version. Then they would need to add cabal-install-solver to every private scope that contains cabal-install, even if they didn't need to use its API, in order to prevent the private version of cabal-install from being pinned to the top-level version of cabal-install-solver.
     b. The closure property means that developers need to think about intermediate dependencies. For example, in the example with a, b, and c in the document, if c had a function that mentioned a type T from b, such as f :: Int -> T, then the type of f would depend on whether b was in the closure of S. If b did not depend on a, then f would return the top-level T. If a later version of b added a dependency on a, then f would start returning the prefixed T, and the package defining scope S would have a compile error if it called f. (A Haskell sketch of this scenario follows this list.)
  4. I need to think about the implementation more, but I can see why the closure property makes it difficult. The solver chooses packages one at a time, so it could take a long time to find a full closure. Some conflicts can't be found until the closure is known, so they would cause the solver to backtrack far up the tree.
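To illustrate 3b, here is a small Haskell sketch of the package that defines scope S, assuming the hypothetical packages a, b and c from the design document, and assuming the module-prefixing scheme discussed later in the thread (modules of packages in scope S become visible under an S. prefix):

module Main where

import B (T)       -- ordinary public dependency on b
import S.C (f)     -- c sits in private scope S, so its modules carry the S. prefix

x :: T
x = f 42
-- If b does not depend on a, b stays outside the closure of S and
-- f :: Int -> T returns this top-level T, so the definition compiles.
-- If a later version of b gains a dependency on a, b is pulled into S,
-- f returns S's prefixed copy of T, and this definition stops type-checking.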

grayjay · Jan 14 '24

  • I think this example is not quite precise, because C can't transitively depend on B, as that would be a circular dependency, which is already forbidden. I'm not sure of the mechanism which enforces this, though, and perhaps your point is that if B has two different scopes then you can get this circularity. Before considering this, did you intend the example to be circular (B -> C -> B)?

  • Yes just public dependencies, as private dependencies don't contribute any constraints on the external API and hence interaction with other packages.

  • This is true, but I think you already have to think about this problem today. Currently you might add a dependency on cabal-install, and if another package in your existing dependency structure depends on cabal-install-solver at a different version, then the solver will fail to find a plan because conflicting versions of cabal-install-solver will be chosen. In that situation you can't do anything. If the dependency is private, then the way to fix this kind of conflict would be to add it to your scope, which seems acceptable to me (see the sketch after these bullets).

  • In the closure example, b is always a public dependency of a, so the choice is just about which version satisfies the normal public dependency. Since b is not part of the private scope, the author of the package can't refer to anything in b. If a exposes functions which mention things from b, then the type must be used abstractly, and if types from b need to be used, then, like a normal dependency, it would have to be added to the private scope. So I don't see from this example how a library author needs to worry about which version of this dependency is used if it is not explicitly depended on.

  • Even if the implementation proves to be too slow or infeasible in practice, I would still like to try to implement it so that it can be analysed from an empirical perspective.
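For illustration, adding the extra package to the scope might look roughly like this, assuming the private-build-depends scope syntax in the WIP patch is of the form NAME with (packages); the exact field layout may differ:

library
  build-depends:         base, Cabal
  -- Hypothetical scope: both packages go into the scope together, so the
  -- private cabal-install is not pinned against the top-level
  -- cabal-install-solver.
  private-build-depends: SOLVER with (cabal-install, cabal-install-solver)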

mpickering · Jan 15 '24

I numbered my bullet points to make them easier to discuss.

1. Yes, my example was meant to be circular. The solver allows a package to depend on another instance of itself with a different qualifier. I initially tried to find a non-circular example, but then I realized that the closure property may prevent a package from appearing in both a scope and the dependencies of that scope. I also meant to say that I think that the scopes should be logically transitive, regardless of how they are actually implemented in the solver.

3b. In my example, I forgot to mention that the package defining scope S declares a public dependency on b, so it can refer to type T. Whether b depends on a then determines whether the main package additionally depends on b privately.

The issue is that there is a way to say that c must depend on the scoped b (put b in scope S in private-build-depends), but there is no way to say that c must depend on the top-level b, since b can always be pulled into scope S due to the closure property.

4. I haven't had a chance to think about the implementation more, but if performance isn't an issue, it is always possible to apply any validation to the final install plan, which is stored in Done nodes in the search tree. Here is an example of a PR that changed one type of validation (cycle detection) from being performed on Done nodes to being performed incrementally on every choice node: #4176

EDIT: I just realized that it is not enough to look at the Done nodes, because this feature also affects the package qualifiers, which are chosen in the tree building phase.

grayjay · Jan 18 '24

I find it odd to introduce a way to rename modules from a package that is completely different from the one we already have for mixins.

More generally, the particular sub-goal of re-presenting a private dependency so as to not clash with a non-private dependency seems like something we could maybe do with the machinery of Backpack? (Very vague idea, sorry, perhaps people who know more can make something of it)

michaelpj · Feb 28 '24

You raise an interesting point @michaelpj. Indeed, private scopes must definitely not clash with non-private dependencies and perhaps having the unit-id "change-by-instantiation" Backpack mechanism as a means to do that is not a bad idea.

As for mixins: I think mixins are using the more general GHC feature of "module renaming" which private scopes also use to rename all the modules of packages within the same scope.

Via mixins you can rename modules (simply, or for instantiations); or you can declare a private scope, and all modules in that private scope will be prefixed with the private scope alias. I don't think they overlap! The internal mechanism is the same.
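To make the comparison concrete, the mixins line below is real, existing syntax, while the private-build-depends line is a sketch of the scope-prefixing behaviour described above (the exact scope syntax is an assumption about the WIP patch):

-- Existing mixins renaming: you pick every new module name yourself.
mixins:                containers (Data.Map as Old.Data.Map)

-- Private scope (sketch): every module of containers inside scope OLD
-- becomes available under the scope prefix, e.g. OLD.Data.Map.
private-build-depends: OLD with (containers)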

alt-romes · Mar 05 '24

A small update on the implementation: we have made great progress and the feature is both working and quite well tested. It is lacking documentation and support for other commands like gen-bounds (support for these other commands is kind of blocked by #9752, as without it we can't properly test the changes), and the patch must still be cleaned up.

As mentioned above, we wrote some notes on the design here: https://hedgedoc.well-typed.com/PgXXnkI8TSqw2kioBD_FnQ#

alt-romes · Mar 05 '24

There are a few features that I'd like to be present (or at least planned for) in a complete implementation of private dependencies:

  1. A flag to enforce global consistency (that is, the current behavior), since I can see private dependencies causing a number of problems (e.g. increased size of build products, incompatibility with traditional package managers)
  2. A way to specify the package instance in --constraints (edit: here it is!)
  3. cabal freeze integration, requiring (2)

fgaz · Mar 05 '24

Thanks @fgaz.

What do you mean by this?

> A flag to enforce global consistency (that is, the current behavior), since I can see private dependencies causing a number of problems (e.g. increased size of build products, incompatibility with traditional package managers)

The packages in a private scope will be solved qualified (i.e. we can choose versions for packages in a private scope that are different from the versions picked for the same package in the public (normal) scope). Perhaps by "global consistency" you mean having a package which doesn't use private-build-depends?

alt-romes · Mar 05 '24

I mean solving everything unqualified, treating all private-build-depends in the whole dependency tree as regular build-depends, so at most one instance of each package/component is used.

fgaz · Mar 05 '24

> Via mixins you can rename modules (simply, or for instantiations); or you can declare a private scope, and all modules in that private scope will be prefixed with the private scope alias.

I guess my point was more about the UX. As a user I see that there is syntax for module renaming in mixins, and a different one in private-build-depends. They could be more similar! For example, you could imagine a syntax like:

private-build-depends: containers (Data.Map as C0.Data.Map)

As written, this would require people to specify all the renamings that they need manually, which is somewhat painful, but more consistent.


A completely different idea: since the point of the private dependencies is that they are like a "different package", what if we renamed the package?

private-build-depends: containers as containers-old
{-# LANGUAGE PackageImports #-}
module Main where

import "containers-old" Data.Map

A nice feature of this is that it could be implemented as a completely orthogonal feature. That is, we could allow as <alias> in normal build-depends if we wanted to.

I think this is kind of elegant? private-build-depends simply does the private dependency business, and disambiguating packages is done via a generic means of renaming packages locally. Perhaps that doesn't work because you want to have multiple private scopes? Do we need that? How bad would it be to just have one private scope per-component?

michaelpj · Mar 05 '24

> I mean solving everything unqualified, treating all private-build-depends in the whole dependency tree as regular build-depends, so at most one instance of each package/component is used.

I don't think this would really work very well in general; you are going to fail to build packages where the constraints in different scopes conflict.

As things stand today there are already qualified goals introduced by setup-depends and build-tool-depends which would also be collapsed under this scheme; that doesn't seem to be an issue currently.

I agree it is important to consider how this would affect packagers and people using the ./Setup interface though.

mpickering · Mar 08 '24

> > Via mixins you can rename modules (simply, or for instantiations); or you can declare a private scope, and all modules in that private scope will be prefixed with the private scope alias.
>
> I guess my point was more about the UX. As a user I see that there is syntax for module renaming in mixins, and a different one in private-build-depends. They could be more similar! For example, you could imagine a syntax like:
>
> private-build-depends: containers (Data.Map as C0.Data.Map)
>
> As written, this would require people to specify all the renamings that they need manually, which is somewhat painful, but more consistent.
>
> A completely different idea: since the point of the private dependencies is that they are like a "different package", what if we renamed the package?
>
> private-build-depends: containers as containers-old
> {-# LANGUAGE PackageImports #-}
> module Main where
>
> import "containers-old" Data.Map
>
> A nice feature of this is that it could be implemented as a completely orthogonal feature. That is, we could allow as <alias> in normal build-depends if we wanted to.
>
> I think this is kind of elegant? private-build-depends simply does the private dependency business, and disambiguating packages is done via a generic means of renaming packages locally. Perhaps that doesn't work because you want to have multiple private scopes? Do we need that? How bad would it be to just have one private scope per-component?

  • Scopes have to be named, so you can distinguish between different scopes (yes you need to be allowed to use multiple different private scopes, for example to test different versions of the same library against each other).
  • Scopes can contain multiple packages which are used together.
  • You need a way to distinguish between imports from different scopes.

The alias idea doesn't work very well with the requirement that a scope can contain multiple packages. It's better to group these in a cabal file under a common name/prefix so that they are uniformly available to the user rather than each under an ad-hoc alias.

The idea of using PackageImports was what we tried in the first implementation.

  • PackageImports is not a very reliable mechanism (see tickets like https://gitlab.haskell.org/ghc/ghc/-/issues/24227)
  • It seemed more complicated to register and alias packages in Cabal; you would need to set up a different package database in order to do this (not insurmountable)
  • The module renaming feature was already in GHC and seemed to work nicely with the idea of using a prefix.

At this point though it wouldn't be too much effort to try different ways of distinguishing different packages so we could try other things.

mpickering · Mar 08 '24

If anyone is interested, here is an example of what using private dependencies with the current patch looks like for testing multiple versions of text against each other, and benchmarking old and new versions against each other.

Testing: https://gist.github.com/mpickering/db097bcc7c0e328e99c31480a8e33302 Benchmarking: https://gist.github.com/mpickering/a41afe485b309eb694ade02178f0a9d4
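Roughly, and glossing over details that will differ from the gists above, comparing two versions of text looks something like this, again assuming the NAME with (packages) scope syntax and scope-prefixed module names:

test-suite text-compare
  type:                  exitcode-stdio-1.0
  main-is:               Main.hs
  build-depends:         base
  -- Two private scopes, each pinning a different text, solved independently.
  private-build-depends: OLDTEXT with (text == 1.2.*),
                         NEWTEXT with (text == 2.0.*)

module Main where

import qualified OLDTEXT.Data.Text as Old
import qualified NEWTEXT.Data.Text as New

main :: IO ()
main = do
  -- The two Text types are distinct, so compare via a common representation.
  print (Old.unpack (Old.pack "abc") == New.unpack (New.pack "abc"))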

mpickering · Mar 08 '24