pkg: allow arbitrary opam repo layouts

Open • gridbugs opened this issue 10 months ago • 8 comments

Opam repos can have fairly arbitrary directory layouts. The only requirement is that all package manifests are under the top-level "packages" directory, and each manifest is in a file named "opam" whose parent directory is named like "<name>.<version>".

Prior to this change, dune assumed a layout similar to the official opam-repository, where manifests are at "packages/<name>/<name>.<version>/opam".
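To make the difference between the two rules concrete, here is a minimal sketch that classifies a repo-relative path under each of them. It is not dune's implementation: the function names are hypothetical, paths are assumed to use "/" separators, and it assumes (as a simplification) that package names never contain ".", so a directory name is split at its first ".".

```ocaml
(* Split a directory name like "foo.1.0" into ("foo", "1.0") at its
   first '.', assuming package names themselves contain no '.'. *)
let split_name_version dir =
  match String.index_opt dir '.' with
  | Some i when i > 0 && i < String.length dir - 1 ->
    Some (String.sub dir 0 i, String.sub dir (i + 1) (String.length dir - i - 1))
  | _ -> None

(* General rule: under the top-level "packages" directory, at any depth,
   a file named "opam" whose parent directory is named <name>.<version>. *)
let general_manifest path =
  match String.split_on_char '/' path with
  | "packages" :: rest -> (
    match List.rev rest with
    | "opam" :: dir :: _ -> split_name_version dir
    | _ -> None)
  | _ -> None

(* Old assumption: exactly packages/<name>/<name>.<version>/opam. *)
let strict_manifest path =
  match String.split_on_char '/' path with
  | [ "packages"; name; dir; "opam" ] -> (
    match split_name_version dir with
    | Some (n, _) as parsed when n = name -> parsed
    | _ -> None)
  | _ -> None
```

Both rules accept "packages/foo/foo.1.0/opam", but only the general one accepts a flat path such as "packages/foo.1.0/opam". Neither accepts an "opam" file sitting directly in "packages/".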

This generalizes dune's expectations of opam repositories such that arbitrary directory layouts are supported.

Fixes https://github.com/ocaml/dune/issues/11615

gridbugs • Apr 08 '25 06:04

> Thanks for the PR! I have noticed a few typos, but I mostly have questions regarding the error handling. My main question is: shouldn't we prevent users from using a corrupted opam repository?

If a repo does not follow the convention set by opam-repository, that doesn't mean it is corrupt. The documentation on repo layout in the opam manual specifies that any layout where opam files are contained in directories named after the package is acceptable. The document also suggests a potential future change to the layout of opam-repository where another level of hierarchy is introduced to group packages by common prefixes, e.g. packages/p/pa/package-name/package-name.version.
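As an illustration of the prefix-grouped layout mentioned above, this hypothetical helper derives such a manifest path from a package name and version. The grouping scheme (first letter, then first two letters) is inferred only from the single example in the comment, not from a specification, and the function name is made up.

```ocaml
(* Hypothetical helper: builds a prefix-grouped manifest path following the
   pattern packages/p/pa/package-name/package-name.version/opam. *)
let grouped_manifest_path ~name ~version =
  (* First [n] characters of the name, or the whole name if it is shorter. *)
  let prefix n = String.sub name 0 (min n (String.length name)) in
  String.concat "/"
    [ "packages"; prefix 1; prefix 2; name; name ^ "." ^ version; "opam" ]
```

Such a path has six components, so a fixed-depth lookup of packages/<name>/<name>.<version>/opam would never find it, while the general rule from the PR description still accepts it because the parent directory is named <name>.<version>.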

I agree with the sentiment that in practice we're probably only ever going to see repos formatted like opam-repository or JST (Jane Street), and that the complexity and runtime cost of supporting fully general opam repos is too high to be worth it. I briefly thought about introducing a fast path: if a repo is formatted like opam-repository, we could make some optimizing assumptions. But the only way to be sure that a repo is formatted like opam-repository is to do a full scan of all its opam files, which is just as expensive as not having a fast path to begin with.

So I'm thinking I'll defer solving https://github.com/ocaml/dune/issues/11615 until I have more time to work on it, and eventually we can use a solution more specialized to the layouts we see in opam-repository and JST.

Alternatively, @rgrinberg how feasible would it be to ask Jane Street to change the layout of their repos to match opam-repository?

gridbugs • Apr 09 '25 07:04

> If a repo does not follow the convention set by opam-repository, that doesn't mean it is corrupt.

That's a misunderstanding: I was referring to repositories containing the same file twice, or an opam file at the root.

maiste • Apr 09 '25 07:04

I'll ask, but I don't really see an issue with implementing an alternative lazy loading scheme. What we're doing is implementing the following function for a single repository:

val list_packages : Package_name.t -> Package_version.t list

So you could implement this function by:

  1. Try to list packages/$package. If this succeeds, the search is done.
  2. If the above fails because packages/$package doesn't exist, read all of packages/ and look for $package.*. The matches are the result.
  3. For subsequent lookups, you want the listing of packages/ to be cached. Otherwise it would be really slow for large repos that aren't from git.
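The three steps can be sketched as follows. This is an illustration under stated assumptions, not dune's code: package names and versions are plain strings rather than Package_name.t and Package_version.t, and the filesystem is abstracted as a hypothetical list_dir function returning None when a directory is missing, so the sketch does not depend on dune's internal APIs.

```ocaml
(* A repo is modelled by a directory-listing function plus a cache. *)
type repo = {
  list_dir : string -> string list option;
      (* entries of a repo-relative directory, or None if it is missing *)
  mutable packages_listing : string list option;
      (* step 3: cached listing of packages/, filled on first use *)
}

let make_repo list_dir = { list_dir; packages_listing = None }

(* If [entry] is named "<pkg>.<version>", return the version part. *)
let version_of ~pkg entry =
  let prefix = pkg ^ "." in
  let n = String.length prefix in
  if String.length entry > n && String.sub entry 0 n = prefix then
    Some (String.sub entry n (String.length entry - n))
  else None

(* Step 3: list packages/ at most once per repo. *)
let cached_listing repo =
  match repo.packages_listing with
  | Some entries -> entries
  | None ->
    let entries = Option.value (repo.list_dir "packages") ~default:[] in
    repo.packages_listing <- Some entries;
    entries

let list_packages repo pkg =
  let versions entries = List.filter_map (version_of ~pkg) entries in
  match repo.list_dir ("packages/" ^ pkg) with
  | Some entries -> versions entries (* step 1: packages/$package exists *)
  | None -> versions (cached_listing repo) (* step 2: scan packages/ *)
```

With an in-memory list_dir, a package that has its own directory is answered by step 1 without ever listing packages/, while packages stored flat under packages/ trigger one listing that later lookups reuse. Like the scheme it sketches, this only covers the opam-repository layout and the flat layout, not deeper nesting.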

With this scheme, users of the common layout always get the optimized experience, while we still support the alternative layout. For multiple repositories, we are sadly going to hit the 2nd step more often than we'd like. On the other hand, multiple repositories aren't all that common, and the performance problems aren't relevant to opam repositories from git.
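Why multiple repositories hit step 2 more often can be shown with a deliberately tiny model. It assumes, as an illustration only, that every configured repository is consulted for each package, and it reduces each repository to the hypothetical list of package names that have their own packages/<name>/ directory.

```ocaml
type step = Direct_listing | Fallback_scan

(* For each repo (in workspace order), which step would the lookup of [pkg]
   take? Repos where packages/<pkg>/ is absent fall back to scanning. *)
let steps_for ~repos pkg =
  List.map
    (fun dirs -> if List.mem pkg dirs then Direct_listing else Fallback_scan)
    repos
```

If a package lives in only one of k repositories, each lookup takes step 1 once and step 2 in the other k - 1, which is why the cache from step 3 matters most in this setting.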

rgrinberg • Apr 10 '25 15:04

> For multiple repositories, we are sadly going to hit the 2nd step more often than we'd like. On the other hand, multiple repositories aren't all that common, and the performance problems aren't relevant to opam repositories from git.

What do you mean by "multiple repositories" here?

gridbugs • Apr 11 '25 01:04

I just noticed that Jane Street's compiler extensions branch uses a mix of layouts within its packages directory. Most packages are directly under packages, but a few (e.g. ocaml-variants) get a subdirectory in the style of opam-repository. @rgrinberg, is that what you mean by "multiple repositories"?

gridbugs • Apr 11 '25 02:04

Multiple repositories can be listed, in order, in dune's workspace file. Refer to tests like multiple-opam-repos.t for examples.

Mixing the two layouts defeats the ability to implement any lazy loading scheme. I'm going to ask about sticking to one scheme. For now, I think you could just make eager loading "opt-in" for some repos.

rgrinberg • Apr 11 '25 16:04

Is lazy loading the opam repo something we're planning to implement in the future? Unless I'm mistaken, it looks like we currently enumerate all the paths in the opam repo when creating a Rev_store.At_rev.t, to populate its files: File.Set.t field (here). I timed dune pkg lock a few times with and without this change and didn't see a noticeable difference.
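Once every path in a repo has been enumerated anyway, supporting arbitrary layouts amounts to one pass that keeps the manifest paths and groups them by package. The sketch below shows that eager indexing; it is parameterized by a manifest_of_path classifier (hypothetical, e.g. one implementing the general rule from the PR description) and uses a plain list of path strings rather than dune's File.Set.t.

```ocaml
module String_map = Map.Make (String)

(* Eager index: a single fold over every path in the repo. [manifest_of_path]
   maps a manifest path to Some (name, version) and anything else to None. *)
let build_index ~manifest_of_path paths =
  List.fold_left
    (fun index path ->
      match manifest_of_path path with
      | None -> index
      | Some (name, version) ->
        String_map.update name
          (fun versions -> Some (version :: Option.value versions ~default:[]))
          index)
    String_map.empty paths

(* Each lookup is then a map query rather than a filesystem probe. *)
let list_packages index name =
  Option.value (String_map.find_opt name index) ~default:[]
```

Because the cost is dominated by the enumeration that already happens, a result like the unchanged dune pkg lock timing is plausible, though this sketch does not measure anything.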

gridbugs • Apr 14 '25 01:04

Lazy loading should already work for local (non-git) repositories. That's a fairly important use case, for example when working on a change to opam-repository and testing it locally.

It would be great if we could make it work for git repositories as well, but that would be a lot of work.

rgrinberg • Apr 23 '25 12:04