Resolver hangs with too many repos

Open jeroen opened this issue 1 year ago • 2 comments

As discussed earlier, I run into a problem where the resolver hangs if there are too many repos.

> pak::pkg_install('ATACseqQC', dependencies=TRUE)
✔ Updated metadata database: 5.50 MB in 13 files.
✔ Updating metadata database ... done
.................... 

Would there be a way to give some hint to pak to prevent the combinatorial explosion in the solver?

One way to reproduce is this container:

docker run --rm -it --env MY_UNIVERSE=https://bioc.r-universe.dev/ --entrypoint=R ghcr.io/r-universe-org/build-wasm:latest

You can also reproduce it with another container running ubuntu:24.04 and setting:

options(repos = c(
    binaries = "https://bioc.r-universe.dev/bin/linux/noble/4.4",
    universe = "https://bioc.r-universe.dev", 
    CRAN = "https://p3m.dev/cran/__linux__/noble/latest",
    BioCsoft = "https://bioconductor.org/packages/3.20/bioc", 
    BioCann = "https://bioconductor.org/packages/3.20/data/annotation",
    BioCexp = "https://bioconductor.org/packages/3.20/data/experiment",
    fallback = "https://cloud.r-project.org", 
    archive = "https://cranhaven.r-universe.dev"
))

jeroen avatar May 08 '24 10:05 jeroen

I am probably hitting the same issue after adding a CRAN-like repo with many packages and trying to install a package with a medium to large dependency tree.

It works when installing a package with no deps or only a few.

There isn't any debug/verbose mode to narrow down what is happening inside the resolver, is there?

pat-s avatar Sep 03 '24 16:09 pat-s

You can debug the R code as usual (debug(), browser(), etc.), and the C code as usual, with gdb or lldb.

gaborcsardi avatar Sep 03 '24 16:09 gaborcsardi

Some notes from the past days:

The problem occurs when multiple repos share many of the same packages but in different versions. In this case the solver will consider each possible combination of package versions, resulting in a combinatorial explosion of candidate solutions.
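To give a rough feel for the growth, here is a minimal sketch that only counts combinations. The package names and versions are made up, and the product of candidate counts is a worst-case upper bound, not a measurement of what the solver actually explores.

```r
# Hypothetical candidate versions per package, pooled across all repos.
candidates <- list(
  pkgA = c("1.0.0", "1.1.0", "1.2.0"),
  pkgB = c("2.0.0", "2.1.0"),
  pkgC = c("0.9.0", "0.9.1", "1.0.0")
)

# Upper bound on the number of version combinations:
# the product of the candidate counts of all packages.
count_combinations <- function(candidates) {
  prod(lengths(candidates))
}

n_small <- count_combinations(candidates)

# With n shared packages and k candidate versions each, the bound is k^n.
# Here: 20 shared packages with 3 candidate versions each.
n_large <- 3^20

print(n_small)
print(format(n_large, big.mark = ","))
```

Even a handful of overlapping repos can therefore push the bound into the billions for a medium-sized dependency tree.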

Hopefully https://github.com/r-lib/pkgdepends/pull/392 relieves most of the problem. This heuristic filters out the older of two versions of a package if they share exactly the same dependencies (and therefore the older version is never part of the solution).
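The idea behind this kind of filter can be sketched roughly as follows. This is not the code from the PR: the candidate table, the single `deps` column, and the `drop_dominated()` helper are all hypothetical simplifications (real metadata has several dependency fields).

```r
# Hypothetical candidate table: one row per (package, version) found across repos.
# `deps` stands in for a normalized dependency specification.
candidates <- data.frame(
  package = c("A", "A", "A", "B", "B"),
  version = c("1.0.0", "1.1.0", "1.2.0", "2.0.0", "2.1.0"),
  deps    = c("B", "B", "B, C", "", ""),
  stringsAsFactors = FALSE
)

# Within each group of rows that share package name AND dependencies,
# keep only the newest version; older ones can never be preferred.
drop_dominated <- function(cand) {
  key <- paste(cand$package, cand$deps, sep = "|")
  kept <- lapply(split(cand, key), function(g) {
    newest <- order(package_version(g$version), decreasing = TRUE)[1]
    g[newest, , drop = FALSE]
  })
  out <- do.call(rbind, kept)
  out <- out[order(out$package, package_version(out$version)), ]
  rownames(out) <- NULL
  out
}

filtered <- drop_dominated(candidates)
print(filtered)
```

In this toy table, A 1.0.0 and B 2.0.0 are dropped, while A 1.2.0 is kept alongside A 1.1.0 because its dependencies differ.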

Another way to improve the situation is to ensure pak can recognize binary packages via a Platform field in the PACKAGES file. Because pak prefers binaries over source packages, this will also reduce the number of potential solutions (this is the reason that CRAN and P3M can be used together without problems).

jeroen avatar Nov 08 '24 18:11 jeroen

OK, things seem to be working well now, so I'll close this issue, until somebody runs into problems again. For the record, some possible improvements:

  • use a better solver, e.g. highs,
  • expose the timeout argument of lpSolve::lp(),
  • implement a simpler solver that we can try first, which selects the "latest best" version of each package. The problem with this is that it possibly only helps with simple cases, for which the ILP solver is also fast, so we gain nothing in practice.
  • try to auto-detect difficult cases, where many packages have multiple versions available and warn the user about this.
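As a rough illustration of the last idea, a check for "difficult" inputs could look something like the sketch below. Nothing here exists in pak: the metadata table, the `warn_if_many_versions()` helper, and the threshold are all assumptions made for the example.

```r
# Hypothetical pooled metadata: one row per package version per repo.
meta <- data.frame(
  package = c("A", "A", "B", "B", "B", "C"),
  version = c("1.0.0", "1.1.0", "2.0.0", "2.0.0", "2.1.0", "0.5.0"),
  repo    = c("repo1", "repo2", "repo1", "repo2", "repo3", "repo1"),
  stringsAsFactors = FALSE
)

# Warn when more than `threshold` packages have several distinct versions
# available, since each of them multiplies the number of candidate solutions.
warn_if_many_versions <- function(meta, threshold = 100) {
  n_versions <- vapply(
    split(meta$version, meta$package),
    function(v) length(unique(v)),
    integer(1)
  )
  multi <- names(n_versions)[n_versions > 1]
  if (length(multi) > threshold) {
    warning(
      length(multi), " packages are available in multiple versions; ",
      "dependency resolution may be slow."
    )
  }
  invisible(multi)
}

# Quiet with the default threshold; returns the affected package names.
multi <- warn_if_many_versions(meta)
print(multi)
```

A sensible threshold would have to be found empirically; the default of 100 above is arbitrary.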

gaborcsardi avatar Nov 09 '24 11:11 gaborcsardi