support for installation from renv.lock lockfile?

Open kevinushey opened this issue 2 years ago • 26 comments

It looks like pak has its own machinery for creating and installing from lockfiles, e.g.

pak::lockfile_create(<pkgs>)
pak::lockfile_install()

Is there a straightforward mechanism whereby lockfiles created by renv (renv.lock) could work here as well?

The main barrier I see is that one cannot yet create lockfiles from versioned R packages; e.g.

> pak::lockfile_create("<pkg>@<version>")
x Creating lockfile pkg.lock [66ms]
Error: Cannot install packages:
* <pkg>@<version>: Versioned CRAN packages are not implemented yet

If we had that, I think it would be straightforward for renv to create a list of versioned remotes that could be passed into pak.

kevinushey avatar Dec 08 '21 21:12 kevinushey

pak has more information in its lockfile, so you cannot replace it with an renv lockfile.

Nevertheless it is in the plans to install an renv lockfile.

gaborcsardi avatar Dec 08 '21 21:12 gaborcsardi

@kevinushey One issue with the renv lockfile is that AFAICT it does not contain the dependencies, so we don't know the right installation order. This means that to start the installation we would need to unpack all packages first, and then look up the dependencies.

This would be much simpler if the lockfile also had the dependencies, just the package names, e.g.

...
      "Package": "callr",
      "Version": "3.7.0.9000",
      "Source": "GitHub",
      "RemoteType": "github",
      "Dependencies": ["processx", "R6"],
...

gaborcsardi avatar Dec 14 '21 11:12 gaborcsardi

What values would be included in Dependencies -- should that be all of Depends + Imports + LinkingTo? (Or should I consider just including those in the lockfile entries?)

Resolving the installation order post-hoc doesn't seem that bad; if I understand correctly you'd have to do this anyway if someone provided (for example) a URL remote or another similar "exotic" remote. I'm not sure if supporting those is something pak plans to do, though.

kevinushey avatar Dec 15 '21 00:12 kevinushey

Just those in the lockfile entries. renv does not add URLs for CRAN-like packages, so you'd need LinkingTo as well, in case a source package is installed.

pak lockfiles have the dependencies for all packages, including exotic ones, for example

pak::lockfile_create("url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz")

will create

...
    {
      "ref": "url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz",
      "package": "cli",
      "version": "3.1.0",
      "type": "url",
      "direct": true,
      "binary": false,
      "dependencies": ["glue"],
      "vignettes": false,
      "needscompilation": true,
      "metadata": {
        "RemotePkgRef": "url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz",
        "RemoteType": "url",
        "RemoteEtag": "\"75b0e-5cf56b5f6c91b\"",
        "RemotePackaged": "TRUE"
      },
      "sources": ["https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz"],
...
    },
    {
      "ref": "glue",
      "package": "glue",
      "version": "1.5.1",
      "type": "standard",
      "direct": false,
      "binary": true,
      "dependencies": [],
      "vignettes": false,
      "needscompilation": false,
      "metadata": {
        "RemoteType": "standard",
        "RemotePkgRef": "glue",
        "RemoteRef": "glue",
        "RemoteRepos": "https://cloud.r-project.org",
        "RemotePkgPlatform": "aarch64-apple-darwin20",
        "RemoteSha": "1.5.1"
      },
      "sources": ["https://cloud.r-project.org/bin/macosx/big-sur-arm64/contrib/4.1/glue_1.5.1.tgz", "https://mac.r-project.org/bin/macosx/big-sur-arm64/contrib/4.1/glue_1.5.1.tgz"],
...

No dependency resolution is performed at all when installing from this lockfile. We can start downloading packages right away, without even needing the CRAN metadata, and then install them right away, using as many subprocesses as possible.
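To illustrate why recorded dependencies make this possible, here is a minimal sketch in base R. It is not pak's actual scheduler: the helper name `install_waves` and the toy `deps` list are assumptions made for illustration, standing in for the "package" and "dependencies" fields of the lockfile entries.

```r
# Toy stand-in for the "package" / "dependencies" fields of lockfile entries
# (illustrative data, not read from a real lockfile).
deps <- list(
  cli      = c("glue"),
  glue     = character(),
  callr    = c("processx", "R6"),
  processx = c("ps", "R6"),
  ps       = character(),
  R6       = character()
)

# Hypothetical helper: group packages into waves. Every package in a wave
# depends only on packages from earlier waves, so the packages within one
# wave could be installed in parallel.
install_waves <- function(deps) {
  done <- character()
  waves <- list()
  while (length(done) < length(deps)) {
    pending <- setdiff(names(deps), done)
    ok <- vapply(deps[pending], function(d) all(d %in% done), logical(1))
    ready <- pending[ok]
    if (length(ready) == 0L) stop("dependency cycle or missing package")
    waves[[length(waves) + 1L]] <- ready
    done <- c(done, ready)
  }
  waves
}

waves <- install_waves(deps)
```

No package metadata is consulted here; the grouping follows from the lockfile contents alone.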

FWIW other software does the same, e.g. a Cargo.lock or a package-lock.json both have dependencies included.

gaborcsardi avatar Dec 15 '21 04:12 gaborcsardi

Would this be sufficient?

{
  "R": {
    "Version": "4.1.2",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cran.rstudio.com"
      }
    ]
  },
  "Packages": {
    "cli": {
      "Package": "cli",
      "Version": "3.1.0",
      "Source": "CRAN",
      "Repository": "CRAN",
      "RemoteType": "standard",
      "RemotePkgRef": "cli",
      "RemoteRef": "cli",
      "RemoteRepos": "https://cran.rstudio.com/",
      "RemotePkgPlatform": "source",
      "RemoteSha": "3.1.0",
      "Hash": "66a3834e54593c89d8beefb312347e58",
      "Requirements": [
        "glue"
      ]
    },
    "glue": {
      "Package": "glue",
      "Version": "1.5.1",
      "Source": "CRAN",
      "Repository": "CRAN",
      "RemoteType": "standard",
      "RemotePkgRef": "glue",
      "RemoteRef": "glue",
      "RemoteRepos": "https://cran.rstudio.com/",
      "RemotePkgPlatform": "source",
      "RemoteSha": "1.5.1",
      "Hash": "e01bc1fe0c20954ec97eac86640abc70",
      "Requirements": []
    }
  }
}

Note the 'Requirements' field, which just provides the names of the other packages in the lockfile that this package depends on.

kevinushey avatar Dec 16 '21 00:12 kevinushey

Yes, that is exactly what we need, like in https://github.com/r-lib/pak/issues/343#issuecomment-993444459.

gaborcsardi avatar Dec 16 '21 07:12 gaborcsardi

@kevinushey So, if you add the dependencies to the lockfile, then I can implement this pretty quickly.

gaborcsardi avatar Dec 21 '21 16:12 gaborcsardi

This has been implemented now; the Requirements entry for each package record will be a JSON array of package names (all of Depends / Imports / LinkingTo). You should see that if you test with the development version of renv.
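As a rough sketch of how such a Requirements vector could be derived from DESCRIPTION-style fields (this is not renv's implementation; the helper name `requirements_from_fields` and the sample field values are assumptions):

```r
# Hypothetical helper: collect package names from Depends / Imports /
# LinkingTo strings, drop version constraints and R itself, and keep only
# packages that are themselves present in the lockfile.
requirements_from_fields <- function(fields, locked) {
  raw <- unlist(strsplit(unlist(fields, use.names = FALSE), ","),
                use.names = FALSE)
  pkgs <- trimws(sub("\\(.*\\)", "", raw))  # strip "(>= x.y.z)" constraints
  pkgs <- unique(pkgs[nzchar(pkgs)])
  intersect(setdiff(pkgs, "R"), locked)
}

# Illustrative field values (not copied from a real DESCRIPTION file).
fields <- list(
  Depends = "R (>= 3.4)",
  Imports = "processx (>= 3.5.0), R6, utils"
)
locked <- c("callr", "processx", "R6")

reqs <- requirements_from_fields(fields, locked)
```

Here `utils` is dropped because it is not a lockfile entry, which matches the "just those in the lockfile entries" suggestion made earlier in the thread.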

kevinushey avatar Dec 21 '21 17:12 kevinushey

@gaborcsardi @kevinushey Just passing by to ask about the current status, as it has been quiet here for some time.

Using {pak} as a backend for {renv} package installs would really be great to have! 👀

pat-s avatar Nov 04 '22 17:11 pat-s

Ciao @pat-s, it seems that {renv} is supporting {pak} (see the config option here). IIUC, there are still some things to sort out, for example the installation procedure: should I install.packages(c('renv', 'pak')), or use {renv} to install {pak}, or vice versa?

baggiponte avatar Nov 12 '22 18:11 baggiponte

I've just tried this with renv 0.16.0 and options(renv.config.pak.enabled = TRUE) set. It seems you need to install pak before running, e.g., renv::restore; otherwise the standard way to install packages will be used.
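A minimal sketch of that setup, assuming the behaviour described here for renv 0.16.0 (the helper `pak_backend_ready` is hypothetical and not part of renv or pak):

```r
# Opt in to pak as renv's installation backend (the option quoted above).
options(renv.config.pak.enabled = TRUE)

# Hypothetical helper: TRUE only if the option is set and pak can be loaded.
pak_backend_ready <- function() {
  isTRUE(getOption("renv.config.pak.enabled")) &&
    requireNamespace("pak", quietly = TRUE)
}

if (!pak_backend_ready()) {
  message("pak is not installed; install it first, e.g. install.packages(\"pak\")")
}

# Then, inside the activated project:
# renv::restore()
```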

After installing pak successfully and running renv::restore, the following error came up immediately. I'm using an renv.lock file that was created with renv 0.15.5, but I'm not sure that is the reason, because the same error came up when testing the above commands with renv 0.15.5.

Error: <callr_remote_error: Can't parse remotes: >
 in process 753095 
-->
<simpleError in get_remote_types(refs): Can't parse remotes: >

 Stack trace:

 12. (function (...)  ...
 13. base:::withCallingHandlers(cli_message = function(msg) { ...
 14. get("pkg_install_make_plan", asNamespace("pak"))(...)
 15. pkgdepends::new_pkg_installation_proposal(pkg, config = list(libr ...
 16. pkg_installation_proposal$new(refs, config = config, ...)
 17. pkgdepends:::initialize(...)
 18. pkg_plan$new(refs, config = config, library = config$library,  ...
 19. pkgdepends:::initialize(...)
 20. pkgdepends:::pkgplan_init(self, private, refs, config, library,  ...
 21. pkgdepends:::parse_pkg_refs(refs)
 22. pkgdepends:::get_remote_types(refs)
 23. base:::stop("Can't parse remotes: ", paste(refs[bad], collapse =  ...
 24. base:::.handleSimpleError(function (e)  ...
 25. h(simpleError(msg, call))
 26. base:::stop(e)
 27. (function (e)  ...

 x Can't parse remotes:  

Traceback (most recent calls last):
11: install.packages("pak")
10: install(pkgs)
 9: renv_pak_install(packages, libpaths)
 8: pak$pkg_install(pkg = packages, lib = library[[1L]], upgrade = TRUE)
 7: remote(function(...) get("pkg_install_make_plan", asNamespace("pak"))(...), 
        list(pkg = pkg, lib = lib, upgrade = upgrade, ask = ask, 
            start = start, dependencies = dependencies, loaded = loaded_packages(lib)))
 6: err$rethrow(stop(res$error$parent$error), res$error$parent, call = FALSE)
 5: withCallingHandlers(expr, error = function(e) {
        if (is.null(e$`_nframe`)) 
            e$`_nframe` <- length(sys.calls())
        e$`_childcall` <- realcall
        e$`_childframe` <- realframe
        e$`_childignore` <- list(c(realframe + 1L, realframe + 1L), 
            c(e$`_nframe` + 1L, sys.nframe() + 1L))
        throw(cond, parent = e)
    })
 4: stop(res$error$parent$error)
 3: <condition-handler>(...)
 2: throw(cond, parent = e)
 1: stop(cond)

ccasar avatar Nov 14 '22 10:11 ccasar

Is there any progress on this @gaborcsardi? I'd be happy to try to put together a PR if you can give some direction on how to go from the renv lockfile with dependencies specified (above) to a functional pak lockfile.

alexvpickering avatar Aug 08 '23 22:08 alexvpickering

@kevinushey So if pak were to install packages from an renv lockfile, where would those packages go, and how would this work together with renv's libraries?

Would the renv project need to be activated first?

Would pak need to be a dependency in the renv project?

If the renv project does not need to be active, then pak would install the packages into a regular (non-renv) library? That does not seem ideal.

I don't really see how this would work.

gaborcsardi avatar Sep 13 '23 09:09 gaborcsardi

@ccasar where could we find the renv.lock that you have used?

jrosell avatar Sep 13 '23 10:09 jrosell

So if pak were to install packages from an renv lockfile, where would those packages go, and how would this work together with renv's libraries?

They would get installed into .libPaths()[1], which would normally be set to the renv project library when renv is loaded for a project.

Would the renv project need to be activated first?

Yes, to ensure the library paths are set appropriately.

Would pak need to be a dependency in the renv project?

It shouldn't; at least from the renv side, renv::install() and friends automatically install and load pak when required, so it's sort of an automatically-fulfilled implicit dependency for projects that have opted-in to using pak.


Just to re-iterate, right now, renv::install() uses pak when options(renv.config.pak.enabled = TRUE) is set. When this is set, renv basically forms a call like the following:

pak::pkg_install(c("<pkg>@<version>", ...))

That is, it transforms the lockfile record entries into short-form remotes (including their versions) that can be processed by pak.
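The transformation can be sketched roughly as follows. This is only an illustration for repository packages; the helper `as_versioned_refs` and the toy records are assumptions, and renv's real conversion also has to cover GitHub, Bioconductor, and other sources.

```r
# Toy lockfile records, using the Package / Version / Source field names
# from the renv lockfile shown earlier in the thread.
records <- list(
  cli  = list(Package = "cli",  Version = "3.1.0", Source = "CRAN"),
  glue = list(Package = "glue", Version = "1.5.1", Source = "CRAN")
)

# Hypothetical helper: build short-form "package@version" references.
as_versioned_refs <- function(records) {
  vapply(
    records,
    function(r) paste0(r$Package, "@", r$Version),
    character(1),
    USE.NAMES = FALSE
  )
}

refs <- as_versioned_refs(records)

# The resulting vector could then be handed to pak, e.g.:
# pak::pkg_install(refs)
```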

kevinushey avatar Sep 13 '23 16:09 kevinushey

OK, but that means that there is no pak function to add, and pak does not actually need to read the renv lockfile, and there is nothing to do here, essentially?

gaborcsardi avatar Sep 13 '23 17:09 gaborcsardi

I was able to reduce our GitHub Actions build times from ~4.5 hrs to ~1hr in this repo.

I couldn't get renv::restore() with options(renv.config.pak.enabled = TRUE) to work. pak's usage of pkgdepends (the standard usage) leads to complaints of conflicting versions.

I was able to cobble together a solution that uses pkgdepends directly. It successfully installs the majority of packages in the renv.lock though some are not installed (not sure why). To get around this, our Dockerfile first uses pkgdepends (in restore_fast.R) and then uses renv::restore (in restore_renv.R) to install any remaining packages. A couple of other issues I noted:

  • renv::restore will re-install packages that are 'crossgrade' (same version installed but a current snapshot would produce a lockfile that has different fields/values). I skip these in the above repo.
  • pkgdepends will throw an error if there are packages that have been removed from CRAN (with archived versions still available, e.g. Matrix.utils): invalid version specification 'NA'. The workaround for these is to install the package directly from the archived URL using renv::install and then take a snapshot. renv does not have a problem resolving these packages, so it should be possible to fix this as well.

alexvpickering avatar Sep 13 '23 17:09 alexvpickering

@alexvpickering One hour still seems pretty long, have you tried to use binary Linux packages from https://packagemanager.posit.co?

Re Matrix.utils, that's not in the lockfile at https://github.com/hms-dbmi-cellenics/pipeline/blob/master/pipeline-runner/renv.lock, am I looking at the wrong file?

Btw. the Bioconductor packages are supposed to install from their git repository? That seems a bit weird, and it is probably slower than installing them from their CRAN-like repository.

gaborcsardi avatar Sep 13 '23 17:09 gaborcsardi

@kevinushey Do you have a list of possible values for the Source field in the lockfile, and the extra fields added for each package source? Just to make sure that pak can indeed install all possible package sources.

gaborcsardi avatar Sep 13 '23 17:09 gaborcsardi

OK, but that means that there is no pak function to add, and pak does not actually need to read the renv lockfile, and there is nothing to do here, essentially?

Yeah, at least from renv's perspective, now that pak supports versioned remotes, we can use pak to install packages from a lockfile.

Given this, there's probably not an explicit need for pak to support renv lockfiles, since anyone who wants to use an renv lockfile should be using renv::restore(), and so rely on renv to use pak appropriately.

tl;dr: unless you plan to further extend pak here, I think we can close this?

kevinushey avatar Sep 13 '23 17:09 kevinushey

@kevinushey Do you have a list of possible values for the Source field in the lockfile, and the extra fields added for each package source? Just to make sure that pak can indeed install all possible package sources.

The main things you'll see there are "Repository", "Bioconductor", other values already encoded in "RemoteType", "unknown", and "Cellar" (for packages that were found in the renv cellar; https://rstudio.github.io/renv/articles/package-sources.html#the-package-cellar). Although I'm not sure if the "Cellar" source is a good idea...

kevinushey avatar Sep 13 '23 17:09 kevinushey

Sorry, what I meant is, what values can be in the Source field? E.g. in the lockfile above, there are these:

❯ names(table(sapply(df$Packages, "[[", "Source")))
[1] "Bioconductor" "GitHub"       "Repository"   "URL"

but there are probably others?

gaborcsardi avatar Sep 13 '23 17:09 gaborcsardi

@ccasar where could we find the renv.lock that you have used?

Sorry for not providing it earlier @jrosell. I'm trying to reproduce the error now with renv 1.0.2 and pak 0.6.0, but it seems to be solved in the meantime.

ccasar avatar Sep 13 '23 17:09 ccasar

but there are probably others?

If the package was installed with remotes or pak, then the RemoteType written by that package would be copied over as the "Source".

It's probably easier to just look at the implementation here: https://github.com/rstudio/renv/blob/main/R/snapshot.R#L712-L760

kevinushey avatar Sep 13 '23 18:09 kevinushey

@alexvpickering One hour still seems pretty long, have you tried to use binary Linux packages from https://packagemanager.posit.co?

Not tried yet. Locally the restore takes ~30 mins (more cores than GA) for ~250 packages. Time is fine for our purposes now as we also employ caching of images so 1hr is worst case scenario and much better than 4.5 hours. Timing would be improved decently if the renv::restore wasn't necessary at all.

Re Matrix.utils, that's not in the lockfile at https://github.com/hms-dbmi-cellenics/pipeline/blob/master/pipeline-runner/renv.lock, am I looking at the wrong file?

Yes sorry my bad, Matrix.utils was a problem case for a related repo. For the above repo, an example is spatstat.core.

Btw. the Bioconductor packages are supposed to install from their git repository? That seems a bit weird, and it is probably slower than installing them from their CRAN-like repository.

Not sure I understand. All packages were just installed and snapshots taken as normal as far as I remember.

alexvpickering avatar Sep 13 '23 19:09 alexvpickering

Matrix.utils seems like a bug in pak/pkgdepends. spatstat.core does not compile on my machine, so that's probably the reason.

In any case, if pkgdepends cannot install a package that it should be able to install, please open an issue! Thanks!

Not sure I understand. All packages were just installed and snapshots taken as normal as far as I remember.

No worries, it was more of a question for Kevin, but I think I understand the reason now.

gaborcsardi avatar Sep 13 '23 19:09 gaborcsardi