pak
support for installation from renv.lock lockfile?
It looks like pak has its own machinery for creating and installing from lockfiles, e.g.
pak::lockfile_create(<pkgs>)
pak::lockfile_install()
Is there a straightforward way for lockfiles created by renv (renv.lock) to work here as well?
The main barrier I see is that one cannot yet create lockfiles from versioned R packages; e.g.
> pak::lockfile_create("[email protected]")
x Creating lockfile pkg.lock [66ms]
Error: Cannot install packages:
* [email protected]: Versioned CRAN packages are not implemented yet
If we had that, I think it would be straightforward for renv to create a list of versioned remotes that could be passed into pak.
pak has more information in its lockfile, so you cannot replace it with an renv lockfile.
Nevertheless, supporting installation from an renv lockfile is in the plans.
@kevinushey One issue with the renv lockfile is that AFAICT it does not contain the dependencies, so we don't know the right installation order. This means that to start the installation we would need to unpack all packages first, and then look up the dependencies.
This would be much simpler if the lockfile also had the dependencies, just the package names, e.g.
...
"Package": "callr",
"Version": "3.7.0.9000",
"Source": "GitHub",
"RemoteType": "github",
"Dependencies": ["processx", "R6"],
...
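With a per-package dependency list like this, the installation order follows from a topological sort. Below is a minimal R sketch. It assumes the lockfile has already been parsed into a named list that maps each package to the names of its dependencies. `install_order()` is a hypothetical helper and is not part of pak or renv.

```r
# Hypothetical helper: return package names in an order where each package
# comes after all of its dependencies (Kahn's topological sort).
# `deps` is a named list: package name -> character vector of dependencies.
install_order <- function(deps) {
  pkgs <- names(deps)
  # Ignore dependencies that are not recorded in the lockfile (e.g. base packages)
  deps <- lapply(deps, function(d) intersect(d, pkgs))
  # Number of not-yet-installed dependencies for each package
  remaining <- vapply(deps, length, integer(1))
  ready <- pkgs[remaining == 0L]
  result <- character(0)
  while (length(ready) > 0) {
    pkg <- ready[1]
    ready <- ready[-1]
    result <- c(result, pkg)
    # Every package that depends on `pkg` now has one fewer unmet dependency
    for (other in pkgs) {
      if (pkg %in% deps[[other]]) {
        remaining[[other]] <- remaining[[other]] - 1L
        if (remaining[[other]] == 0L) ready <- c(ready, other)
      }
    }
  }
  if (length(result) < length(pkgs)) stop("dependency cycle detected")
  result
}
```

For the toy map `list(callr = c("processx", "R6"), processx = c("ps", "R6"), ps = character(0), R6 = character(0))` this returns `c("ps", "R6", "processx", "callr")`. The map is illustrative only, not real dependency data.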
What values would be included in Dependencies -- should that be all of Depends + Imports + LinkingTo? (Or should I consider just including those in the lockfile entries?)
Resolving the installation order post-hoc doesn't seem that bad; if I understand correctly you'd have to do this anyway if someone provided (for example) a URL remote or another similar "exotic" remote. I'm not sure if supporting those is something pak plans to do, though.
Just those in the lockfile entries. And yes, include LinkingTo as well: renv does not record URLs for CRAN-like packages, so a source package might be installed, and building it needs the LinkingTo packages.
pak lockfiles have the dependencies for all packages, including exotic ones, for example
pak::lockfile_create("url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz")
will create
...
{
"ref": "url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz",
"package": "cli",
"version": "3.1.0",
"type": "url",
"direct": true,
"binary": false,
"dependencies": ["glue"],
"vignettes": false,
"needscompilation": true,
"metadata": {
"RemotePkgRef": "url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz",
"RemoteType": "url",
"RemoteEtag": "\"75b0e-5cf56b5f6c91b\"",
"RemotePackaged": "TRUE"
},
"sources": ["https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz"],
...
},
{
"ref": "glue",
"package": "glue",
"version": "1.5.1",
"type": "standard",
"direct": false,
"binary": true,
"dependencies": [],
"vignettes": false,
"needscompilation": false,
"metadata": {
"RemoteType": "standard",
"RemotePkgRef": "glue",
"RemoteRef": "glue",
"RemoteRepos": "https://cloud.r-project.org",
"RemotePkgPlatform": "aarch64-apple-darwin20",
"RemoteSha": "1.5.1"
},
"sources": ["https://cloud.r-project.org/bin/macosx/big-sur-arm64/contrib/4.1/glue_1.5.1.tgz", "https://mac.r-project.org/bin/macosx/big-sur-arm64/contrib/4.1/glue_1.5.1.tgz"],
...
When installing from this lockfile, no dependency resolution is performed at all. We can start downloading packages right away; we don't even need the CRAN metadata. Then we can start installing them immediately, using as many subprocesses as possible.
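Because every record carries its dependencies, packages can be grouped into "waves" that are safe to install concurrently. Below is a rough R sketch. It assumes the dependencies have been read into a named list mapping each package name to a character vector of dependency names. `install_waves()` is hypothetical and is not pak's actual scheduler.

```r
# Hypothetical helper: group packages into waves. Every package in a wave
# depends only on packages from earlier waves, so all packages within one
# wave could be installed in parallel subprocesses.
install_waves <- function(deps) {
  pkgs <- names(deps)
  # Only dependencies recorded in the lockfile matter for ordering
  deps <- lapply(deps, function(d) intersect(d, pkgs))
  done <- character(0)
  waves <- list()
  while (length(done) < length(pkgs)) {
    todo <- setdiff(pkgs, done)
    # A package is ready once all of its dependencies are installed
    ready <- todo[vapply(todo, function(p) all(deps[[p]] %in% done), logical(1))]
    if (length(ready) == 0) stop("dependency cycle detected")
    waves[[length(waves) + 1L]] <- ready
    done <- c(done, ready)
  }
  waves
}
```

For the cli/glue lockfile above, `install_waves(list(cli = "glue", glue = character(0)))` returns `list("glue", "cli")`: glue goes in the first wave and cli in the second.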
FWIW, other tools do the same: both Cargo.lock and package-lock.json include the dependencies.
Would this be sufficient?
{
"R": {
"Version": "4.1.2",
"Repositories": [
{
"Name": "CRAN",
"URL": "https://cran.rstudio.com"
}
]
},
"Packages": {
"cli": {
"Package": "cli",
"Version": "3.1.0",
"Source": "CRAN",
"Repository": "CRAN",
"RemoteType": "standard",
"RemotePkgRef": "cli",
"RemoteRef": "cli",
"RemoteRepos": "https://cran.rstudio.com/",
"RemotePkgPlatform": "source",
"RemoteSha": "3.1.0",
"Hash": "66a3834e54593c89d8beefb312347e58",
"Requirements": [
"glue"
]
},
"glue": {
"Package": "glue",
"Version": "1.5.1",
"Source": "CRAN",
"Repository": "CRAN",
"RemoteType": "standard",
"RemotePkgRef": "glue",
"RemoteRef": "glue",
"RemoteRepos": "https://cran.rstudio.com/",
"RemotePkgPlatform": "source",
"RemoteSha": "1.5.1",
"Hash": "e01bc1fe0c20954ec97eac86640abc70",
"Requirements": []
}
}
}
Note the 'Requirements' field, which lists the names of the other packages in the lockfile that this package depends on.
Yes, that is exactly what we need, like in https://github.com/r-lib/pak/issues/343#issuecomment-993444459.
@kevinushey So, if you add the dependencies to the lockfile, then I can implement this pretty quickly.
This has been implemented now; the Requirements entry for each package record will be a JSON array of package names (drawn from Depends, Imports and LinkingTo). You should see it if you test with the development version of renv.
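For illustration, here is a rough R sketch of deriving such a Requirements list from a package's DESCRIPTION, following the rule stated above: Depends, Imports and LinkingTo, restricted to packages recorded in the lockfile. `requirements_from_description()` is a hypothetical helper, not renv's implementation, and it does not handle every DESCRIPTION edge case.

```r
# Hypothetical helper (not renv's implementation): build a Requirements-style
# vector from DESCRIPTION text, using the Depends, Imports and LinkingTo fields
# and keeping only packages that are also recorded in the lockfile.
requirements_from_description <- function(desc_lines, lockfile_pkgs) {
  con <- textConnection(desc_lines)
  on.exit(close(con))
  dcf <- read.dcf(con)
  fields <- intersect(c("Depends", "Imports", "LinkingTo"), colnames(dcf))
  entries <- unlist(strsplit(dcf[1, fields], ","), use.names = FALSE)
  # Drop version constraints, e.g. "glue (>= 1.3.2)" -> "glue"
  pkgs <- trimws(sub("\\(.*\\)", "", entries))
  # "R" in Depends is a requirement on R itself, not a package
  pkgs <- setdiff(unique(pkgs[nzchar(pkgs)]), "R")
  intersect(pkgs, lockfile_pkgs)
}
```

Packages such as methods or utils drop out because they are not lockfile entries. This is consistent with "just those in the lockfile entries" earlier in the thread.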
@gaborcsardi @kevinushey Just passing by to ask about the current status, as it has been quiet here for some time.
Using {pak} as a backend for {renv} package installs would really be great to have! 👀
ciao @pat-s, it seems that {renv} supports {pak} (see the config option here). IIUC, there are still some things to sort out, for example the installation procedure: should I install.packages(c('renv', 'pak')), use {renv} to install {pak}, or vice versa?
I've just tried this with renv 0.16.0 and options(renv.config.pak.enabled = TRUE) set.
It seems you need to install pak before running, e.g., renv::restore(); otherwise the standard installation method is used.
After installing pak successfully, running renv::restore() immediately produced the following error. I'm using an renv.lock file that was created with renv 0.15.5; I'm not sure if that is the reason, because the same error also came up when testing the above commands with renv 0.15.5.
Error: <callr_remote_error: Can't parse remotes: >
in process 753095
-->
<simpleError in get_remote_types(refs): Can't parse remotes: >
Stack trace:
12. (function (...) ...
13. base:::withCallingHandlers(cli_message = function(msg) { ...
14. get("pkg_install_make_plan", asNamespace("pak"))(...)
15. pkgdepends::new_pkg_installation_proposal(pkg, config = list(libr ...
16. pkg_installation_proposal$new(refs, config = config, ...)
17. pkgdepends:::initialize(...)
18. pkg_plan$new(refs, config = config, library = config$library, ...
19. pkgdepends:::initialize(...)
20. pkgdepends:::pkgplan_init(self, private, refs, config, library, ...
21. pkgdepends:::parse_pkg_refs(refs)
22. pkgdepends:::get_remote_types(refs)
23. base:::stop("Can't parse remotes: ", paste(refs[bad], collapse = ...
24. base:::.handleSimpleError(function (e) ...
25. h(simpleError(msg, call))
26. base:::stop(e)
27. (function (e) ...
x Can't parse remotes:
Traceback (most recent calls last):
11: install.packages("pak")
10: install(pkgs)
9: renv_pak_install(packages, libpaths)
8: pak$pkg_install(pkg = packages, lib = library[[1L]], upgrade = TRUE)
7: remote(function(...) get("pkg_install_make_plan", asNamespace("pak"))(...),
list(pkg = pkg, lib = lib, upgrade = upgrade, ask = ask,
start = start, dependencies = dependencies, loaded = loaded_packages(lib)))
6: err$rethrow(stop(res$error$parent$error), res$error$parent, call = FALSE)
5: withCallingHandlers(expr, error = function(e) {
if (is.null(e$`_nframe`))
e$`_nframe` <- length(sys.calls())
e$`_childcall` <- realcall
e$`_childframe` <- realframe
e$`_childignore` <- list(c(realframe + 1L, realframe + 1L),
c(e$`_nframe` + 1L, sys.nframe() + 1L))
throw(cond, parent = e)
})
4: stop(res$error$parent$error)
3: <condition-handler>(...)
2: throw(cond, parent = e)
1: stop(cond)
Is there any progress on this @gaborcsardi? I'd be happy to try to put together a PR if you can give some direction on how to go from the renv lockfile with dependencies specified (above) to a functional pak lockfile.
@kevinushey So if pak were to install packages from an renv lockfile, where would those packages go, and how would this work together with renv's libraries?
Would the renv project need to be activated first?
Would pak need to be a dependency in the renv project?
If the renv project does not need to be active, then pak would install the packages into a regular (non-renv) library? That does not seem ideal.
I don't really see how this would work.
@ccasar where could we find the renv.lock that you have used?
> So if pak were to install packages from an renv lockfile, where would those packages go, and how would this work together with renv's libraries?
They would get installed into .libPaths()[1], which would normally be set to the renv project library when renv is loaded for a project.
> Would the renv project need to be activated first?
Yes, to ensure the library paths are set appropriately.
> Would pak need to be a dependency in the renv project?
It shouldn't; at least from the renv side, renv::install() and friends automatically install and load pak when required, so it's sort of an automatically fulfilled implicit dependency for projects that have opted in to using pak.
Just to reiterate, right now, renv::install() uses pak when options(renv.config.pak.enabled = TRUE) is set. When this is set, renv basically forms a call like the following:
pak::pkg_install(c("[email protected]", ...))
That is, it transforms the lockfile record entries into short-form remotes (including their versions) that can be processed by pak.
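A minimal R sketch of that transformation follows. It is not renv's actual code: the helper names are hypothetical, only repository and GitHub records are handled, and the GitHub branch assumes the record carries the RemoteUsername, RemoteRepo and RemoteSha fields.

```r
# Hypothetical sketch (not renv's implementation): turn one parsed lockfile
# record into a short-form pak package reference.
record_to_ref <- function(rec) {
  src <- if (is.null(rec$Source)) NA_character_ else rec$Source
  if (src %in% c("Repository", "CRAN")) {
    # Versioned repository reference, e.g. "cli@3.1.0"
    sprintf("%s@%s", rec$Package, rec$Version)
  } else if (identical(src, "GitHub")) {
    # GitHub reference pinned to a commit, e.g. "r-lib/callr@<sha>"
    sprintf("%s/%s@%s", rec$RemoteUsername, rec$RemoteRepo, rec$RemoteSha)
  } else {
    stop("unsupported Source for ", rec$Package, ": ", src)
  }
}

# Convert all records (a named list, as in the lockfile's "Packages" object)
lockfile_to_refs <- function(packages) {
  vapply(packages, record_to_ref, character(1), USE.NAMES = FALSE)
}
```

Reading the lockfile with `jsonlite::read_json("renv.lock")` yields `$Packages` records in this list shape. The resulting references could then be passed to `pak::pkg_install()`.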
OK, but that means that there is no pak function to add, and pak does not actually need to read the renv lockfile, and there is nothing to do here, essentially?
I was able to reduce our GitHub Actions build times from ~4.5 hrs to ~1hr in this repo.
I couldn't get renv::restore() with options(renv.config.pak.enabled = TRUE) to work. pak's usage of pkgdepends (the standard usage) leads to complaints of conflicting versions.
I was able to cobble together a solution that uses pkgdepends directly. It successfully installs the majority of packages in the renv.lock, though some are not installed (not sure why). To get around this, our Dockerfile first uses pkgdepends (in restore_fast.R) and then uses renv::restore (in restore_renv.R) to install any remaining packages. A couple of other issues I noted:
- renv::restore will re-install packages that are 'crossgrades' (the same version is installed, but a current snapshot would produce a lockfile with different fields/values). I skip these in the above repo.
- pkgdepends will throw the error invalid version specification 'NA' if there are packages that have been removed from CRAN (with archived versions available, e.g. Matrix.utils). The workaround for these is to install the package directly using renv::install from the archived URL and then take a snapshot. renv does not have a problem resolving these packages, so it should be possible to fix this as well.
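One way to express the "skip crossgrades" idea is to compare versions only. The following is a hypothetical R helper, not the code from the repository mentioned above. It assumes both inputs are named character vectors of versions and compares them as plain strings.

```r
# Hypothetical helper (not part of renv or pak): return the lockfile packages
# that are missing or installed at a different version. Packages whose
# version already matches are skipped, even if other lockfile fields
# (e.g. remote metadata) differ.
packages_to_install <- function(locked, installed) {
  pkgs <- names(locked)
  # NA for packages that are not installed at all
  current <- installed[pkgs]
  pkgs[is.na(current) | current != locked]
}
```

For the project library, `installed.packages(lib.loc = .libPaths()[1])[, "Version"]` gives a suitable `installed` vector.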
@alexvpickering One hour still seems pretty long, have you tried to use binary Linux packages from https://packagemanager.posit.co?
Re Matrix.utils, that's not in the lockfile at https://github.com/hms-dbmi-cellenics/pipeline/blob/master/pipeline-runner/renv.lock, am I looking at the wrong file?
Btw. the Bioconductor packages are supposed to install from their git repository? That seems a bit weird, and it is probably slower than installing them from their CRAN-like repository.
@kevinushey Do you have a list of possible values for the Source field in the lockfile, and the extra fields added for each package source? Just to make sure that pak can indeed install all possible package sources.
> OK, but that means that there is no pak function to add, and pak does not actually need to read the renv lockfile, and there is nothing to do here, essentially?
Yeah, at least from renv's perspective, now that pak supports versioned remotes, we can use pak to install packages from a lockfile.
Given this, there's probably not an explicit need for pak to support renv lockfiles, since anyone who wants to use an renv lockfile should be using renv::restore(), and so rely on renv to use pak appropriately.
tl;dr: unless you plan to further extend pak here, I think we can close this?
> @kevinushey Do you have a list of possible values for the Source field in the lockfile, and the extra fields added for each package source? Just to make sure that pak can indeed install all possible package sources.
The main things you'll see there are "Repository", "Bioconductor", other values already encoded in "RemoteType", "unknown", and "Cellar" (for packages that were found in the renv cellar; https://rstudio.github.io/renv/articles/package-sources.html#the-package-cellar). Although I'm not sure if the "Cellar" source is a good idea...
Sorry, what I meant is, what values can be in the Source field? E.g. in the lockfile above, there are these:
❯ names(table(sapply(df$Packages, "[[", "Source")))
[1] "Bioconductor" "GitHub" "Repository" "URL"
but there are probably others?
> @ccasar where could we find the renv.lock that you have used?
Sorry for not providing it earlier @jrosell. I'm trying to reproduce the error now with renv 1.0.2 and pak 0.6.0, but it seems to be solved in the meantime.
> but there are probably others?
If the package was installed with remotes or pak, then the RemoteType written by that package would be copied over as the "Source".
It's probably easier to just look at the implementation here: https://github.com/rstudio/renv/blob/main/R/snapshot.R#L712-L760
> @alexvpickering One hour still seems pretty long, have you tried to use binary Linux packages from https://packagemanager.posit.co?
Not tried yet. Locally, the restore takes ~30 mins (more cores than on GitHub Actions) for ~250 packages. The time is fine for our purposes now, as we also cache images, so 1 hr is the worst-case scenario and much better than 4.5 hours. Timing would improve considerably if the renv::restore step weren't necessary at all.
> Re Matrix.utils, that's not in the lockfile at https://github.com/hms-dbmi-cellenics/pipeline/blob/master/pipeline-runner/renv.lock, am I looking at the wrong file?
Yes, sorry, my bad: Matrix.utils was a problem case for a related repo. For the above repo, an example is spatstat.core.
> Btw. the Bioconductor packages are supposed to install from their git repository? That seems a bit weird, and it is probably slower than installing them from their CRAN-like repository.
Not sure I understand. All packages were just installed and snapshots taken as normal as far as I remember.
Matrix.utils seems like a bug in pak/pkgdepends. spatstat.core does not compile on my machine, so that's probably the reason.
In any case, if pkgdepends cannot install a package that it should be able to install, please open an issue! Thanks!
> Not sure I understand. All packages were just installed and snapshots taken as normal as far as I remember.
No worries, it was more of a question for Kevin, but I think I understand the reason now.