Use renv cache if available

Open hadley opened this issue 3 years ago • 8 comments

I'm not entirely sure what this involves, but if you use pak inside an renv-using project, it would be nice if the packages were installed into the renv global cache and then symlinked using the renv system. (So that pak::pak() would be equivalent to install.packages() inside renv projects.)

hadley avatar Mar 19 '21 15:03 hadley

Yeah, I am not sure what this involves, either. Maybe some lower level API call in renv that lets us put a package in the cache? @kevinushey?

gaborcsardi avatar Mar 19 '21 17:03 gaborcsardi

If I understand correctly, there are really two things that we want:

  1. pak should have a way of using the global renv cache as a source / shortcut when installing a requested package;
  2. renv should give pak an API for copying packages installed in the current library into the global cache.

renv needs the package DESCRIPTION file in order to figure out the cache key; that's usually straightforward for packages that are current on CRAN, or for packages on GitHub. That becomes more challenging for packages from the CRAN archive though. That said, for the first option I think we want something like:

renv:::renv_cache_find(<description>)

and if that path exists, pak could use that package for installation rather than downloading and installing itself.
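
A minimal sketch of that shortcut, using only base R. The helper `install_from_cache()` is hypothetical and is not part of pak or renv; it assumes the caller has already computed the cache path for the requested package by some other means.

```r
# Hypothetical helper, not part of pak or renv.
# cache_path: directory of the cached package, ending in the package name.
# library:    the project library to install into.
install_from_cache <- function(cache_path, library) {
  # Nothing cached: the caller should fall back to a normal install.
  if (!dir.exists(cache_path)) {
    return(FALSE)
  }

  target <- file.path(library, basename(cache_path))

  # Remove any existing copy or stale link first.
  unlink(target, recursive = TRUE)

  # Prefer a symlink; fall back to a copy where links are unavailable.
  linked <- suppressWarnings(file.symlink(cache_path, target))
  if (!linked) {
    linked <- file.copy(cache_path, library, recursive = TRUE)
  }
  linked
}
```

A caller would try this first and only download and build the package when it returns FALSE.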

For option 2, renv could have a function like:

renv:::renv_cache_synchronize(library, packages)

to copy some set of packages from the requested library to the cache.

kevinushey avatar Mar 19 '21 17:03 kevinushey

I think it can be simpler. E.g.

  • have a function that returns the hash of a package from its description (if that's all you use for hashing), and
  • have a function that returns the location of the cache.

gaborcsardi avatar Mar 19 '21 18:03 gaborcsardi

have a function that returns the hash of a package from its description (if that's all you use for hashing), and

renv:::renv_hash_description(<description path>)

have a function that returns the location of the cache.

renv:::renv_cache_path(<description path>)

I don't think I want to expose these as exported renv functions, but perhaps there's some middle ground (e.g. as R functions; renv.hash.function and renv.cache.path or something?)

Do you have an opinion on what the right contract between pak and renv is here?

kevinushey avatar Mar 19 '21 19:03 kevinushey

Do you have an opinion on what the right contract between pak and renv is here?

IDK, I would have to experiment with this a bit.

Btw. pak modifies DESCRIPTION after installation, so the hash of the installed package will be different. Is that OK?

gaborcsardi avatar Mar 19 '21 19:03 gaborcsardi

To use the renv cache as a source, maybe it is better to query the root of the cache for the current platform and R version? And have some convention about enumerating the packages in the cache.

gaborcsardi avatar Mar 19 '21 19:03 gaborcsardi

Btw. pak modifies DESCRIPTION after installation, so the hash of the installed package will be different. Is that OK?

This is probably okay, depending on what changes pak makes. renv uses a subset of the DESCRIPTION fields when building the hash. The implementation is relatively small and lives here:

https://github.com/rstudio/renv/blob/e6aff9f2dc847a80c8c9b6a666bb3f3825fc7c4d/R/hash.R#L10-L68
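
To illustrate why extra fields need not change the hash, here is a rough sketch of hashing only a subset of DESCRIPTION fields. The field list and the function `hash_description()` are assumptions made for this example; the fields renv really uses are in the linked hash.R, and the `AddedByInstaller` field is made up.

```r
# Assumed field subset, for illustration only; renv's real list is in R/hash.R.
hash_fields <- c("Package", "Version", "Depends", "Imports", "LinkingTo")

# Hypothetical helper: hash only the selected fields of a DESCRIPTION file.
hash_description <- function(path, fields = hash_fields) {
  dcf <- read.dcf(path)
  # Keep the selected fields that are present, in a fixed order.
  keep <- intersect(fields, colnames(dcf))
  lines <- paste(keep, dcf[1, keep], sep = ": ")

  # tools::md5sum() hashes files, so write the subset to a temporary file.
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeLines(lines, tmp)
  unname(tools::md5sum(tmp))
}

# Two DESCRIPTION files that differ only in a field outside the subset.
original <- tempfile()
modified <- tempfile()
writeLines(c("Package: demo", "Version: 1.0.0", "Imports: utils"), original)
writeLines(c(readLines(original), "AddedByInstaller: yes"), modified)

# TRUE: the extra field is ignored by the hash.
identical(hash_description(original), hash_description(modified))
```

Under this scheme the hash only changes if an installer rewrites one of the selected fields.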

To use the renv cache as a source, maybe it is better to query the root of the cache for the current platform and R version? And have some convention about enumerating the packages in the cache.

renv_cache_list() might be useful; e.g.

> renv:::renv_cache_list(packages = "rlang")
[1] "/Users/kevinushey/Library/Application Support/renv/cache/v5/macos/R-4.0/x86_64-apple-darwin17.0/rlang/0.4.10.9000/0624dce817c45fb4539360b206afd1e6/rlang"
[2] "/Users/kevinushey/Library/Application Support/renv/cache/v5/macos/R-4.0/x86_64-apple-darwin17.0/rlang/0.4.10.9000/5e85d0584690ab1a57900ec84ff1f3a6/rlang"
[3] "/Users/kevinushey/Library/Application Support/renv/cache/v5/macos/R-4.0/x86_64-apple-darwin17.0/rlang/0.4.10/599df23c40a4fce9c7b4764f28c37857/rlang"
[4] "/Users/kevinushey/Library/Application Support/renv/cache/v5/macos/R-4.0/x86_64-apple-darwin17.0/rlang/0.4.6/aa263e3ce17b177c49e0daade2ee3cdc/rlang"
[5] "/Users/kevinushey/Library/Application Support/renv/cache/v5/macos/R-4.0/x86_64-apple-darwin17.0/rlang/0.4.7/c06d2a6887f4b414f8e927afd9ee976a/rlang"
[6] "/Users/kevinushey/Library/Application Support/renv/cache/v5/macos/R-4.0/x86_64-apple-darwin17.0/rlang/0.4.8/843a6af51414bce7f8a8e372f11d6cd0/rlang"
[7] "/Users/kevinushey/Library/Application Support/renv/cache/v5/macos/R-4.0/x86_64-apple-darwin17.0/rlang/0.4.9/9d7aba7bed9a79e2403b4777428a2b12/rlang"
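
The paths above appear to end in <package>/<version>/<hash>/<package>. Assuming that layout holds (it is inferred from this output, not from a documented contract), a consumer such as pak could split each entry into its components with base R alone. The function `parse_cache_path()` is hypothetical.

```r
# Hypothetical helper: split cache entries of the assumed form
#   .../<package>/<version>/<hash>/<package>
# into their components.
parse_cache_path <- function(paths) {
  hash_dir <- dirname(paths)
  version_dir <- dirname(hash_dir)
  data.frame(
    package = basename(paths),
    version = basename(version_dir),
    hash = basename(hash_dir),
    stringsAsFactors = FALSE
  )
}

# Usage with one of the entries listed above.
entry <- paste0(
  "/Users/kevinushey/Library/Application Support/renv/cache/v5/macos/",
  "R-4.0/x86_64-apple-darwin17.0/rlang/0.4.10/",
  "599df23c40a4fce9c7b4764f28c37857/rlang"
)
parse_cache_path(entry)
```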

kevinushey avatar Mar 19 '21 21:03 kevinushey

Came across this issue. Pak is nice because it is much faster than renv using its own install functions.

When working in an renv-activated project, with renv.config.pak.enabled=TRUE, pak 0.2.1 downloads source code into its own cache and installs the binary into the renv project.

The former is OK because I don't really mind having a source code cache for pak that is separate from the renv source code cache. The latter is a problem because I don't want package binaries replicated in each renv project. I want them installed in the renv binary cache and linked into the project.

I fixed this problem by writing a function that wraps renv::install() followed by

function() {
  renv::snapshot(prompt = FALSE)
  lib <- renv::paths$library()
  lock <- renv:::renv_lockfile_load(".")
  packages <- lock$Packages
  invisible(lapply(
    X = packages,
    FUN = function(x) renv:::renv_cache_synchronize(record = x, linkable = TRUE)
  ))
}

This goes through everything pak just installed, copies it to the renv cache and links it back into the project. Problem solved.
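
For readers unfamiliar with the copy-then-link idea, here is a base R sketch of the general pattern. It is not how renv:::renv_cache_synchronize() is implemented; `move_to_cache()` is a hypothetical helper, and the cache path is assumed to be supplied by the caller.

```r
# Hypothetical helper: copy an installed package into a cache directory,
# then replace the installed copy with a link to the cached one.
move_to_cache <- function(library, package, cache_path) {
  # The cache entry is assumed to end in the package name.
  stopifnot(basename(cache_path) == package)
  installed <- file.path(library, package)

  if (!dir.exists(cache_path)) {
    parent <- dirname(cache_path)
    dir.create(parent, recursive = TRUE, showWarnings = FALSE)
    # Copying a directory into `parent` creates <parent>/<package>.
    if (!file.copy(installed, parent, recursive = TRUE)) {
      stop("could not copy '", package, "' into the cache")
    }
  }

  # Replace the library copy with a symlink, or a plain copy as fallback.
  unlink(installed, recursive = TRUE)
  linked <- suppressWarnings(file.symlink(cache_path, installed))
  if (!linked) {
    file.copy(cache_path, library, recursive = TRUE)
  }
  invisible(cache_path)
}
```

After this runs, the project library still resolves the package, but the files live once in the cache.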

Could be useful for renv::install().

blaserlab avatar Apr 01 '22 17:04 blaserlab