pak fails to install when a parent library path exists, and does not recognize parent library paths
Hi, I'm using Ubuntu Noble, and I found that pak fails to install when you have packages in /usr/local/lib/R/site-library.
Example:
As root:
sudo bash
R
pak::pkg_install("dplyr")
Now as a normal user:
R
pak::pkg_install("dplyr?source")
→ Will install 1 package.
→ The package (1.21 MB) is cached.
+ dplyr 1.1.4 [bld][cmp]
ℹ No downloads are needed, 1 pkg (1.21 MB) is cached
ℹ Building dplyr 1.1.4
✔ Built dplyr 1.1.4 (13s)
Error:
! error in pak subprocess
Caused by error in `verify_extracted_package(filename, pkg_cache)`:
! /tmp/RtmpNBsw5r/file5b8627a64e601/dplyr_1.1.4_R_x86_64-pc-linux-gnu.tar.gz is not a valid R package, it is an empty archive.
Type .Last.error to see the more details.
> .Last.error
<callr_error/rlib_error_3_0/rlib_error/error>
Error:
! error in pak subprocess
Caused by error in `verify_extracted_package(filename, pkg_cache)`:
! /tmp/RtmpNBsw5r/file5b8627a64e601/dplyr_1.1.4_R_x86_64-pc-linux-gnu.tar.gz is not a valid R package, it is an empty archive.
---
Backtrace:
1. pak::pkg_install("dplyr?source")
2. pak:::remote(function(...) get("pkg_install_do_plan", asNamespace("pak"))(...), …
3. err$throw(res$error)
---
Subprocess backtrace:
1. base::withCallingHandlers(cli_message = function(msg) { …
2. get("pkg_install_do_plan", asNamespace("pak"))(...)
3. proposal$install()
4. pkgdepends::install_package_plan(plan, lib = private$library, num_workers = nw, …
5. base::withCallingHandlers({ …
6. pkgdepends:::handle_events(state, events)
7. pkgdepends:::handle_event(state, i)
8. proc$get_result()
9. processx:::process_get_result(self, private)
10. private$post_process()
11. pkgdepends:::install_extracted_binary(filename, lib_cache, pkg_cache, lib, …
12. pkgdepends:::verify_extracted_package(filename, pkg_cache)
13. base::throw(pkg_error("{.path {filename}} is not a valid R package, it is an empty archive.", …
14. | base::signalCondition(cond)
15. global (function (e) …
This is actually only half of the problem: pak should check permissions and, if the user does not have them, offer to use a personal library instead.
The second half appears once we have a new library path. We can create a personal one just by trying to install dplyr normally:
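To illustrate the kind of permission check meant here, a minimal sketch in base R, assuming it is enough to test write access on each entry of .libPaths(). The helper name first_writable_lib is hypothetical and is not part of pak.

```r
# Hypothetical helper (not part of pak): return the first library in
# `paths` that the current user can write to, or NA if there is none.
first_writable_lib <- function(paths = .libPaths()) {
  # file.access(mode = 2) returns 0 for writable paths and -1 otherwise
  # (including paths that do not exist).
  writable <- file.access(paths, mode = 2) == 0
  if (any(writable)) paths[which(writable)[1]] else NA_character_
}

lib <- first_writable_lib()
if (is.na(lib)) {
  message("No writable library found; a personal library is needed.")
} else {
  message("Would install into: ", lib)
}
```

With a check like this, a tool could ask about creating a personal library before building anything, which is what install.packages() does.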
install.packages("dplyr")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning in install.packages("dplyr") :
'lib = "/usr/local/lib/R/site-library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘/home/cit_16/R/x86_64-pc-linux-gnu-library/4.4’
to install packages into? (yes/No/cancel) yes
## CANCEL NOW
Now that the personal library exists, we can see the second issue: when .libPaths() contains multiple paths, pak ignores all but one of them. Installing a package as root stores it in /usr/local/lib/R/site-library, which also holds other packages, but if we try to install again as a normal user:
pak::pkg_install("dplyr")
→ Will install 16 packages.
→ All 16 packages (6.16 MB) are cached.
+ cli 3.6.3 [bld][cmp]
+ dplyr 1.1.4 [bld][cmp]
+ fansi 1.0.6 [bld][cmp]
+ generics 0.1.3 [bld]
+ glue 1.8.0 [bld][cmp]
+ lifecycle 1.0.4 [bld]
+ magrittr 2.0.3 [bld][cmp]
+ pillar 1.10.1 [bld]
+ pkgconfig 2.0.3 [bld]
+ R6 2.5.1 [bld]
+ rlang 1.1.4 [bld][cmp]
+ tibble 3.2.1 [bld][cmp]
+ tidyselect 1.2.1 [bld][cmp]
+ utf8 1.2.4 [bld][cmp]
+ vctrs 0.6.5 [bld][cmp]
+ withr 3.0.2 [bld]
ℹ No downloads are needed, 16 pkgs (6.16 MB) are cached
ℹ Building cli 3.6.3
ℹ Building fansi 1.0.6
ℹ Building generics 0.1.3
ℹ Building glue 1.8.0
ℹ Building magrittr 2.0.3
ℹ Building pkgconfig 2.0.3
ℹ Building R6 2.5.1
ℹ Building rlang 1.1.4
^Cstalling...
pak reinstalls everything, even packages that a parent library path already provides; it ignores all the other installed libraries. For reference, .libPaths() returns:
[1] "/home/cit_16/R/x86_64-pc-linux-gnu-library/4.4"
[2] "/usr/local/lib/R/site-library"
[3] "/usr/lib/R/site-library"
[4] "/usr/lib/R/library"
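As a rough sketch of the behaviour I expected, this checks every library path before deciding what still needs to be installed. It assumes a package counts as installed when <lib>/<pkg>/DESCRIPTION exists. The helper name missing_everywhere is hypothetical; this is not how pak resolves dependencies.

```r
# Hypothetical helper: return the packages from `pkgs` that are not
# installed in ANY of the libraries in `paths`.
missing_everywhere <- function(pkgs, paths = .libPaths()) {
  installed <- vapply(
    pkgs,
    # An installed package has a DESCRIPTION file in <lib>/<pkg>/
    function(p) any(file.exists(file.path(paths, p, "DESCRIPTION"))),
    logical(1)
  )
  pkgs[!installed]
}

# Packages that would really need installing, given all library paths:
missing_everywhere(c("stats", "utils"))
```

In the setup above, only the packages returned by such a check would need to go into the personal library.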
Tested with pak from git.
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
locale:
[1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] processx_3.8.4 compiler_4.4.2 R6_2.5.1 cli_3.6.3 tools_4.4.2
[6] callr_3.7.6 ps_1.8.1 pak_0.8.0
Thx!
Indeed, pak always installs everything into a single library; that is by design. AFAIR there is an issue about considering packages in other libraries as well.
The reason we split the libraries is that some packages are used by a lot of people, not all of them technical, and some packages are not always easy to install. So we install them in a shared location, where we can upgrade them once and keep them available to everyone.
I don't question your reasons. OTOH I am not super eager to support this use case because it is pretty error prone and makes it hard to follow what is installed where. E.g. pak::pkg_install(upgrade = TRUE) installs the latest versions. If it updates a package where should the update go? I guess it should install a new version into the "installation library", even if there is an older version in "another library." But then if "another library" comes first in the library path, then the one that pak just installed will never be used.
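The shadowing problem can be reproduced with two throwaway libraries. This is a minimal sketch, assuming that the first library in the path containing the package is the one that gets used. The helpers make_fake_pkg and resolve_pkg and the package name foo are hypothetical.

```r
# Hypothetical helper: create a minimal fake package (DESCRIPTION only).
make_fake_pkg <- function(lib, pkg, version) {
  dir.create(file.path(lib, pkg), recursive = TRUE)
  writeLines(
    c(paste("Package:", pkg), paste("Version:", version)),
    file.path(lib, pkg, "DESCRIPTION")
  )
}

# Hypothetical helper: report which copy of `pkg` comes first in `paths`.
resolve_pkg <- function(pkg, paths = .libPaths()) {
  for (lib in paths) {
    desc <- file.path(lib, pkg, "DESCRIPTION")
    if (file.exists(desc)) {
      version <- unname(read.dcf(desc, fields = "Version")[1, 1])
      return(list(lib = lib, version = version))
    }
  }
  NULL  # not found in any library
}

shared <- tempfile("shared-lib")
personal <- tempfile("personal-lib")
make_fake_pkg(shared, "foo", "1.0.0")    # older copy in "another library"
make_fake_pkg(personal, "foo", "2.0.0")  # newer copy just installed

# With the shared library first, the newer copy is never picked up.
resolve_pkg("foo", c(shared, personal))$version  # "1.0.0"
```

Reversing the order of the two paths makes the 2.0.0 copy win, so the result depends entirely on how the library path is ordered.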
Sharing a package library among users also leads to unusual errors, e.g. if you update a package in the shared library, that probably breaks everybody's active R session. If you are on Windows, then you might not be able to update a package in the shared library if a user is using that package, etc.
If you decide that it still makes sense to create a shared library, that's fine, and we'll support that at some point, but it is not very high priority for me, because I am worried that people don't realize the problems with this setup.
I was just trying to point out use cases, so don't worry.
I agree with your points. I use this setup with those considerations in mind: whoever uses it needs to take care not to break everything. In my case I upgrade the system and the R packages at a specific time, when no one is using them.
At the same time, in institutions most people must use the same package versions, so managing packages from a central place is ideal. Only technical people should be able to install packages and mix versions; non-technical users obviously should not upgrade or install packages.
Thinking about this, maybe the ideal approach is to support this setup, but throw an error by default when there is a higher-level library path, and provide a parameter to force the installation if we know what we are doing.
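A rough sketch of that idea, as one possible reading of it: refuse to install when the package already exists in another library on the path, unless the caller forces it. The wrapper guarded_install and its force and installer arguments are hypothetical; pak has no force argument. The default installer calls pak::pkg_install() with its lib argument.

```r
# Hypothetical wrapper illustrating the proposal; `force` and `installer`
# are NOT pak arguments.
guarded_install <- function(pkg, lib = .libPaths()[1], paths = .libPaths(),
                            force = FALSE, installer = NULL) {
  # Libraries other than the installation library
  others <- setdiff(paths, lib)
  found <- others[file.exists(file.path(others, pkg, "DESCRIPTION"))]
  if (length(found) > 0 && !force) {
    stop(
      sprintf(
        "'%s' is already installed in %s; use force = TRUE to install into %s anyway.",
        pkg, paste(found, collapse = ", "), lib
      ),
      call. = FALSE
    )
  }
  if (is.null(installer)) {
    # Default: delegate to pak, installing into the chosen library.
    installer <- function(pkg, lib) pak::pkg_install(pkg, lib = lib)
  }
  installer(pkg, lib)
}

# Usage (not run here):
# guarded_install("dplyr")                # errors if a parent library has dplyr
# guarded_install("dplyr", force = TRUE)  # installs into .libPaths()[1] anyway
```

Passing the installer as an argument keeps the check separate from the actual installation, so the guard can be tried out without touching any library.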