pak
Feature request: Add support for `Additional_repositories` field in DESCRIPTION file
To make pak a more complete "alternative to install.packages()", I think that it would be useful if pak used the Additional_repositories field in the DESCRIPTION file: https://cran.r-project.org/doc/manuals/R-exts.html#Package-Dependencies
As I looked in a bit more detail, I found that the DESCRIPTION file lists the needed CRAN-like repository in the Additional_repositories section (https://github.com/nlmixr2/babelmixr2/blob/285f880fcb8902b3320d17878d584590cf41fe6c/DESCRIPTION#L42). Maybe there is a second feature request for pak to look there, too. (I'm happy to open that as a separate issue, if that's helpful.)
Originally posted by @billdenney in https://github.com/r-lib/pak/issues/421#issuecomment-1264694009
@gaborcsardi: I remember seeing an alternative approach for specifying this so that pak can use CRAN-like repos. Does this ring a bell?
Needed for https://github.com/tidyverse/dplyr/pull/6526.
IDK what you mean. On GHA you can set options("repos") in the setup-r action.
Can I indicate in DESCRIPTION that pak should get a specific dependency from a cranlike repo (like r-universe)?
No, you cannot currently, apart from using an url:: entry in Remotes.
That would work for my use case. Will that pick up binary packages from r-universe?
AFAIK you can link to binary packages with url::, if that's your question.
I want to install bleeding edge duckdb so that the tests run on CI/CD. Can I use url:: three times for three different binary packages? Or will I have to resort to source packages after all?
I want to install bleeding edge duckdb so that the tests run on CI/CD. Can I use url:: three times for three different binary packages?
Maybe, I never tried that.
Or will I have to resort to source packages after all?
Why not add the duckdb repo on the CI instead? Repository configuration does not belong in the package metadata.
I tried Remotes: duckdb/duckdb/tools/rpkg to no avail:
remotes::install_github("duckdb/duckdb/tools/rpkg")
#> Downloading GitHub repo duckdb/duckdb@HEAD
#>
#> * checking for file ‘/private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpS8daew/remotes1513e111fe4e9/duckdb-duckdb-dfae126/tools/rpkg/DESCRIPTION’ ... OK
#> * preparing ‘duckdb’:
#> * checking DESCRIPTION meta-information ... OK
#> * cleaning src
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘duckdb_0.6.0.tar.gz’
#> Installing package into '/Users/kirill/Library/R/arm64/4.1/library'
#> (as 'lib' is unspecified)
#> Warning in i.p(...): installation of package '/var/folders/dj/
#> yhk9rkx97wn_ykqtnmk18xvc0000gn/T//RtmpS8daew/file1513e133d9fae/
#> duckdb_0.6.0.tar.gz' had non-zero exit status
Created on 2022-11-24 with reprex v2.0.2
On the console I see:
Downloading GitHub repo duckdb/duckdb@HEAD
✔ checking for file ‘/private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpfnPyxd/remotes14ad52cd2c745/duckdb-duckdb-dfae126/tools/rpkg/DESCRIPTION’ ...
─ preparing ‘duckdb’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘duckdb_0.6.0.tar.gz’
Installing package into ‘/Users/kirill/Library/R/arm64/4.1/library’
(as ‘lib’ is unspecified)
Sourcing ../.Rprofile.local
Dev mode: ON
* installing *source* package ‘duckdb’ ...
** using staged installation
** libs
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG -I/opt/R/arm64/include -fPIC -O0 -g -c altrep.cpp -o altrep.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG -I/opt/R/arm64/include -fPIC -O0 -g -c connection.cpp -o connection.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG -I/opt/R/arm64/include -fPIC -O0 -g -c cpp11.cpp -o cpp11.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG -I/opt/R/arm64/include -fPIC -O0 -g -c database.cpp -o database.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG -I/opt/R/arm64/include -fPIC -O0 -g -c register.cpp -o register.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG -I/opt/R/arm64/include -fPIC -O0 -g -c relational.cpp -o relational.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG -I/opt/R/arm64/include -fPIC -O0 -g -c reltoaltrep.cpp -o reltoaltrep.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG -I/opt/R/arm64/include -fPIC -O0 -g -c scan.cpp -o scan.o
cpp11.cpp:4:10: fatal error: 'duckdb_types.hpp' file not found
#include "duckdb_types.hpp"
^~~~~~~~~~~~~~~~~~
altrep.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
^~~~~~~~~~
1 error generated.
1 error generated.
make: *** [altrep.o] Error 1
make: *** Waiting for unfinished jobs....
reltoaltrep.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
^~~~~~~~~~
1 error generated.
make: *** [cpp11.o] Error 1
make: *** [reltoaltrep.o] Error 1
connection.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
^~~~~~~~~~
1 error generated.
database.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
^~~~~~~~~~
1 error generated.
register.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
^~~~~~~~~~
make: *** [connection.o] Error 1
1 error generated.
make: *** [database.o] Error 1
make: *** [register.o] Error 1
scan.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
^~~~~~~~~~
1 error generated.
make: *** [scan.o] Error 1
relational.cpp:1:10: fatal error: 'cpp11.hpp' file not found
#include "cpp11.hpp"
^~~~~~~~~~~
1 error generated.
make: *** [relational.o] Error 1
ERROR: compilation failed for package ‘duckdb’
* removing ‘/Users/kirill/Library/R/arm64/4.1/library/duckdb’
* restoring previous ‘/Users/kirill/Library/R/arm64/4.1/library/duckdb’
Warning message:
In i.p(...) :
installation of package ‘/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T//RtmpfnPyxd/file14ad542189a00/duckdb_0.6.0.tar.gz’ had non-zero exit status
I tried Remotes: duckdb/duckdb/tools/rpkg to no avail:
Hmmm, why is that a pak issue?
Because pak is used at the front end in GitHub Actions. I was looking for functionality existing today to achieve the desired behavior: install dev duckdb as a dplyr dependency. I tried Additional_repositories, thought I was missing another magic option, came here. It could also be solved with a GitHub remote, but that seems to require a fix to {remotes}: r-lib/remotes#576.
FWIW, the linked issue contains a bad idea. But installing from a subdirectory should work even if the installation process accesses files from outside the R package.
Because pak is used at the front end in GitHub Actions.
But you did not use pak at all in your test. remotes::install_github() does not use pak, and GHA does not use remotes, but pak.
Also, the installation fails for a completely different reason, so that's not even a remotes issue. Btw. it also fails if you clone the repo and call R CMD build and R CMD INSTALL.
I tried Additional_repositories
That is for cran-like repositories, and only for optional dependencies. If you indeed have a cran-like repo for dev duckdb, then specify that in options("repos").
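A minimal sketch of that suggestion, assuming the dev builds live in an r-universe repository at https://duckdb.r-universe.dev (the URL and the CRAN mirror are assumptions here, substitute whatever CRAN-like repo you actually have):

```r
# Assumed repository URLs; replace with the CRAN-like repo you want to add.
extra <- c(duckdb = "https://duckdb.r-universe.dev")
cran <- c(CRAN = "https://cloud.r-project.org")

# Register both repositories for this session.
# Package installers that read options("repos") will then see the extra repo.
options(repos = c(extra, cran))

# Inspect the result.
getOption("repos")
```

Putting this in the CI setup (or in .Rprofile) keeps the repository configuration out of the package metadata.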
It could also be solved with a GitHub remote, but that seems to require a fix to {remotes}: r-lib/remotes#576.
In that issue you tried to install igraph/igraph which does not contain an R package or a DESCRIPTION file, anywhere.
Nevertheless (the dev version of) pak (or any version of remotes) can happily install from a subdirectory, as long as you specify the subdirectory. They assume that you can run R CMD build on the subdirectory and then R CMD INSTALL on the source R package.
This clearly does not hold for duckdb, so there is not much we can do. You'll need to clone the repo, and then manually create an R package using whatever procedure duckdb uses to do that.
Doesn't pak use {remotes} internally?
I missed that ./configure isn't run at all during `R CMD build .`. I prepared a version that installs with `R CMD build .` and `R CMD INSTALL *.tar.gz` but fails with remotes:
remotes::install_github("krlmlr/duckdb/tools/rpkg@f-cleanup")
The ./cleanup script copies stuff, but requires that the parent directories are present during that process.
Lastly, is there a reason why pak shouldn't be using Additional_repositories?
I'll work around.
Doesn't pak use {remotes} internally?
It does not.
Lastly, is there a reason why pak shouldn't be using Additional_repositories?
No reason, it will probably be supported eventually. It is not entirely clear how it should be supported, as it can only be used for suggested packages on CRAN, and it is not part of the CRAN metadata, so we don't know if CRAN(-like) packages have Additional_repositories fields.
{rnaturalearth} is on CRAN: https://cran.r-project.org/web/packages/rnaturalearth/index.html
It uses this Additional_repositories field for suggested package {rnaturalearthhires}.
Currently, @PMassicotte struggles with GitHub Actions as {pak} does not account for this field. See Actions on the GitHub repository: https://github.com/ropensci/rnaturalearth/
So I guess, this could be worth supporting.
I would personally be interested in this feature too, as the use of "r-universe" may change the way I choose my dependencies for CRAN packages.
I created a reprex there, that you can fork for your tests if you want: https://github.com/statnmap/test.add.repos
This package is accepted with pre-CRAN checks using devtools::check_win_devel()
However, even if I add the repo in Rprofile during the GitHub Action, {pak} does not seem to use it:
- name: Additional Repositories
  run: |
    cat(
      "\noptions(repos = c(thinkropen = 'https://thinkr-open.r-universe.dev', getOption('repos')))",
      file = ".Rprofile", append = TRUE)
    source(".Rprofile")
  shell: Rscript {0}
Currently, @PMassicotte struggles with GitHub Actions as {pak} does not account for this field.
Have you tried adding the extra repo to options("repos")?
No I have not. I am not sure where I should do it. In the yaml file?
It is the extra-repositories parameter for setup-r.
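For reference, a sketch of how that might look in a workflow file, assuming the r-lib/actions/setup-r@v2 action and the thinkr-open repository URL from the snippet above (the rest of the workflow is omitted, and use-public-rspm is only there as a commonly used setting):

```yaml
# Fragment of .github/workflows/R-CMD-check.yaml (illustrative)
- uses: r-lib/actions/setup-r@v2
  with:
    use-public-rspm: true
    # Extra CRAN-like repository, added to options("repos") by the action
    extra-repositories: 'https://thinkr-open.r-universe.dev'
```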
Oh yes, it works! Thanks.

Link to the yaml file: https://github.com/statnmap/test.add.repos/blob/main/.github/workflows/R-CMD-check.yaml
Works perfectly. Thank you, @statnmap and @gaborcsardi. :+1:
Although adding extra-repositories to one's workflow is an easy solution, as r-universe is becoming more popular, I think we are going to see Additional_repositories: in DESCRIPTION files much more often. So detecting Additional_repositories: line in the DESCRIPTION is going to be more important.
It takes about the same amount of time to add extra-repositories as it does to add Additional_repositories, and supporting the latter complicates dependency lookup considerably, so I am not very motivated to add it.
Additional_repositories is picked up by CRAN, see, e.g., https://cran.r-project.org/web/packages/DBItest/index.html where I use dblog which isn't on CRAN yet. Are you suggesting two redundant entries?
Additional_repositories is picked up by CRAN, see, e.g., cran.r-project.org/web/packages/DBItest/index.html where I use dblog which isn't on CRAN yet.
Not really, CRAN does not install anything from Additional_repositories, see the NOTE at https://cran.r-project.org/web/checks/check_results_DBItest.html:
Version: 1.7.3
Check: package dependencies
Result: NOTE
Package suggested but not available for checking: ‘dblog’
Neither does install.packages() in general.
Are you suggesting two redundant entries?
We already have other entries, e.g. Remotes and the Config/Needs/* entries. The problem with Additional_repositories is that it is not part of the CRAN metadata, so even if we support it, if you call something like:
pak::pkg_install("DBItest", dependencies = TRUE)
pak will not see the Additional_repositories field, and will not install dblog. Just like remotes, even though it "supports" Additional_repositories.
❯ remotes::install_cran("DBItest", dependencies = TRUE)
Installing 1 packages: DBItest
Installing package into ‘/Users/gaborcsardi/Library/R/arm64/4.3/library’
(as ‘lib’ is unspecified)
Warning: dependency ‘dblog’ is not available
Warning: dependency ‘dblog’ is not available
also installing the dependency ‘palmerpenguins’
The other issue is that it introduces ambiguity wrt. where packages should be installed from, in case packages are available in multiple repositories. And they will be typically available in multiple repositories, because all packages of a GitHub user/organization are in the same R-universe repo. I guess some users would prefer that repo over CRAN, others might not.
So I would prefer syntax that declares that a certain package should be installed from a certain repository (like we already have in Remotes, etc.), instead of adding a new repository with packages that you might not want to install. And/or syntax in pak to declare repository preferences in general.
My understanding of Remotes: is that this will install the package from source. r-universe is a CRAN-like repository in that it has binaries. It is not clear how one would specify a repository with binaries besides CRAN and Bioconductor. But to be honest, I have not tried installing a package with Remotes pointing to a package on r-universe. Perhaps it works just fine. I'll do a test case with a package on r-universe. I can't test this on CRAN since CRAN doesn't allow Remotes.
https://pak.r-lib.org/reference/pak_package_sources.html
FWIW, devtools::install() appears to look at Additional_repositories to find and install dependencies.
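Until something like this is supported, one possible workaround for a local package tree is to read the field yourself and add it to options("repos") before installing. This is only a sketch: the DESCRIPTION contents and the repository URL below are made up for illustration, and this is not how pak or devtools implement it.

```r
# Write a small stand-in DESCRIPTION so the sketch is self-contained.
# The package name and repository URL are hypothetical.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: example",
  "Version: 0.0.1",
  "Suggests: somepkg",
  "Additional_repositories: https://example.r-universe.dev"
), desc_file)

# read.dcf() returns a character matrix with one row per record;
# a missing field comes back as NA.
field <- read.dcf(desc_file, fields = "Additional_repositories")[1, 1]

# The field is a comma-separated list of repository URLs.
extra_repos <- if (is.na(field)) {
  character()
} else {
  trimws(strsplit(field, ",")[[1]])
}

# Prepend the extra repositories to the ones already configured.
options(repos = c(extra_repos, getOption("repos")))
```

As noted above, this only helps when the DESCRIPTION file is at hand; it does nothing for packages installed from CRAN, because the field is not part of the repository metadata.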