
Feature request: Add support for `Additional_repositories` field in DESCRIPTION file

Open billdenney opened this issue 3 years ago • 27 comments

To make pak a more complete "alternative to install.packages()", I think that it would be useful if pak used the Additional_repositories field in the DESCRIPTION file: https://cran.r-project.org/doc/manuals/R-exts.html#Package-Dependencies

Looking in a bit more detail, I found that the DESCRIPTION file lists the needed CRAN-like repository in the Additional_repositories field (https://github.com/nlmixr2/babelmixr2/blob/285f880fcb8902b3320d17878d584590cf41fe6c/DESCRIPTION#L42). Maybe there is a second feature request here for pak to look there, too. (I'm happy to open that as a separate issue, if that's helpful.)

Originally posted by @billdenney in https://github.com/r-lib/pak/issues/421#issuecomment-1264694009

billdenney, Oct 03 '22

@gaborcsardi: I remember seeing an alternative approach for specifying this so that pak can use CRAN-like repos. Does this ring a bell?

Needed for https://github.com/tidyverse/dplyr/pull/6526.

krlmlr, Nov 24 '22

IDK what you mean. On GHA you can set options("repos") in the setup-r action.

gaborcsardi, Nov 24 '22

Can I indicate in DESCRIPTION that pak should get a specific dependency from a cranlike repo (like r-universe)?

krlmlr, Nov 24 '22

No, not currently, apart from using a url:: entry in Remotes.

gaborcsardi, Nov 24 '22

That would work for my use case. Will that pick up binary packages from r-universe?

krlmlr, Nov 24 '22

AFAIK you can link to binary packages with url::, if that's your question.
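Assuming the standard CRAN-like layout, where source tarballs live under <repo>/src/contrib/ and are named <package>_<version>.tar.gz, a url:: entry for a package in an r-universe or other CRAN-like repository can be assembled from the repository URL. A minimal sketch: url_remote() is a hypothetical helper, the package name and version are placeholders, and it only covers source tarballs (binaries live under platform-specific bin/ paths, which utils::contrib.url() also maps through its type argument).

```r
# Hypothetical helper: build a url:: remote for a source tarball in a
# CRAN-like repository, assuming the layout <repo>/src/contrib/<pkg>_<version>.tar.gz.
url_remote <- function(repo, pkg, version) {
  # utils::contrib.url() maps a repository URL to its source contrib directory.
  contrib <- utils::contrib.url(repo, type = "source")
  paste0("url::", contrib, "/", pkg, "_", version, ".tar.gz")
}

# Placeholder package name and version, for illustration only.
url_remote("https://thinkr-open.r-universe.dev", "somepkg", "0.1.0")
```

The resulting string would go in the Remotes field of DESCRIPTION, one comma-separated entry per package; whether pak then resolves platform-specific binaries this way is not something this sketch settles.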

gaborcsardi, Nov 24 '22

I want to install bleeding edge duckdb so that the tests run on CI/CD. Can I use url:: three times for three different binary packages? Or will I have to resort to source packages after all?

krlmlr, Nov 24 '22

> I want to install bleeding edge duckdb so that the tests run on CI/CD. Can I use url:: three times for three different binary packages?

Maybe; I have never tried that.

> Or will I have to resort to source packages after all?

Why not add the duckdb repo on the CI instead? Repository configuration does not belong in the package metadata.

gaborcsardi, Nov 24 '22

I tried Remotes: duckdb/duckdb/tools/rpkg to no avail:

remotes::install_github("duckdb/duckdb/tools/rpkg")
#> Downloading GitHub repo duckdb/duckdb@HEAD
#> 
#> * checking for file ‘/private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpS8daew/remotes1513e111fe4e9/duckdb-duckdb-dfae126/tools/rpkg/DESCRIPTION’ ... OK
#> * preparing ‘duckdb’:
#> * checking DESCRIPTION meta-information ... OK
#> * cleaning src
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘duckdb_0.6.0.tar.gz’
#> Installing package into '/Users/kirill/Library/R/arm64/4.1/library'
#> (as 'lib' is unspecified)
#> Warning in i.p(...): installation of package '/var/folders/dj/
#> yhk9rkx97wn_ykqtnmk18xvc0000gn/T//RtmpS8daew/file1513e133d9fae/
#> duckdb_0.6.0.tar.gz' had non-zero exit status

Created on 2022-11-24 with reprex v2.0.2

On the console I see:

Downloading GitHub repo duckdb/duckdb@HEAD
✔  checking for file ‘/private/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T/RtmpfnPyxd/remotes14ad52cd2c745/duckdb-duckdb-dfae126/tools/rpkg/DESCRIPTION’ ...
─  preparing ‘duckdb’:
✔  checking DESCRIPTION meta-information ...
─  cleaning src
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘duckdb_0.6.0.tar.gz’
   
Installing package into ‘/Users/kirill/Library/R/arm64/4.1/library’
(as ‘lib’ is unspecified)
Sourcing ../.Rprofile.local
Dev mode: ON
* installing *source* package ‘duckdb’ ...
** using staged installation
** libs
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -O0 -g -c altrep.cpp -o altrep.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -O0 -g -c connection.cpp -o connection.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -O0 -g -c cpp11.cpp -o cpp11.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -O0 -g -c database.cpp -o database.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -O0 -g -c register.cpp -o register.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -O0 -g -c relational.cpp -o relational.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -O0 -g -c reltoaltrep.cpp -o reltoaltrep.o
ccache clang++ -std=gnu++11 -std=gnu++11 -I"/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/include" -DNDEBUG   -I/opt/R/arm64/include   -fPIC  -O0 -g -c scan.cpp -o scan.o
cpp11.cpp:4:10: fatal error: 'duckdb_types.hpp' file not found
#include "duckdb_types.hpp"
         ^~~~~~~~~~~~~~~~~~
altrep.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
         ^~~~~~~~~~
1 error generated.
1 error generated.
make: *** [altrep.o] Error 1
make: *** Waiting for unfinished jobs....
reltoaltrep.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
         ^~~~~~~~~~
1 error generated.
make: *** [cpp11.o] Error 1
make: *** [reltoaltrep.o] Error 1
connection.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
         ^~~~~~~~~~
1 error generated.
database.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
         ^~~~~~~~~~
1 error generated.
register.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
         ^~~~~~~~~~
make: *** [connection.o] Error 1
1 error generated.
make: *** [database.o] Error 1
make: *** [register.o] Error 1
scan.cpp:1:10: fatal error: 'rapi.hpp' file not found
#include "rapi.hpp"
         ^~~~~~~~~~
1 error generated.
make: *** [scan.o] Error 1
relational.cpp:1:10: fatal error: 'cpp11.hpp' file not found
#include "cpp11.hpp"
         ^~~~~~~~~~~
1 error generated.
make: *** [relational.o] Error 1
ERROR: compilation failed for package ‘duckdb’
* removing ‘/Users/kirill/Library/R/arm64/4.1/library/duckdb’
* restoring previous ‘/Users/kirill/Library/R/arm64/4.1/library/duckdb’
Warning message:
In i.p(...) :
  installation of package ‘/var/folders/dj/yhk9rkx97wn_ykqtnmk18xvc0000gn/T//RtmpfnPyxd/file14ad542189a00/duckdb_0.6.0.tar.gz’ had non-zero exit status

krlmlr, Nov 24 '22

> I tried Remotes: duckdb/duckdb/tools/rpkg to no avail:

Hmmm, why is that a pak issue?

gaborcsardi, Nov 24 '22

Because pak is used at the front end in GitHub Actions. I was looking for existing functionality to achieve the desired behavior: installing dev duckdb as a dplyr dependency. I tried Additional_repositories, thought I was missing another magic option, and came here. It could also be solved with a GitHub remote, but that seems to require a fix to {remotes}: r-lib/remotes#576.

krlmlr, Nov 24 '22

FWIW, the linked issue contains a bad idea. But installing from a subdirectory should work even if the installation process accesses files from outside the R package.

krlmlr, Nov 24 '22

> Because pak is used at the front end in GitHub Actions.

But you did not use pak at all in your test. remotes::install_github() does not use pak, and GHA uses pak, not remotes.

Also, the installation fails for a completely different reason, so that's not even a remotes issue. By the way, it also fails if you clone the repo and call R CMD build and R CMD INSTALL.

> I tried Additional_repositories

That is for CRAN-like repositories, and only for optional dependencies. If you indeed have a CRAN-like repo for dev duckdb, then specify that in options("repos").
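Adding a CRAN-like repository in R amounts to adding a named entry to options("repos") before installing. A minimal sketch: add_repo() is a hypothetical helper, and the r-universe URL is the one used later in this thread, standing in for whatever repository actually serves the dev package.

```r
# Hypothetical helper: put a named CRAN-like repository first in a repos
# vector, dropping any existing entry with the same name.
add_repo <- function(name, url, repos = getOption("repos")) {
  if (!is.null(names(repos))) {
    repos <- repos[names(repos) != name]
  }
  c(stats::setNames(url, name), repos)
}

# Apply it for the current session, e.g. in a CI step or an .Rprofile.
options(repos = add_repo("thinkropen", "https://thinkr-open.r-universe.dev"))
getOption("repos")
```

Putting the extra repository first is a choice made here for illustration; with several repositories offering the same package, which one wins is exactly the ambiguity discussed further down the thread.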

> It could also be solved with a GitHub remote, but that seems to require a fix to {remotes}: r-lib/remotes#576.

In that issue you tried to install igraph/igraph, which does not contain an R package or a DESCRIPTION file anywhere.

Nevertheless (the dev version of) pak (or any version of remotes) can happily install from a subdirectory, as long as you specify the subdirectory. They assume that you can run R CMD build on the subdirectory and then R CMD INSTALL on the source R package.

This clearly does not hold for duckdb, so there is not much we can do. You'll need to clone the repo, and then manually create an R package using whatever procedure duckdb uses to do that.
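pak and remotes accept GitHub references of the form <username>/<repository>[/<subdir>][@<ref>]; in duckdb/duckdb/tools/rpkg the R package lives in the tools/rpkg subdirectory. The sketch below is a simplified, hypothetical parser that only illustrates how such a reference splits into its parts. It is not pak's own parser and ignores other forms pak accepts (for example the github:: prefix and pull-request references); it defaults to HEAD when no ref is given, as in the remotes output above.

```r
# Hypothetical, simplified parser for "<username>/<repository>[/<subdir>][@<ref>]".
parse_gh_ref <- function(ref) {
  at <- as.integer(regexpr("@", ref, fixed = TRUE))
  commitish <- if (at > 0) substring(ref, at + 1) else "HEAD"
  path <- if (at > 0) substring(ref, 1, at - 1) else ref
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  if (length(parts) < 2) stop("expected <username>/<repository>: ", ref)
  list(
    username = parts[1],
    repo = parts[2],
    # Everything after the repository name is the package subdirectory.
    subdir = if (length(parts) > 2) paste(parts[-(1:2)], collapse = "/") else "",
    ref = commitish
  )
}

str(parse_gh_ref("krlmlr/duckdb/tools/rpkg@f-cleanup"))
```

After resolving such a reference, the installer downloads that ref, runs R CMD build on the subdirectory and R CMD INSTALL on the tarball, which is the assumption that duckdb's tools/rpkg does not satisfy.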

gaborcsardi, Nov 24 '22

Doesn't pak use {remotes} internally?

I missed that ./configure isn't run at all during R CMD build. I prepared a version that installs with R CMD build . and R CMD INSTALL *.tar.gz, but fails with remotes:

remotes::install_github("krlmlr/duckdb/tools/rpkg@f-cleanup")

The ./cleanup script copies stuff, but requires that the parent directories are present during that process.

Lastly, is there a reason why pak shouldn't be using Additional_repositories?

I'll work around it.

krlmlr, Nov 24 '22

> Doesn't pak use {remotes} internally?

It does not.

> Lastly, is there a reason why pak shouldn't be using Additional_repositories?

No reason; it will probably be supported eventually. It is not entirely clear how it should be supported, as it can only be used for suggested packages on CRAN, and it is not part of the CRAN metadata, so we don't know whether CRAN(-like) packages have Additional_repositories fields.

gaborcsardi, Nov 24 '22

{rnaturalearth} is on CRAN: https://cran.r-project.org/web/packages/rnaturalearth/index.html
It uses the Additional_repositories field for its suggested package {rnaturalearthhires}.
Currently, @PMassicotte struggles with GitHub Actions as {pak} does not account for this field. See the Actions on the GitHub repository: https://github.com/ropensci/rnaturalearth/ So I guess this could be worth supporting.

I would personally be interested in this feature too, as the use of "r-universe" may change the way I choose my dependencies for CRAN packages.

I created a reprex that you can fork for your tests if you want: https://github.com/statnmap/test.add.repos
This package passes the pre-CRAN checks run by devtools::check_win_devel().
However, even if I add the repo to .Rprofile during the GitHub Action, {pak} does not seem to use it:

      - name: Additional Repositories
        run: |
          cat(
          "\noptions(repos = c(thinkropen = 'https://thinkr-open.r-universe.dev', getOption('repos')))",
          file = ".Rprofile", append = TRUE)
          source(".Rprofile")
        shell: Rscript {0}

statnmap, Jan 21 '23

> Currently, @PMassicotte struggles with GitHub Actions as {pak} does not account for this field.

Have you tried adding the extra repo to options("repos")?

gaborcsardi, Jan 21 '23

No, I have not. I am not sure where I should do it. In the YAML file?

PMassicotte, Jan 21 '23

It is the extra-repositories parameter for setup-r.

gaborcsardi, Jan 21 '23

Oh yes, it works! Thanks.


Link to the yaml file: https://github.com/statnmap/test.add.repos/blob/main/.github/workflows/R-CMD-check.yaml

statnmap, Jan 21 '23

Works perfectly. Thank you, @statnmap and @gaborcsardi. :+1:

PMassicotte, Jan 21 '23

Although adding extra-repositories to one's workflow is an easy solution, r-universe is becoming more popular, and I think we are going to see Additional_repositories: in DESCRIPTION files much more often. So detecting the Additional_repositories: field in DESCRIPTION is going to become more important.
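Detecting the field is simple once the package's own DESCRIPTION file is at hand, which is not the case for packages resolved only through CRAN metadata. Per Writing R Extensions, Additional_repositories holds a comma-separated list of repository URLs. A minimal sketch using base R's read.dcf(); additional_repos() is a hypothetical helper and the DESCRIPTION content is a made-up example.

```r
# Hypothetical helper: return the URLs listed in a DESCRIPTION file's
# Additional_repositories field, or character(0) if the field is absent.
additional_repos <- function(desc_path) {
  field <- read.dcf(desc_path, fields = "Additional_repositories")[1, 1]
  if (is.na(field)) return(character())
  # Split the comma-separated list; trimws() also drops continuation newlines.
  urls <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  urls[nzchar(urls)]
}

# Made-up DESCRIPTION for illustration.
desc <- tempfile()
writeLines(c(
  "Package: example",
  "Version: 0.1.0",
  "Suggests: somepkg",
  "Additional_repositories: https://thinkr-open.r-universe.dev"
), desc)
additional_repos(desc)
```

The returned URLs could then be added to options("repos") before installing dependencies, which is essentially the manual extra-repositories workaround discussed above.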

eeholmes, May 18 '23

Adding extra-repositories takes about as much time as adding Additional_repositories, and supporting the latter complicates dependency lookup considerably, so I am not very motivated to add it.

gaborcsardi, May 18 '23

Additional_repositories is picked up by CRAN, see, e.g., https://cran.r-project.org/web/packages/DBItest/index.html where I use dblog which isn't on CRAN yet. Are you suggesting two redundant entries?

krlmlr, May 18 '23

> Additional_repositories is picked up by CRAN, see, e.g., cran.r-project.org/web/packages/DBItest/index.html where I use dblog which isn't on CRAN yet.

Not really: CRAN does not install anything from Additional_repositories; see the NOTE at https://cran.r-project.org/web/checks/check_results_DBItest.html:

Version: 1.7.3
Check: package dependencies
Result: NOTE
    Package suggested but not available for checking: ‘dblog’

Neither does install.packages() in general.

> Are you suggesting two redundant entries?

We already have other entries, e.g. Remotes and the Config/Needs/* entries. The problem with Additional_repositories is that it is not part of the CRAN metadata, so even if we support it, if you call something like:

pak::pkg_install("DBItest", dependencies = TRUE)

pak will not see the Additional_repositories field, and will not install dblog. Neither will remotes, even though it "supports" Additional_repositories:

❯ remotes::install_cran("DBItest", dependencies = TRUE)
Installing 1 packages: DBItest
Installing package into ‘/Users/gaborcsardi/Library/R/arm64/4.3/library’
(as ‘lib’ is unspecified)
Warning: dependency ‘dblog’ is not available
Warning: dependency ‘dblog’ is not available
also installing the dependency ‘palmerpenguins’

The other issue is that it introduces ambiguity about where packages should be installed from when they are available in multiple repositories. And they will typically be available in multiple repositories, because all packages of a GitHub user/organization are in the same R-universe repo. I guess some users would prefer that repo over CRAN, others might not.

So I would prefer syntax that declares that a certain package should be installed from a certain repository (like we already have in Remotes, etc.), instead of adding a new repository with packages that you might not want to install. And/or syntax in pak to declare repository preferences in general.
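One way to picture the per-package and general repository preferences described here: use a pinned repository if one is declared for the package, otherwise the first repository, in preference order, that provides it. The sketch below is purely hypothetical; pick_repo() is not a pak API, and the repository names and package lists are made up for illustration.

```r
# Hypothetical resolver: `available` maps repository names to the packages
# they provide; `pins` optionally maps a package to a required repository.
pick_repo <- function(pkg, available, preference = names(available),
                      pins = character()) {
  # A per-package pin overrides the general preference order.
  if (pkg %in% names(pins)) return(pins[[pkg]])
  for (repo in preference) {
    if (pkg %in% available[[repo]]) return(repo)
  }
  NA_character_
}

# Made-up repository contents.
available <- list(
  CRAN = c("DBItest", "palmerpenguins"),
  universe = c("DBItest", "dblog")
)
pick_repo("DBItest", available)
pick_repo("DBItest", available, preference = c("universe", "CRAN"))
pick_repo("dblog", available)
```

Letting a pin always beat the general order is one possible design; where such pins and preferences would be declared (DESCRIPTION, pak options, or CI configuration) is the open question in this comment.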

gaborcsardi, May 18 '23

My understanding of Remotes: is that it will install the package from source. r-universe is a CRAN-like repository that also provides binaries, and it is not clear how one would specify a repository with binaries besides CRAN and Bioconductor. But to be honest, I have not tried installing a package with Remotes pointing to a package on r-universe. Perhaps it works just fine. I'll set up a test case with a package on r-universe. I can't test this on CRAN since CRAN doesn't allow Remotes.

https://pak.r-lib.org/reference/pak_package_sources.html

eeholmes, May 19 '23

FWIW, devtools::install() appears to look at Additional_repositories to find and install dependencies.

florisvdh, Jun 21 '23