remotes
Adding `git_remote` fallback for `gitlab_remote` use without full API access (Resolves #604)
As described in #604, the current gitlab_remote makes use of API endpoints that are not available to tokens generated for use within gitlab CI (stored in the $CI_JOB_TOKEN env var), throwing errors when these tokens are used.
This PR adds code to first ping the API at a generic endpoint (querying for /version). If that request fails and isTRUE(getOption("remotes.gitlab_git_fallback", TRUE)), a git_remote is returned.
If git2r is available, a credentials object is created from the auth_token. Otherwise, the token is embedded in the url in the form of http://gitlab-ci-token:<token>@<host>/namespace/project.git.
This allows install_gitlab to be used within CI jobs on non-public deployments of GitLab without the creation and embedding of personal tokens. Pipeline engineers need only to run export GITLAB_PAT=$CI_JOB_TOKEN prior to installing remotes.
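The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `embed_token_in_url()` and `choose_remote_type()` are hypothetical helper names, and the result of the `/version` probe is passed in as `api_ok` rather than being requested here.

```r
# Hypothetical helper (not part of remotes): build a git URL that carries the
# token as the password component, using the username GitLab CI expects.
embed_token_in_url <- function(host, repo, token, username = "gitlab-ci-token") {
  # Keep the scheme given in `host`; assume https when none is supplied.
  scheme <- if (grepl("^https?://", host)) sub("^(https?)://.*$", "\\1", host) else "https"
  bare_host <- sub("/+$", "", sub("^https?://", "", host))
  # Percent-encode the token so reserved characters cannot break the URL.
  token <- utils::URLencode(token, reserved = TRUE)
  sprintf("%s://%s:%s@%s/%s.git", scheme, username, token, bare_host, repo)
}

# Hypothetical decision step: `api_ok` stands in for the outcome of the
# /version probe. The option name is the one introduced by this PR.
choose_remote_type <- function(api_ok,
                               git_fallback = getOption("remotes.gitlab_git_fallback", TRUE)) {
  if (isTRUE(api_ok)) return("gitlab_remote")
  if (isTRUE(git_fallback)) return("git_remote")
  stop("GitLab API is not accessible with this token and git fallback is disabled")
}

url <- embed_token_in_url("https://myprivategitlab.com", "user/repos", "abc123")
remote_type <- choose_remote_type(api_ok = FALSE, git_fallback = TRUE)
```

Taking the probe result as an argument keeps the sketch runnable without network access; a real implementation would derive `api_ok` from the API request.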
Changelog
- `install_gitlab` will defer to using `install_git` when authentication doesn't provide adequate API access to download a source archive
  - will create a `git2r::cred_user_pass` if `git2r` is available
    - Username is set to `gitlab-ci-token`. When providing a PAT, GitLab ignores the username unless one is using a `CI_JOB_TOKEN` token within a CI job, in which case it must be `gitlab-ci-token`. Because of this, passing `gitlab-ci-token` covers both scenarios. Unfortunately I wasn't able to find any documentation to reference for this behavior; it was only narrowed down through testing.
  - otherwise, will embed authentication in the git url (`http://<username>:<password>@host.com/repo.git`)
- Because urls may contain access tokens, I wrote some handlers to strip these from messages and from the "Remote*" fields in `DESCRIPTION` (Design Feedback Request: is it preferred to keep the full url for updates, or to exclude the password so that it isn't leaked through things like `renv`?)
  - `git()` updated to take an optional `display_args` argument to provide output using a censored git url, so as not to display access tokens in console output. This is used in `remote_download.xgit_remote` to display git commands without printing passwords to console.
  - `parse_git_url()` updated to also extract a username and password, though it might be worth taking on a dependency to handle url parsing since this regex is getting pretty involved
  - `git_anon_url()` introduced to strip out username and password components from a url
  - `git_censored_url()` introduced to replace the password component with asterisks
- Added a bunch of tests for url parsing, anonymization and censoring
- Added some tests for falling back to a `git_remote` when the GitLab host API requests fail
- Added `git_fallback = getOption("remotes.gitlab_git_fallback", TRUE)` parameter to `install_gitlab`
- Updated NEWS to describe new behavior
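To illustrate the difference between anonymizing and censoring a url, here is a minimal base-R sketch. The function names `anon_url_sketch()` and `censored_url_sketch()` are hypothetical and the regular expressions are assumptions for illustration; they are not the implementation of `git_anon_url()` or `git_censored_url()` in this PR, and they do not handle tokens containing an unescaped `@` or `/`.

```r
# Hypothetical sketch: drop the "user:password@" part of a url entirely.
anon_url_sketch <- function(url) {
  # \\1 = scheme, \\4 = everything after the optional credentials
  pattern <- "^([a-zA-Z][a-zA-Z0-9+.-]*://)(?:([^:@/]+)(?::([^@/]*))?@)?(.*)$"
  sub(pattern, "\\1\\4", url, perl = TRUE)
}

# Hypothetical sketch: keep the username but mask the password component.
censored_url_sketch <- function(url) {
  pattern <- "^([a-zA-Z][a-zA-Z0-9+.-]*://[^:@/]+:)[^@/]*@"
  sub(pattern, "\\1****@", url, perl = TRUE)
}

with_creds <- "https://gitlab-ci-token:s3cr3t@host.com/namespace/project.git"
anon <- anon_url_sketch(with_creds)
censored <- censored_url_sketch(with_creds)
```

Under these assumptions, the anonymized form is what would be safe to store in `DESCRIPTION`, while the censored form is what would be shown when echoing git commands to the console.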
This is an initial pass just to experiment with implementation. Please let me know if this looks like a reasonable approach, and then I can polish this PR with
- [x] Documenting new behaviors
- [x] Add tests
- [x] Get tests working in CI
My main worry with making the GitLab remote more complex is our team doesn't use GitLab, so it is possible this will break in the future without us realizing it.
We would definitely need some tests to avoid this.
Thanks @jimhester - I'm happy to add in tests as much as possible. If you think the implementation looks sound, then I can get to work on tests and updating docs. I was just hesitant to invest more time fleshing out the peripheral bits until getting some impressions on the approach.
@jimhester - this PR is ready whenever you have an opportunity to take a look. The only CI errors are ones that also exist on master. Overall, the design feels a bit clunky, but I'm struggling to come up with anything better.
To trace through the changes, it is easiest to start with install_gitlab's call out to gitlab_to_git_remote, and then look at the uses of $url in install_git.R as the url may contain a url-embedded username and token.
Just to highlight a critical design choice:
Design Feedback Request: is it preferred to keep the full url including username and password (`https://dgkf:<token>@<host>/...`) when storing `remote_metadata` or printing to console?
- the good: this would allow `remotes::update_packages` to update a package which requires authentication
- the bad: this might put a user at risk of leaking an access token through things like execution logs if printed to console or an renv lockfile if included in a DESCRIPTION file.
For now, I chose to scrub the username and password from the url before this is added to a DESCRIPTION file to prioritize safety of access tokens over the update experience.
The self-hosted GitLab issues are currently a big pain point at my org, so some help in moving this forward would be greatly appreciated.
Thank you for this PR. This is a must when working on private GitLab instances. I approve its improvements.
I tried it in a CI instance with what I believe are the typical use cases, listed below. The PR solves the problems encountered with the current version of {remotes}, so it can be accepted as is.
Use CI_JOB_TOKEN set up with {git2r}
Clone and install_local()
- current {remotes} OK
- PR OK
tempclone <- tempfile(pattern = "conjdown")
dir.create(tempclone)
git2r::clone(url = "https://git.lab.sspcloud.fr/propre-conj/conjdown",
local_path = tempclone,
credentials = git2r::cred_user_pass(username = "gitlab-ci-token",
password = Sys.getenv("CI_JOB_TOKEN"))
)
remotes::install_local(tempclone)
install_git()
- current {remotes} FAIL
- PR OK
options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_git("https://myprivategitlab.com/user/repos")
install_gitlab()
- current {remotes} FAIL
- PR OK with message
remotes::install_gitlab(host = "https://myprivategitlab.com",
repo = "user/repos",
auth_token = Sys.getenv("CI_JOB_TOKEN"))
message:
auth_token does not have scopes 'read-repository' and 'api' for host
'https://myprivategitlab.com" required to install using
gitlab_remote.
Attempting git_remote
install from another package DESCRIPTION file with git2r creds and git::
- current {remotes} FAIL
- PR OK
DESCRIPTION file
Imports:
repos
Remotes:
git::https://myprivategitlab.com/user/repos
options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_deps(dependencies = TRUE)
Set GITLAB_PAT
install_gitlab()
- current {remotes} FAIL
- PR OK with message
Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
remotes::install_gitlab(host = "https://myprivategitlab.com",
repo = "user/repos")
message:
Using GitLab PAT from envvar GITLAB_PAT
auth_token does not have scopes 'read-repository' and 'api' for host
'https://myprivategitlab.com" required to install using
gitlab_remote.
Attempting git_remote
install from another package DESCRIPTION file with GITLAB_PAT and gitlab::
This was just an experiment. I know it is not the aim of this PR, but it could be a future enhancement.
- current {remotes} FAIL
- PR FAIL.
DESCRIPTION file
Imports:
repos
Remotes:
gitlab::https://myprivategitlab.com/user/repos
Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
remotes::install_deps(dependencies = TRUE)
Error
Error: Unknown remote type: gitlab
Invalid git repo specification: 'https://myprivategitlab.com/user/repos'
Execution halted
Do you think it would be a good idea to allow gitlab_pat() to also look for the CI_JOB_TOKEN environment variable if GITLAB_PAT is empty? This may solve a lot of pain when using CI.
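The suggestion above could look something like the following sketch. `gitlab_pat_with_ci_fallback()` is a hypothetical name used only for illustration; it is not how `gitlab_pat()` in {remotes} currently behaves, and returning `NULL` when neither variable is set is an assumption.

```r
# Hypothetical sketch: prefer GITLAB_PAT, then fall back to CI_JOB_TOKEN.
gitlab_pat_with_ci_fallback <- function() {
  pat <- Sys.getenv("GITLAB_PAT", unset = "")
  if (nzchar(pat)) return(pat)

  # Inside a GitLab CI job, CI_JOB_TOKEN is set by the runner.
  ci_token <- Sys.getenv("CI_JOB_TOKEN", unset = "")
  if (nzchar(ci_token)) return(ci_token)

  # Assumption: signal "no token available" with NULL.
  NULL
}
```

With this in place, a CI job would no longer need `Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))` before calling `install_gitlab()`, although the username requirement for job tokens described earlier in the thread would still apply.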
Thanks for considering this PR, @jimhester.
Just wanted to highlight this bit in the PR thread for your consideration. I tried my best to dig into how remotes/renv use the Remotes* fields in the DESCRIPTION file, but wasn't totally sure what the preferred solution would be for access tokens in urls, and wanted to make sure it was brought to your attention in case there are any security concerns with how it's handled.
Design Feedback Request: is it preferred to keep the full url including username and password (`https://dgkf:<token>@<host>/...`) when storing `remote_metadata` or printing to console?
- the good: this would allow `remotes::update_packages` to update a package which requires authentication
- the bad: this might put a user at risk of leaking an access token through things like execution logs if printed to console or an renv lockfile if included in a DESCRIPTION file
Currently user-specific url components are stripped to minimize any printing/saving of tokens.
I guess we should strip them, though it would then break updating packages later.
However if you still set the GITLAB_PAT when you run update_packages() would the update work?