doAzureParallel

Git PAT token not used when installing packages?

Open p-smirnov opened this issue 5 years ago • 11 comments

I am experiencing the known issue with autoscale and github package installation, where the error message is:

Error: HTTP error 403.
  API rate limit exceeded for 52.*******. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)

  Rate limit remaining: 0/60
  Rate limit reset at: 2019-06-24 22:49:36 UTC

  To increase your GitHub API rate limit
  - Use `usethis::browse_github_pat()` to create a Personal Access Token.
  - Use `usethis::edit_r_environ()` and add the token as `GITHUB_PAT`.
Execution halted

However, I have set githubAuthenticationToken in the credentials.json file. Is the environment variable not yet set when the GitHub install runs for the packages specified in the cluster.json file?
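One way to narrow this down is to check whether the token is visible to the R process that performs the install. The following is a minimal sketch, assuming the token is expected under the name `GITHUB_PAT` (the name shown in the error message above); `has_github_pat` is a hypothetical helper, not part of doAzureParallel.

```r
# Hypothetical helper: TRUE when a non-empty value for `var` is visible
# to the current R process, FALSE when it is unset or empty.
has_github_pat <- function(var = "GITHUB_PAT") {
  nzchar(Sys.getenv(var, unset = ""))
}

# Report what this R session sees.
has_github_pat()
```

Running the same call inside a task on the cluster, rather than locally, would show what the worker's R process sees.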

Possibly relevant: I am using a custom docker image (but I want to install the packages from git as I am iterating on package implementation).

I am not sure how to make a reproducible example, but it occurs when scaling up from 1 to ~400 nodes. Here is my cluster.json in case it helps to reproduce:

{
  "name": "psmirnov",
  "vmSize": "Standard_D2_v3",
  "maxTasksPerNode": 4,
  "poolSize": {
    "dedicatedNodes": {
      "min": 1,
      "max": 1
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 5000
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "bhklab/pharmacogx:v3",
  "rPackages": {
    "cran": ["MASS", "tictoc", "mvtnorm", "abind", "polynom", "memoise", "purrr", "matrixStats"],
    "github": ["bhklab/mCI", "bhklab/fastCI"],
    "bioconductor": []
  },
  "commandLine": [],
  "subnetId": ""
}

p-smirnov avatar Jun 24 '19 21:06 p-smirnov

I experience a similar behavior when not using a docker image. It appears that the github 'Personal Access Token' (PAT) is completely ignored even though it is set up correctly in the credentials file. Therefore I am not able to scale the project up without running into the 'API rate limit exceeded' issue described by p-smirnov above. I confirmed my suspicion that the PAT entry in the credentials file is ignored by setting my github repository to 'private', after which the repo can no longer be installed on the Azure nodes even though the personal access token should allow precisely this. Any help on this issue is appreciated...

minister3000 avatar Jul 28 '19 17:07 minister3000

@p-smirnov @minister3000 I'm taking a look at this

brnleehng avatar Jul 29 '19 18:07 brnleehng

When we migrated to docker containers, it looks like the PAT environment variable is not being passed to the container. Since we use the R in the container image, the container requires the environment variable to exist.

https://github.com/Azure/doAzureParallel/blob/master/R/utility-commands.R#L100-L138
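To illustrate the general idea of forwarding a host environment variable into a container, here is a minimal sketch. It is not the doAzureParallel implementation: `build_docker_run` and its arguments are hypothetical, and the only assumption about Docker is the standard `-e NAME` flag, which copies `NAME` from the host environment into the container when no value is given.

```r
# Hypothetical sketch: build a `docker run` command string that forwards
# selected host environment variables into the container.
build_docker_run <- function(image, command, env_vars = c("GITHUB_PAT")) {
  # `-e NAME` without a value asks docker to take NAME from the host environment
  env_flags <- paste(sprintf("-e %s", env_vars), collapse = " ")
  paste("docker run --rm", env_flags, image, command)
}

# Example: print the command that would run R inside the container
cat(build_docker_run("rocker/tidyverse:latest", "Rscript install.R"), "\n")
```

Without such a flag, a variable set on the node would not be visible to the R process inside the container, which matches the behavior described above.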

brnleehng avatar Jul 30 '19 16:07 brnleehng

Thanks for looking into this. I should have been more specific: I am not using a custom docker image but 'rocker/tidyverse:latest'. If I read your answer correctly, the PAT variable is not passed to this container either? Is there another way to set the required environment variable, maybe through the cluster.json file?

minister3000 avatar Jul 30 '19 16:07 minister3000

Yes, that is correct. The PAT variable is not being passed through to the container either. I will add a fix that adds the PAT variable to the environment variables currently passed to the container.

I will discuss with others the possibility of supporting environment variables in the cluster file.

brnleehng avatar Jul 31 '19 17:07 brnleehng

Thank you for confirming the issue and working on it. I assume private GitHub repositories cannot be installed until this is fixed, and the maximum number of nodes is limited to 40 when using public repositories. (GitHub allows 60 unauthenticated requests per hour, and I reach the limit with 40 nodes for whatever reason.) Is there an estimated timeline to get the fix in place?
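A rough way to reason about the node ceiling is to divide the hourly quota by the number of API requests each node makes. The sketch below assumes all nodes appear to GitHub as one source IP and that every node makes the same number of requests; neither is confirmed in this thread, and `max_nodes` is a hypothetical helper.

```r
# Unauthenticated quota per hour, taken from the error message ("0/60").
hourly_quota <- 60

# Hypothetical helper: number of nodes that can install within one hour
# before the quota is exhausted, if each node makes `requests_per_node` calls.
max_nodes <- function(requests_per_node, quota = hourly_quota) {
  floor(quota / requests_per_node)
}

max_nodes(1)  # 60
max_nodes(2)  # 30
```

Hitting the limit at about 40 nodes corresponds to an average of 60 / 40 = 1.5 requests per node, which would be consistent with some installs needing more than one API call.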

minister3000 avatar Jul 31 '19 19:07 minister3000

I have a working fix branch that you can use. My plan is to merge it on Monday to do further testing.

devtools::install_github("Azure/doAzureParallel", ref="fix/github-pat-token")

brnleehng avatar Aug 01 '19 00:08 brnleehng

I am seeing another issue with this. Fetching the private repository works, but the package build returns a node failure error.

─ building ‘demoRcpp_1.0.tar.gz’

g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c rcpp_hello_world.cpp -o rcpp_hello_world.o
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c script.cpp -o script.o
g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o demoRcpp.so RcppExports.o rcpp_hello_world.o script.o -L/usr/local/lib/R/lib -lR
Error getting parent environment: there is no package called ‘BiocInstaller’

Solfood avatar Aug 02 '19 18:08 Solfood

I can confirm that the fix you provided is working and that the PAT is being passed to, and accepted by GitHub. I no longer hit GitHub's 60 unauthenticated requests threshold and am able to fetch from private repositories and install and run packages that rely on Rcpp. Thank you very much for providing a solution to this problem.

minister3000 avatar Aug 03 '19 04:08 minister3000

@brnleehng Thank you very much for the fix!

p-smirnov avatar Aug 06 '19 14:08 p-smirnov

Refer to https://gist.github.com/Z3tt/3dab3535007acf108391649766409421#gistcomment-3746021. Simple and awesome!

englianhu avatar May 17 '21 17:05 englianhu