
bq_auth() fails in GCP AI Platform Notebooks

Open jennybc opened this issue 5 years ago • 13 comments

Manually transferring from bigrquery, since cross-org transfer is not possible.

This is about auth in GCP AI Platform Notebooks, so a combination of "on GCP" and "in a Jupyter Notebook".

https://github.com/r-dbi/bigrquery/issues/340

jennybc avatar May 01 '20 16:05 jennybc

Is this still an issue? googleAuthR/gargle are installed in the R notebooks and credentials_gce(scopes = "https://www.googleapis.com/auth/cloud-platform", service_account = service_account) should work?

I have an example here: https://github.com/MarkEdmondson1234/googleAuthR/blob/master/inst/jupyter/googleAuthR.ipynb
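As a minimal sketch of that call in a notebook cell (assuming the VM's default attached service account and the broad cloud-platform scope from the example above):

library(gargle)
# credentials_gce() fetches a token for the attached service account from
# the GCE metadata server; no interactive browser flow is involved.
token <- credentials_gce(
  scopes = "https://www.googleapis.com/auth/cloud-platform"
)
# Hand the token to bigrquery so subsequent bq_*() calls use it.
bigrquery::bq_auth(token = token)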

MarkEdmondson1234 avatar May 02 '20 08:05 MarkEdmondson1234

Interesting! I have not looked into this personally. But it would be great to learn that all is well. I’ll leave open for a bit to encourage further comment (or maybe I’ll try it myself). But I consider this semi-closed, then.

jennybc avatar May 02 '20 14:05 jennybc

@ZainRizvi If you can take a look at @MarkEdmondson1234's example and react here, that would be great. I'm submitting a gargle release to CRAN soon.

jennybc avatar May 02 '20 22:05 jennybc

It is possible that the same print debugging described in https://github.com/r-lib/gargle/issues/130#issuecomment-623017017 would reveal useful information here as well and generate some forward momentum.

jennybc avatar May 03 '20 18:05 jennybc

@jennybc and @MarkEdmondson1234, we are trying to use bq_auth() to authenticate with an email from the AI Platform R notebook or GCE (RStudio) to BigQuery.

Based on the documentation we run:

library(bigrquery)
bq_auth(
  email = '[email protected]',
  path = NULL,
  scopes = c(
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/cloud-platform"
  ),
  cache = gargle::gargle_oauth_cache(),
  use_oob = gargle::gargle_oob_default(),
  token = NULL
)

sql <- 'SELECT * FROM target_projectid.dataset.table LIMIT 1'
tb <- bq_project_query(source_project, sql)
bq_table_download(tb, max_results = 10)

But it doesn't work. We have tried many combinations of arguments to bq_auth(), and also tried everything suggested in https://github.com/r-dbi/bigrquery/issues/340.

We don't even get the prompt to authorize a token; it runs without any output. If we don't use bq_auth() and just run the queries, it automatically fetches the credentials of the service account attached to the AI notebook and runs with that account's permissions. But we don't want this, because different developers have access to different datasets outside of this project. So we need to make it work with individuals' credentials.

The access logic works with the Python SDK; we don't have any problem with this access design in GCP when we run Python. But when we follow the bq_auth() documentation to do the same with R, we don't get prompted to visit a URL and obtain credentials, and bq_auth(email) doesn't seem to work. We tried it with RStudio as well and it didn't work. (We have two types of R projects: R notebooks and RStudio.)

snowlover173 avatar Oct 08 '20 01:10 snowlover173

I'm pretty sure bq_auth() with email won't work, as it's interactive and Jupyter notebooks are non-interactive.

But if you auth instead with the GCE auth (gargle::credentials_gce()) it should work, as it will reuse the auth that comes with the notebooks. Or you can upload a JSON service account key and auth via that.
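For the key-file route, a sketch (the path is hypothetical):

# Upload a JSON service account key to the notebook VM, then point
# bq_auth() at it; bigrquery/gargle build a service account token from it.
bigrquery::bq_auth(path = "~/keys/my-service-account.json")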

MarkEdmondson1234 avatar Oct 09 '20 04:10 MarkEdmondson1234

@MarkEdmondson1234 this is what we don't want to do. The AI notebook and RStudio currently use the service account attached to the instance to interact with datasets in BQ, but that doesn't work for our use case. Not everyone has access to all datasets. People log in to GCP with their email, and that email is used to grant access to datasets. Let's say A and B are developers who are allowed to work on datasets C and D, respectively. Those datasets will be shared with their emails accordingly, and they should call BQ with their own credentials to be able to run queries.

This pattern works with Python because the Python SDK has a client that makes the calls. The Client has a credentials parameter, which takes the temporary token we can get from the generated URL.

But we need the same for R: the user needs to call the dataset with their email, not the service account of the machine.

snowlover173 avatar Oct 09 '20 08:10 snowlover173

Have you tried use_oob = TRUE?

MarkEdmondson1234 avatar Oct 09 '20 08:10 MarkEdmondson1234

We tried bq_auth() with use_oob = TRUE. It didn't do anything in RStudio running on GCE, nor in the R AI notebook. However, when one of our R developers ran it from RStudio installed on their personal laptop, it worked: it opened a window to select the email, proceeded with the authentication, and generated an auth file in the gargle OAuth folder. After copying that file to the RStudio-GCE instance, it worked on GCE as well. But this approach didn't work for the notebook. Now the problems:

  1. Not everyone has RStudio installed on their laptop, because we give them a work environment within GCP, and I am not sure how convenient it is to ask a contractor to install RStudio and go through this process.
  2. It doesn't work on AI notebooks.

Is there any workaround?
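To spell out the copy step that worked for GCE, a sketch (the cache path varies by machine and gargle version, so query it rather than hard-coding it):

# On the laptop, after the interactive bq_auth() succeeds, locate the cache:
gargle::gargle_oauth_cache()
# Copy the token file inside it to the same cache location on the GCE
# instance; a later bq_auth(email = ...) there can discover and reuse it.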

snowlover173 avatar Oct 12 '20 00:10 snowlover173

The issue identified above by snowlover173 appears to be caused by the following:

  1. The interactive prompt created by httr::oauth2.0_token doesn't respond to input in Jupyter notebooks or the web version of RStudio. If I find the time, I will open a new issue with a reprex in the httr repo.
  2. bigrquery::bq_auth(use_oob = TRUE, email = FALSE) doesn't completely override the other credential-discovery methods called by bq_auth(). As a result, when some other credentials exist (I haven't had time to work out exactly which ones), bq_auth(use_oob = TRUE, email = FALSE) uses those credentials and doesn't initiate the interactive OAuth dance. This is probably an issue with bigrquery rather than gargle.
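As a debugging sketch for point 2 (assuming a current gargle, where cred_funs_list() and the gargle_verbosity option exist), to see which credential method supplies the token:

# List the credential functions gargle will try, in order.
names(gargle::cred_funs_list())

# Make gargle narrate credential discovery, then attempt auth; the debug
# output shows which method short-circuits the interactive OAuth dance.
options(gargle_verbosity = "debug")
bigrquery::bq_auth(use_oob = TRUE, email = FALSE)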

For anyone who wants to use the interactive, browser-based OAuth dance for user authentication in AI Platform notebooks or web-based RStudio, I recommend the following workaround:

Step 1

Complete the OAuth dance using the gcloud command-line utility. The gcloud interactive prompt works in both Jupyter notebooks and web-based RStudio. (If gcloud is not installed, you can run install.packages('cloudml') to invoke an installer from R.)

gcloud auth login

Step 2

Run the commands below. Make sure you edit them first, so that the email address is the same one you chose in the browser during Step 1.

cd ~/.config/gcloud/legacy_credentials/[email protected]
cp adc.json ../../application_default_credentials.json

This copies the OAuth 2.0 token generated by gcloud to the location where gargle expects to find Application Default Credentials. As a result, all R libraries that use gargle will authenticate successfully.
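To confirm gargle now finds the copied file, a quick check (credentials_app_default() is gargle's loader for Application Default Credentials; the scope is just an example):

# Returns a token if the ADC file is found and usable, otherwise NULL.
t <- gargle::credentials_app_default(
  scopes = "https://www.googleapis.com/auth/bigquery"
)
!is.null(t)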

ryandaryl avatar Oct 26 '20 04:10 ryandaryl

Getting a similar error. The script works without any issues, and without asking for credentials, in the ai-notebooks CLI. But when the same script is submitted as a job to ai-platform, it errors out.

# R-script
library(bigrquery)
library(tidyverse)

# Initial Parameters
strProject <- "cloud-project"
strSQL <- "select * from rssp_test.tbl_iris"

bqPull <- bq_project_query(strProject, strSQL)
tblPull <- bq_table_download(bqPull)

tblTest <- tblPull %>% 
  group_by(species) %>% 
  summarise(
    msl = mean(sepal_length), 
    msw = mean(sepal_width), 
    mpl = mean(petal_length), 
    mpw = mean(petal_width)
  ) %>% 
  select(species, msl, msw, mpl, mpw)

Thanks! Heramb

[attached screenshot: bigrquery_error]

herambgadgil avatar Oct 29 '20 18:10 herambgadgil

++ The way our security and architecture are structured in GCP, having a .json credentials file in each individual script is not feasible. The AI notebooks are linked to a service account which is enabled per instance.

herambgadgil avatar Nov 02 '20 20:11 herambgadgil

Windows Users

I've been struggling with bigrquery authentication for a long time. Recently, I haven't been able to use it at all.

"Step 2" above completely fixed my problem. Now I don't even need to do any interactive authentication!

I set about determining how to do the same procedure on Windows. Follow the same steps, but you will find the gcloud directory in %AppData% (i.e. C:\Users\username\AppData\Roaming\gcloud).

abalter avatar May 01 '22 05:05 abalter

This is a combination of #187 (how do you indicate that you don't want to use the default service account) and how to do an OOB-ish type of flow in GCE-hosted notebooks. Now that we've inlined much more of httr to enable pseudo-OOB, it's possible that the latter is getting more in reach.

jennybc avatar Oct 24 '22 18:10 jennybc

Progress on #140 may also be relevant here.

jennybc avatar Feb 22 '23 04:02 jennybc

I think this might be solved through some combination of recent developments (some of these are only in dev gargle but I will release soon):

  • Pseudo-OOB flow: #214.
  • Improved user flow in Google Colab / Jupyter notebooks: #246. If that doesn't automatically translate into better behaviour in GCP AI Platform Notebooks, I will need someone to help me find an environment variable or similar that can be used to detect we are running in a GCP AI Platform Notebook.
  • Improved ability to explicitly skip the automatic auth with the default service account on GCE (sketch below):
    cred_funs_add(credentials_gce = NULL) 
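Putting those together, a minimal sketch for a GCE-hosted notebook (based on dev gargle at the time of writing; the email is a placeholder):

library(bigrquery)

# Drop the GCE credential function from gargle's registry so the VM's
# attached service account is no longer picked up automatically.
gargle::cred_funs_add(credentials_gce = NULL)

# Auth can now fall through to the user flow, e.g. the pseudo-OOB variant.
bq_auth(email = "dev@example.com", use_oob = TRUE)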
    

I'm going to close this because I am hopeful that it's solved. Even if it's not, I think it might be down to advice on usage for this context. In any case, it would be better to start with a fresh issue, if necessary, based on current gargle.

jennybc avatar Apr 13 '23 22:04 jennybc

@jennybc --- I just had to do "step 2" again. I have a VM on GCP. Same problem. Then I remembered that I had found a solution and searched around until I found my post. I copied the file and now I'm authenticating.

Why is this still an issue?

abalter avatar Dec 01 '23 05:12 abalter

If we're going to restart this discussion, it needs to be a fresh issue, confirming that the problem, whatever it may be, is seen with current gargle, and with a full account of what we're talking about.

jennybc avatar Dec 01 '23 21:12 jennybc

Ok.

abalter avatar Dec 01 '23 21:12 abalter