python-bigquery-pandas

Resetting the Context does not work as expected

Open cwdjankoski opened this issue 3 years ago • 0 comments

First of all thank you for this great library and all your work on various tools that help tremendously. 🙇

I would like to report behaviour that I am not sure is a bug, but it is at least not what you would expect from reading the code, tests and documentation.

Context

I have two Google service accounts: secret_sa, which has access to some secret tables that not everyone can read by default, and default_sa, which has access to all the standard datasets and tables in our team.
The use case is as follows:

  • first I would like to read_gbq from the "secret" project/location via secret_sa
  • then I would like to upload a pandas df into a default project/location using default_sa

Reading through the documentation and some past discussions in issues, I understand that pandas_gbq sets a global context object, which is then reused to improve performance. In my case this means that after the first read_gbq the global context would be set to the secret project, and I would need a way to reset it before the second step.
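To make the caching behaviour described above concrete, here is a minimal toy model of a module-level context. It is not pandas_gbq's actual code: the `Context` class, `resolve_credentials` and the string credentials are all hypothetical stand-ins used only to illustrate the pattern.

```python
class Context:
    """Toy stand-in for a module-level credential cache (hypothetical)."""

    def __init__(self):
        self.credentials = None
        self.project = None


# One shared instance, created once when the module is imported.
context = Context()


def resolve_credentials(credentials=None, default="default_sa"):
    """Return the credentials a call would use, caching the first ones seen."""
    if credentials is None:
        # Nothing passed explicitly: prefer the cache, then the default.
        credentials = context.credentials or default
    if context.credentials is None:
        # The first call populates the cache for all later calls.
        context.credentials = credentials
    return credentials


first = resolve_credentials("secret_sa")  # explicit credentials, now cached
second = resolve_credentials()            # nothing passed, cache is reused
print(first, second)
```

In this toy model the second call returns the cached secret_sa rather than the default, which mirrors the behaviour reported in this issue.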

Expected

What I expected to be able to do, from reading e.g. the tests for setting and resetting the global context, is to set pandas_gbq.context to None (similar to the reset trick), so that the second call would use GOOGLE_APPLICATION_CREDENTIALS to pick up default_sa.
It may be worth mentioning that for the first call I have a function that loads secret_sa, gets a query and sets/deletes both the pandas_gbq.context and pandas_gbq.Context objects.

However, this does not work as I expected, and the second call still tries to use the secret_sa context/credentials/project.
For now I have found a workaround: passing reauth = True to the {pandas_gbq | pandas}.to_gbq calls further down in the script.
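One possible explanation, stated here as an assumption rather than something confirmed in this issue: if the package merely re-exports a context object that an inner module owns, then assigning None to the package-level name rebinds that name only, and the functions keep reading the inner module's object. The sketch below shows this general Python behaviour with hypothetical names (`package`, `current_credentials`); it does not use pandas_gbq itself.

```python
import types

# The cache lives in this module's globals (the "inner module" of the analogy).
context = {"credentials": "secret_sa"}


def current_credentials():
    # Looks up `context` in the defining module, not in any re-exporting namespace.
    return context["credentials"]


# Hypothetical package namespace that re-exports the two names.
package = types.SimpleNamespace(
    context=context,
    current_credentials=current_credentials,
)

# Rebinding the re-exported name changes only the package namespace.
package.context = None
after_rebind = package.current_credentials()

# Mutating the object the defining module still holds does reach the function.
context["credentials"] = None
after_mutation = package.current_credentials()

print(after_rebind, after_mutation)
```

If that assumption holds for pandas_gbq, it would explain why both assigning None and using del on the package-level names leave the cached credentials in place.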

Could you please help me figure out the correct way to do this? I would be happy to turn this into a nice short example and contribute it to the docs, if you think that would be helpful.

Many thanks in advance for all your work! Please let me know if I can provide more info or examples.

Environment details

  • OS type and version: (I believe it's not that important here)
  • Python version: Python 3.10.6
  • pip version: pip 22.2.1 from /usr/local/lib/python3.10/site-packages/pip (python 3.10)
  • pandas-gbq version: 0.17.8

Code example

import os
import json
import pandas as pd
import pandas_gbq as pdbq

from google.cloud import bigquery
from google.oauth2 import service_account

# authenticate and read stuff from the "secret" project
cred_json_secret = json.loads(os.getenv("SECRET-PROJ-CREDS"))
credentials_secret = service_account.Credentials.from_service_account_info(cred_json_secret)

results_secret = pdbq.read_gbq("SELECT FROM SECRET PROJ", credentials = credentials_secret)

# at this point pandas_gbq has set the global context object
# to the "secret" one
# now i want to unset/delete all this context
pdbq.Context = None
pdbq.context = None
pdbq.project = None

# or also i tried
# del pdbq.Context
# del pdbq.context
# del pdbq.project

# at this point i make a small dataframe and try to upload it
# by letting pandas_gbq figure out the default credentials
# from the GOOGLE_APPLICATION_CREDENTIALS in env.var
# but this does not work
df_upload_with_default_creds = pd.DataFrame(range(3), columns = ["col_A"])

pdbq.to_gbq(
    dataframe = df_upload_with_default_creds,
    destination_table = 'default-dataset.table',
    project_id = "default-project",
    if_exists = "replace"
)

# > GenericGBQException: Reason: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/the-project/datasets/the-dataset/tables?prettyPrint=false: 
# Access Denied: Dataset the-project:the-dataset: Permission bigquery.tables.create denied on dataset the-project:the-dataset (or it may not exist).

# however this works
pdbq.to_gbq(
    dataframe = df_upload_with_default_creds,
    destination_table = 'default-dataset.table',
    project_id = "default-project",
    if_exists = "replace",
    # ! need this ! 
    reauth = True
)
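A possible alternative to reauth = True, sketched under assumptions: load the default credentials explicitly and pass them through the credentials argument of to_gbq, so the call does not depend on whatever is cached. google.auth.default() reads GOOGLE_APPLICATION_CREDENTIALS from the environment. The helper name upload_with_default_credentials is hypothetical, and this has not been verified against pandas-gbq 0.17.8.

```python
def upload_with_default_credentials(df, destination_table, project_id):
    """Upload df using explicitly loaded default credentials (hypothetical helper)."""
    # Imported inside the function so the sketch loads without the libraries.
    import google.auth
    import pandas_gbq

    # Resolves GOOGLE_APPLICATION_CREDENTIALS (or other default sources).
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/bigquery"]
    )

    # Explicit credentials are passed per call instead of relying on the cache.
    pandas_gbq.to_gbq(
        df,
        destination_table,
        project_id=project_id,
        if_exists="replace",
        credentials=credentials,
    )
```

With the reproduction above, this would be called as upload_with_default_credentials(df_upload_with_default_creds, 'default-dataset.table', "default-project").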

cwdjankoski · Sep 08 '22 08:09